Lecture Notes in Networks and Systems 689
Harish Sharma Vivek Shrivastava Kusum Kumari Bharti Lipo Wang Editors
Communication and Intelligent Systems Proceedings of ICCIS 2022, Volume 2
Lecture Notes in Networks and Systems Volume 689
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Harish Sharma · Vivek Shrivastava · Kusum Kumari Bharti · Lipo Wang Editors
Communication and Intelligent Systems Proceedings of ICCIS 2022, Volume 2
Editors Harish Sharma Department of Computer Science and Engineering Rajasthan Technical University Kota, India Kusum Kumari Bharti PDPM Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India
Vivek Shrivastava National Institute of Technology Delhi New Delhi, India Lipo Wang School of Electrical and Electronic Engineering Nanyang Technological University Singapore, Singapore
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-2321-2 ISBN 978-981-99-2322-9 (eBook) https://doi.org/10.1007/978-981-99-2322-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains outstanding research papers presented as the proceedings of the 4th International Conference on Communication and Intelligent Systems (ICCIS 2022), held on 19–20 December 2022 at the National Institute of Technology Delhi, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, and for developing a comprehensive understanding of the challenges of advancing computational intelligence. This book will help strengthen congenial networking between academia and industry. It presents novel contributions in the areas of communication and intelligent systems and serves as reference material for advanced research. The topics covered include: intelligent systems: algorithms and applications; intelligent data analytics and computing; informatics and applications; and communication and control systems. ICCIS 2022 received a significant number of technical contributions from distinguished participants from home and abroad, amounting to 410 research submissions. After a very stringent peer-review process, only 108 high-quality papers were finally accepted for presentation and the final proceedings. This second volume presents 52 research papers related to communication and intelligent systems and serves as reference material for advanced research.
Harish Sharma (Kota, India)
Vivek Shrivastava (New Delhi, India)
Lipo Wang (Singapore)
Kusum Kumari Bharti (Jabalpur, India)
Contents
Development of PMSM Servo Driver for CNC Machines Using TMS28379D Microcontroller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Quoc Van Nguyen, Chi-Ngon Nguyen, and Thang Viet Tran
1
A Probabilistic Method to Identify HTTP/1.1 Slow Rate DoS Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nemalikanti Anand and M. A Saifulla
17
Transmission Pricing Using MW Mile Method in Deregulated Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gummadi Srinivasa Rao, V. Hari Vamsi, and B. Venkateswararao
29
Performance Analysis of Nonlinear Companding Techniques for PAPR Mitigation in 5G GFDM Systems . . . . . . . . . . . . . . . . . . . . . . . . . . Neethu Radha Gopan and S. Sujatha
39
Tensor Completion-Based Data Imputation Framework for IoT-Based Underwater Sensor Network . . . . . . . . . . . . . . . . . . . . . . . . . . Govind P. Gupta and Prince Rajak
53
Pre-training Classification and Clustering Models for Vietnamese Automatic Text Summarization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ti-Hon Nguyen and Thanh-Nghi Do
65
Identifying Critical Transition in Bitcoin Market Using Topological Data Analysis and Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anusha Bansal, Aakanksha Singh, Sakshi Vats, and Khyati Ahlawat
79
Healthcare Information Exchange Using Blockchain Technology . . . . . . . Aman Ramani, Dhairya Chhabra, Varun Manik, Gautam Dayama, and Amol Dhumane
91
Multilevel Credit Card Fraud Detection Using Face Recognition and Machine Learning . . . . . 103
Tushar Deshpande, Shourjadeep Datta, Rushabh Shah, Vatsal Doshi, and Deepika Dongre
Transfer Learning-Based End-to-End Indian English Recognition System . . . . . 115
Shambhu Sharan, Amita Dev, and Poonam Bansal
Impact of COVID-19 on the Sectors of the Indian Economy and the World . . . . . 129
Rahul Gite, H. Vathsala, and Shashidhar S. Koolagudi
Wind Farm Layout Optimization Problem Using Teaching–Learning-Based Optimization Algorithm . . . . . 151
Mukesh Kumar and Ajay Sharma
An Ensemble Multimodal Fusion Using Naive Bayes Approach for Haptic Identification of Objects . . . . . 171
R. Aravind Sekhar and K. G. Sreeni
Chaotic Maps and DCT-based Image Steganography-cum-encryption Hybrid Approach . . . . . 181
Butta Singh, Manjit Singh, and Himali Sarangal
An Empirical Analysis and Challenging Era of Blockchain in Green Society . . . . . 195
G. A. Senthil, R. Prabha, B. Divya, and A. Sathya
3D Modeling of Automated Robot for Seeding and Transplantation of Rice and Wheat Crops . . . . . 207
G. Venkata Sai Krishna, Palanki Amitasree, P. V. Manitha, and M. Rajesh
Metaheuristics based Task Offloading Framework in Fog Computing for Latency-sensitive Internet of Things Applications . . . . . 221
Priya Thomas and Deepa V. Jose
Empirical Evaluation of Microservices Architecture . . . . . 241
Neha Kaushik, Harish Kumar, and Vinay Raj
Thermoelastic Energy Dissipation Trimming at High Temperatures in Cantilever Microbeam Sensors for IoT Applications . . . . . 255
R. Resmi, V. Suresh Babu, and M. R. Baiju
Contactless Fingerprint Matching: A Pandemic Obligation . . . . . 265
Payal Singh and Diwakar Agarwal
Deep Reinforcement Learning to Solve Stochastic Vehicle Routing Problems . . . . . 283
Sergio Flavio Marroquín-Cano, Elías Neftalí Escobar-Gómez, Eduardo F. Morales, Eduardo Chandomi-Castellanos, and Elizeth Ramirez-Alvarez
Applying Machine Learning for American Sign Language Recognition: A Brief Survey . . . . . 297
Shashank Kumar Singh and Amrita Chaturvedi
Identification of Promising Biomarkers in Cancer Diagnosis Using a Hybrid Model Combining ReliefF and Grey Wolf Optimization . . . . . 311
Sayantan Dass, Sujoy Mistry, and Pradyut Sarkar
Prediction of Ectopic Pregnancy in Women Using Hybrid Machine Learning Techniques . . . . . 323
Vimala Nagabotu and Anupama Namburu
Redundancy Reduction and Adaptive Bit Length Encoding-Based Purely Lossless ECG Compression . . . . . 343
Butta Singh, Neetika Soni, and Indu Saini
Antecedents, Barriers, and Challenges of Artificial Intelligence Adoption for Supply Chains: A Tactical Review . . . . . 357
Kalya Lakshmi Sainath and Lakshmi Devasena C
Statistical Influence of Parameters on the Performance of SDN . . . . . 369
Deepjyot Kaur Ryait and Manmohan Sharma
Investigations on Channel Characteristics and Range Prediction of 5G mmWave (39 GHz) Wireless Communication System . . . . . 385
I. Johnsi Stella and B. Victoria Jancee
Cervical Spine Fracture Detection . . . . . 401
Flemin Thomas and P. Savaridassan
Unmanned Ground Vehicle for Survey of Endangered Species . . . . . 411
Kesia Mary Joies, Rahul Sunil, Jisha Jose, and Vishnu P. Kumar
AI/AR and Indian Classical Dance—An Online Learning System to Revive the Rich Cultural Heritage . . . . . 419
Gayatri Ghodke and Pranita Ranade
Performance Analysis of Classification Algorithms for the Prediction of Cardiac Disease . . . . . 433
N. Jagadeesan and T. Velmurugan
Fractional Order Controller Design Based on Inverted Decouple Model and Smith Predictor . . . . . 447
R. Hanuma Naik, P. V. Gopikrishna Rao, and D. V. Ashok Kumar
IoT-Sensed Data for Data Integration Using Intelligent Decision-Making Algorithm Through Fog Computing . . . . . 463
B. Maria Joseph and K. K. Baseer
Jamming Attack Classification in Wireless Networks Using Machine Learning . . . . . 477
R. Arathy, Bidisha Bhabani, Madhuri Malakar, Judhistir Mahapatro, and Pushpendu Kar
Heuristic Scheduling in Automotive, Shipping, Oil and Gas, and Healthcare Domains: A Mini Case Study . . . . . 493
Preethi Sheba Hepisba Darius, Joshua Devadason, and Darius Gnanaraj Solomon
Detection of Mental Health Using Deep Learning Technique . . . . . 507
Cynthia Jayapal, S. M. Yamuna, S. Manavallan, and M. Devasenan
Performance Analysis of Neural Machine Translation Models for ASL to ASL Gloss Conversion . . . . . 521
Prachi P. Waghmare and Ashwini M. Deshpande
Contribution Title High Accuracy Dataset Control from Solar Photovoltaic Arrays by Decision Tree-Based System . . . . . 531
R. Usha Rani and M. Lakshmi Swarupa
Design of Flexible Antenna with Defected Ground Structure . . . . . 541
Sultana Khatoon, Neetu Marwah, and Jamil Akhtar
A Reduced-Memory Multi-layer Perceptron with Systematic Network Weights Generated and Trained Through Distribution Hyper-parameters . . . . . 553
Neha Vinayak and Shandar Ahmad
Evaluation of Depression Detection in Sentiment Analysis Through Machine Learning Model . . . . . 567
Kusumlata Jain, Smaranika Mohapatra, Riyanshi Bohra, and V. V. S. S. Varun
Metabolic Pathway Class Prediction Using Graph Convolutional Network (GCN) . . . . . 577
Ippatapu Venkata Srisurya, K. Mukesh, and I. R. Oviya
Cardiovascular Disease Prediction Using Machine Learning Techniques with HyperOpt . . . . . 585
D. Yaso Omkari and Snehal B. Shinde
A Lightweight Object Detection Model to Detect Pneumonia Types . . . . . 599
Mohammed Saifuddin Munna and Quazi Delwar Hossain
Classification and Prediction of Medical Dataset Using Ensemble DeepQ Classifier . . . . . 611
Rahama Salman, Subodhini Gupta, and Neelesh Jain
Object Detection by Tiny-YOLO on TurtleBot3 as an Educational Robot . . . . . 619
Reza Moezzi, Adrian Saw, Stefan Bischoff, Jindrich Cyrus, and Jaroslav Hlava
Intelligent Structural Damage Detection with MEMS-Like Sensors Noisy Data . . . . . 631
Jonathan Melchiorre, Laura Sardone, Marco Martino Rosso, and Angelo Aloisio
A SENS Score of Rheumatoid Arthritis Detection Using Customized Convolutional Neural Network . . . . . 643
G. S. Mate, A. N. Paithane, and N. M. Ranjan
Comparative Analysis of High-Risk Pregnancy Prediction Using Machine Learning . . . . . 653
Priyanka, Sonali Goyal, and Ruby Bhatia
A Customizable Mathematical Model for Determining the Difficulty of Guitar Triad Chords for Machine Learning . . . . . 667
Nipun Sharma and Swati Sharma
Load Frequency Control of Interconnected Hybrid Power System . . . . . 681
Deepesh Sharma and Rajni Bala
Author Index . . . . . 693
Editors and Contributors
About the Editors Harish Sharma is an associate professor at Rajasthan Technical University, Kota, in the Department of Computer Science and Engineering. He has worked at Vardhaman Mahaveer Open University Kota and Government Engineering College Jhalawar. He received his B.Tech. and M.Tech. degree in Computer Engineering from Government Engineering College, Kota, and Rajasthan Technical University, Kota, in 2003 and 2009, respectively. He obtained his Ph.D. from ABV—Indian Institute of Information Technology and Management, Gwalior, India. He is the secretary and one of the founder members of Soft Computing Research Society of India. He is a lifetime member of Cryptology Research Society of India, ISI, Kolkata. He is an associate editor of International Journal of Swarm Intelligence (IJSI) published by Inderscience. He has also edited special issues of the many reputed journals like Memetic Computing, Journal of Experimental and Theoretical Artificial Intelligence, Evolutionary Intelligence, etc. His primary area of interest is nature-inspired optimization techniques. He has contributed more than 105 papers published in various international journals and conferences. Dr. Vivek Shrivastava has approx. 20 years of diversified experience of scholarship of teaching and learning, accreditation, research, industrial, and academic leadership in India, China, and USA. Presently, he is holding the position of Dean of Research and Consultancy at the National Institute of Technology, Delhi. Prior to his academic assignments, he has worked as a system reliability engineer at SanDisk Semiconductors, Shanghai, China, and USA. Dr. Shrivastava has significant industrial experience collaborating with industry and government organizations at SanDisk Semiconductors; he has made a significant contribution to the design development of memory products. He has contributed to the development and delivery of Five-Year Integrated B.Tech—M. Tech. Program (Electrical Engineering) and Master program (Power Systems) at Gautam Buddha University Greater Noida. He has extensive experience
in academic administration in various capacities of Dean (Research and Consultancy), Dean (Student Welfare), Faculty In-charge (Training and Placement), Faculty In-charge (Library), Nodal Officer (Academics, TEQIP-III), Nodal Officer RUSA, Experts in various committees in AICTE, UGC, etc. Dr. Shrivastava has carried out research and consultancy and attracted significant funding projects from the Ministry of Human Resources and Development, Government of India, Board of Research in Nuclear Science (BRNS) subsidiary organization of Bhabha Atomic Research Organization. Dr. Shrivastava has published over 80 journal articles, presented papers at conferences, and published several chapters in books. He has supervised five Ph.D. and 16 Master’s students and currently supervising several Ph.D. students. His diversified research interests are in the areas of reliability engineering, renewable energy, and conventional power systems which include wind, photovoltaic (PV), hybrid power systems, distributed generation, grid integration of renewable energy, power systems analysis, and smart grid. Dr. Shrivastava is an editor/associate editor of the journals, International Journal of Swarm Intelligence (IJSI) and International Journal of System Assurance Engineering and Management. He is a fellow of the Institution of Engineers (India) and a senior member of the Institute of Electrical and Electronics Engineers (IEEE). Dr. Kusum Kumari Bharti is an assistant professor at PDPM IIITDM Jabalpur. Dr. Bharti has obtained her Ph.D. in Computer Science and Engineering from ABVIIITM Gwalior. She has guided six M.Tech. and presently guiding two Ph.D. students and five M.Tech. students. She has published more than 12 journal and conference papers in the area of text clustering, data mining, online social network, and soft computing. She has been an active member of many organizing committees of various conferences, workshops, and faculty development programs. Her research areas include machine learning, data mining, machine translation, online social network, and soft computing. Dr. Lipo Wang received Bachelor’s degree from the National University of Defense Technology (China) and Ph.D. from Louisiana State University (USA). He is presently on the faculty of the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interest is artificial intelligence with applications to image/video processing, biomedical engineering, and data mining. He has 330+ publications, a US patent in neural networks and a patent in systems. He has co-authored two monographs and (co-)edited 15 books. He has 8,000+ Google Scholar citations, with H-index 43. He was a keynote speaker for 36 international conferences. He is/was an associate editor/editorial board member of 30 international journals, including four IEEE Transactions, and a guest editor for 10 journal special issues. He was a member of the Board of Governors of the International Neural Network Society, IEEE Computational Intelligence Society (CIS), and the IEEE Biometrics Council. He served as a CIS vice-president for Technical Activities and chair of Emergent Technologies Technical Committee, as well as chair of Education Committee of the IEEE Engineering in Medicine and Biology Society (EMBS). He was the president of the Asia-Pacific Neural Network Assembly
(APNNA) and received the APNNA Excellent Service Award. He was the founding chair of both the EMBS Singapore Chapter and CIS Singapore Chapter. He serves/ served as a chair/committee member of over 200 international conferences.
Contributors Diwakar Agarwal GLA University, Mathura, India Khyati Ahlawat Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Shandar Ahmad School of Computational and Integrative Sciences (SCIS), Jawaharlal Nehru University (JNU), New Delhi, India Jamil Akhtar Department of Electronics and Communication Engineering, Manipal University Jaipur, Jaipur, India Angelo Aloisio DICEAA, Department of Civil, Construction-Architecture and Environmental Engineering, Università degli Studi dell’Aquila, Monteluco di Roio, L’Aquila, Italy Palanki Amitasree Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India Nemalikanti Anand University of Hyderabad, Hyderabad, India; SCIS, Hyderabad, India R. Arathy National Institute of Technology, Rourkela, India R. Aravind Sekhar Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, India V. Suresh Babu College of Engineering Trivandrum, APJ Abdul Kalam Technological University, Thiruvananthapuram, India M. R. Baiju University of Kerala, Thiruvananthapuram, Kerala, India Rajni Bala Department of Physics, Maharshi Dayanand University, Rohtak, Haryana, India Anusha Bansal Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Poonam Bansal Indira Gandhi Delhi Technical University for Women, New Delhi, India K. K. Baseer Department of Information Technology, Sree Vidyanikethan Engineering College, Tirupati, Andhra Pradesh, India Bidisha Bhabani National Institute of Technology, Rourkela, India
Ruby Bhatia Department of Obstetrics and Gynaecology, Maharishi Markandeshwar (Deemed to Be University), Mullana, Haryana, India Stefan Bischoff Faculty of Electrical Engineering and Computer Science, Görlitz University of Applied Sciences, Zittau, Germany Riyanshi Bohra Department of IT, Manipal University Jaipur, Jaipur, Rajasthan, India Eduardo Chandomi-Castellanos Tecnológico Nacional de México, Instituto Tecnológico de Tuxtla Gutierrez, Tuxtla Gutierrez, Chiapas, Mexico Amrita Chaturvedi Indian Institute of Technology (BHU), Varanasi, India Dhairya Chhabra Department Computer Science, Pimpri-Chinchwad College of Engineering, Pune, India Jindrich Cyrus Institute for Nanomaterials, Advanced Technologies and Innovation, Technical University of Liberec, Liberec, Czech Republic Preethi Sheba Hepisba Darius CMR Institute of Technology, Bengaluru, India Sayantan Dass Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India Shourjadeep Datta Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India Gautam Dayama Department Computer Science, Pimpri-Chinchwad College of Engineering, Pune, India Ashwini M. Deshpande Electronics and Telecommunication Department, MKSSS’s Cummins College of Engineering for Women, Pune, Maharashtra, India Tushar Deshpande Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India Amita Dev Indira Gandhi Delhi Technical University for Women, New Delhi, India Joshua Devadason Coimbatore Institute of Technology, Coimbatore, India M. Devasenan Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, India Amol Dhumane Department Computer Science, Pimpri-Chinchwad College of Engineering, Pune, India B. Divya Department of Computer Science and Engineering, Sri Sai Ram Institute of Technology, Chennai, India Thanh-Nghi Do Can Tho University, Can Tho, Vietnam; UMI UMMISCO 209, Paris, France
Deepika Dongre Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India Vatsal Doshi Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India Elías Neftalí Escobar-Gómez Tecnológico Nacional de México, Instituto Tecnológico de Tuxtla Gutierrez, Tuxtla Gutierrez, Chiapas, Mexico
Gayatri Ghodke Symbiosis International (Deemed University), Symbiosis Institute of Design, Pune, Maharashtra, India Rahul Gite Department of CSE, NITK, Mangalore, Karnataka, India Neethu Radha Gopan Department of ECE, Rajagiri School of Engineering and Technology, Kochi, India Sonali Goyal Department of CSE, MMEC, Maharishi Markandeshwar (Deemed to be University), Mullana, Haryana, India Govind P. Gupta Department of Information Technology, National Institute of Technology, Raipur, India Subodhini Gupta Department of Computer Application, School of Information Technology, SAM Global University, Bhopal, MP, India V. Hari Vamsi Department of EEE, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, A.P., India Jaroslav Hlava Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Liberec, Czech Republic Quazi Delwar Hossain Chittagong University of Engineering and Technology, Chittagong, Bangladesh N. Jagadeesan Department of Information Technology and B.C.A, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India Kusumlata Jain Department of CCE, Manipal University Jaipur, Jaipur, Rajasthan, India Neelesh Jain Department of Computer Science and Engineering, SAM Global University, Bhopal, MP, India B. Victoria Jancee St. Joseph’s College of Engineering, Chennai, India Cynthia Jayapal Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, India Kesia Mary Joies Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram, Kerala, India Deepa V. Jose CHRIST (Deemed to Be University), Bangalore, Karnataka, India
Jisha Jose Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram, Kerala, India B. Maria Joseph CSE Department, Jawaharlal Nehru Technological University Anantapur, Ananthapuramu, Andhra Pradesh, India Pushpendu Kar University of Nottingham Ningbo China, Ningbo, China Neha Kaushik Department of Computer Engineering, JC Bose University of Science and Technology, Faridabad, India Sultana Khatoon Department of Electronics and Communication Engineering, Manipal University Jaipur, Jaipur, India Shashidhar S. Koolagudi Department of CSE, NITK, Mangalore, Karnataka, India G. Venkata Sai Krishna Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India D. V. Ashok Kumar RGM College of Engineering and Technology, Nandyal, India Harish Kumar Department of Computer Engineering, JC Bose University of Science and Technology, Faridabad, India Mukesh Kumar Rajasthan Technical Univesity, Kota, India Vishnu P. Kumar Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram, Kerala, India Lakshmi Devasena C IBS Hyderabad, IFHE University, Hyderabad, India Judhistir Mahapatro National Institute of Technology, Rourkela, India Madhuri Malakar National Institute of Technology, Rourkela, India S. Manavallan Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, India Varun Manik Department Computer Science, Pimpri-Chinchwad College of Engineering, Pune, India P. V. Manitha Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India Sergio Flavio Marroquín-Cano Tecnológico Nacional de México, Instituto Tecnológico de Tuxtla Gutierrez, Tuxtla Gutierrez, Chiapas, Mexico Neetu Marwah Department of Electronics and Communication Engineering, Manipal University Jaipur, Jaipur, India G. S. Mate JSPM’s Rajarshi Shahu College of Engineering Pune University, Pune, India
Jonathan Melchiorre DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Turin, Italy Sujoy Mistry Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India Reza Moezzi Institute for Nanomaterials, Advanced Technologies and Innovation, Technical University of Liberec, Liberec, Czech Republic; Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Liberec, Czech Republic Smaranika Mohapatra Department of IT, Manipal University Jaipur, Jaipur, Rajasthan, India Eduardo F. Morales Instituto Nacional de Astrofísica Óptica y Electrónica, Tonantzintla, Puebla, Mexico K. Mukesh Department of Computer Science and Engineering (AIE), Amrita School of Computing, Chennai, India Mohammed Saifuddin Munna Chittagong University of Engineering and Technology, Chittagong, Bangladesh Vimala Nagabotu School of Computer Science and Engineering, VIT-AP University, Near Vijayawada, Andhra Pradesh, India R. Hanuma Naik RGM College of Engineering and Technology, Nandyal, India Anupama Namburu School of Computer Science and Engineering, VIT-AP University, Near Vijayawada, Andhra Pradesh, India Chi-Ngon Nguyen College of Engineering, Can Tho University, Can Tho City, Vietnam Ti-Hon Nguyen Can Tho University, Can Tho, Vietnam D. Yaso Omkari School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India I. R. Oviya Department of Computer Science and Engineering (AIE), Amrita School of Computing, Chennai, India A. N. Paithane JSPM’s Rajarshi Shahu College of Engineering Pune University, Pune, India R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India Priyanka Department of CSE, MMEC, Maharishi Markandeshwar (Deemed to be University), Mullana, Haryana, India Vinay Raj Department of Computer Applications, National Institute of Technology Tiruchirappalli, Tiruchirappalli, India
Prince Rajak Department of Information Technology, National Institute of Technology, Raipur, India M. Rajesh Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India Aman Ramani Department Computer Science, Pimpri-Chinchwad College of Engineering, Pune, India Elizeth Ramirez-Alvarez Tecnológico Nacional de México, Instituto Tecnológico de Lázaro Cárdenas, Lázaro Cárdenas, Michoacán, Mexico Pranita Ranade Symbiosis International (Deemed University), Symbiosis Institute of Design, Pune, Maharashtra, India R. Usha Rani CSE Department, CVR College of Engineering, Ibrahimpatan, Hyderabad, India N. M. Ranjan JSPM’s Rajarshi Shahu College of Engineering Pune University, Pune, India Gummadi Srinivasa Rao Department of EEE, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, A.P., India
P. V. Gopikrishna Rao RGM College of Engineering and Technology, Nandyal, India R. Resmi LBS Institute of Technology for Women, University of Kerala, Thiruvananthapuram, Kerala, India Marco Martino Rosso DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Turin, Italy Deepjyot Kaur Ryait School of Computer Applications, Lovely Professional University, Phagwara, India M. A Saifulla University of Hyderabad, Hyderabad, India; SCIS, Hyderabad, India Kalya Lakshmi Sainath IBS Hyderabad, IFHE University, Hyderabad, India Indu Saini Department of Electronics and Communication Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India Rahama Salman Department of Computer Science, SAM Global University, Bhopal, MP, India Himali Sarangal Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus Jalandhar, India Laura Sardone DICAR, Department of Civil Engineering and Architecture Sciences, Politecnico di Bari, Bari, Italy
Pradyut Sarkar Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India A. Sathya Department of Artificial Intelligence and Data Science, Sri Sai Ram Institute of Technology, Chennai, India P. Savaridassan Department of Networking and Communications, School of Computing, SRM IST, Chennai, India Adrian Saw Faculty of Electrical Engineering and Computer Science, Görlitz University of Applied Sciences, Zittau, Germany G. A. Senthil Department of Information Technology, Agni College of Technology, Chennai, India Rushabh Shah Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India Shambhu Sharan Indira Gandhi Delhi Technical University for Women, New Delhi, India Ajay Sharma Rajasthan Technical Univesity, Kota, India Deepesh Sharma Department of Electrical Engineering, Deenbandhu Chhotu Ram University of Science & Technology, Murthal, Sonepat, Haryana, India Manmohan Sharma School of Computer Applications, Lovely Professional University, Phagwara, India Nipun Sharma Presidency University, Bangalore, India Swati Sharma Presidency University, Bangalore, India Snehal B. Shinde School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India Aakanksha Singh Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Butta Singh Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus Jalandhar, India Manjit Singh Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus Jalandhar, India Payal Singh GLA University, Mathura, India Shashank Kumar Singh Indian Institute of Technology (BHU), Varanasi, India Darius Gnanaraj Solomon Vellore Institute of Technology, Vellore, India Neetika Soni Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus, Jalandhar, India
K. G. Sreeni Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, India Ippatapu Venkata Srisurya Department of Computer Science and Engineering (AIE), Amrita School of Computing, Chennai, India I. Johnsi Stella St. Joseph’s College of Engineering, Chennai, India S. Sujatha Department of ECE, Christ (Deemed to be University), Bengaluru, India Rahul Sunil Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram, Kerala, India M. Lakshmi Swarupa EEE Department, CVR College of Engineering, Ibrahimpatan, Hyderabad, India Flemin Thomas Department of Networking and Communications, School of Computing, SRM IST, Chennai, India Priya Thomas CHRIST (Deemed to Be University), Bangalore, Karnataka, India Thang Viet Tran Department of Science and Technology, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam Quoc Van Nguyen Institute of Engineering, HUTECH University, Ho Chi Minh City, Vietnam V. V. S. S. Varun Department of CSE, Manipal University Jaipur, Jaipur, Rajasthan, India H. Vathsala CDAC, Bangalore, Karnataka, India Sakshi Vats Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India T. Velmurugan Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India B. Venkateswararao Department of EEE, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, A.P., India Neha Vinayak School of Computational and Integrative Sciences (SCIS), Jawaharlal Nehru University (JNU), New Delhi, India Prachi P. Waghmare Electronics and Telecommunication Department, MKSSS’s Cummins College of Engineering for Women, Pune, Maharashtra, India S. M. Yamuna Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, India
Development of PMSM Servo Driver for CNC Machines Using TMS28379D Microcontroller Quoc Van Nguyen, Chi-Ngon Nguyen, and Thang Viet Tran
Abstract AC servo motor mechanisms are key components of machine tools, manipulators, automation machines, and similar equipment. This paper describes in detail the development and experimental evaluation of a PMSM servo motor driver for CNC machines using the TMS320F28379D microcontroller. The PMSM servo driver is controlled with the field-oriented control (FOC) method combined with closed-loop PID controllers regulating the flux- and torque-producing currents, velocity, and position. The main targets for the servo driver are fast, accurate response with small fluctuations and no overshoot in both position and velocity. To achieve this, a trapezoidal motion profile is used. PC-based software that communicates with the PMSM servo driver and allows easy control and monitoring of the developed driver is designed. Many useful features are provided in the software, such as home setting, jog running, and position running. Scaling factor, acceleration, and deceleration settings are also integrated in the software. In addition, the driver supports the Pulse/Dir run mode required by CNC controllers. The experiment was carried out with a 4 kW PMSM driving a single-axis ball-screw slide of a CNC machine. The experimental results demonstrate the effectiveness of the proposed system. Keywords Position tracking · PMSM · Field-oriented control · FOC · PID control · Trapezoidal motion profile · Ball-screw
Q. Van Nguyen Institute of Engineering, HUTECH University, Ho Chi Minh City, Vietnam C.-N. Nguyen College of Engineering, Can Tho University, Can Tho City, Vietnam T. V. Tran (B) Department of Science and Technology, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_1
1 Introduction In modern manufacturing, automatic machines with servo mechanisms play an important role in replacing human labor, which is becoming more and more expensive. Therefore, AC servo systems for precise position control are necessary. The PMSM is one of the most popular actuators for AC servo systems due to its high power density. Many efforts have been made to increase the quality, reduce the energy consumption, and increase the accuracy of AC servo systems [1–6]. There are many approaches to controlling a PMSM. Field-oriented control (FOC), a classical approach, uses several PID and PI controllers, namely a position loop, a speed loop, a flux-producing current loop, and a torque-producing current loop [7–9]. Each control loop runs independently, and the current rises sharply during initialization, speed changes, or load changes. Furthermore, uncertain factors such as nonlinearity, time-varying parameters, and load torque disturbances make it difficult to control the PMSM servo system [10]. In this article, we propose a new experimental method for developing a PMSM servo driver using the Texas Instruments TMS320F28379D microcontroller. The microcontroller has high computational power to execute complex control algorithms, along with libraries that reduce the time and effort required for driver development. The PMSM servo driver is controlled with the FOC method combined with closed-loop PID controllers regulating the flux- and torque-producing currents, velocity, and position. To improve position tracking accuracy without overshoot and overcurrent, a trapezoidal motion profile is used. PC-based software that communicates with the PMSM servo driver and allows easy control and monitoring of the PMSM servo system is designed. Many useful features are provided in the software, such as home setting, jog running, and position running. Scaling factor, acceleration, and deceleration settings are also integrated in the software. Furthermore, the driver supports the Pulse/Dir run mode that is necessary for CNC machines. The experiment was carried out with a 4 kW PMSM driving a single-axis ball-screw slide of a CNC machine. The experimental results demonstrate the effectiveness of the proposed system.
2 Literature Review Hu Li et al. proposed an adaptive fuzzy-PI control scheme in which an anti-saturation PI controller is used as the main controller for a permanent magnet synchronous motor (PMSM) [11]. An adaptive fuzzy tuner optimizes the PI gains, which effectively handles the system uncertainties and makes the system more dynamic and stable. Zhaowu Ping et al. used a systematic internal model control (IMC) approach to address the load torque disturbances and parametric uncertainties that often exist in a PMSM [12]. The paper reports that the method can not only achieve highly accurate speed tracking performance under a time-varying reference speed and
load torque disturbance, but also allow all the motor parameters to be uncertain. However, both papers only control the motor speed, whereas position tracking control is more demanding and requires more effort. Ping Li et al. presented an internal model control (IMC) scheme with an extended state observer (ESO) for position tracking control of a servo motor [13]. In that paper, a 2-degree-of-freedom IMC-based PID controller with an ESO is developed to improve the control accuracy. The dynamics of the tracking error are analyzed based on an order reduction technique, and the ultimately bounded stability of the control system can be guaranteed. A robust adaptive tracking control scheme has also been developed for servo mechanisms with nonlinear friction dynamics [14]. Another approach to improving the control quality of AC servo systems is to use smooth motion profiles, where acceleration and deceleration control is the focus of motion planning. One of the most popular methods is the S-curve motion profile [15, 16].
3 Methods In this section, methods that are applied for developing the PMSM servo driver are introduced.
3.1 PMSM Servo Driver Hardware Configuration The function block diagram of the PMSM servo driver is shown in Fig. 1. It includes eight main modules as follows:
• Main controller module: Uses the TMS320F28379D microcontroller, which provides a single-chip control system.
• Encoder module: Provides velocity and position feedback to the main controller.
• Power supply module: Supplies 5 V DC to the main controller and 15 V to the gate drive signals for the IGBT module.
• Rectifier and inverter (IGBT) module: Converts and supplies power to the PMSM; this module is controlled by the main controller.
• Current sensor module: Provides motor phase current feedback to the main controller.
• Overcurrent protection module: Protects the power stage against damage from overcurrent.
• Inputs/Outputs module: Allows running the PMSM servo motor in Pulse/Direction mode. There are three inputs (Pulse, Direction, and Run Enable signals) and one output (Alarm signal).
• Communication module: Allows communication with a computer or HMI to monitor or control the PMSM servo driver.
Fig. 1 Function block diagram of PMSM servo driver
3.2 Scale Factor The most common single-axis PMSM servo system consists of a ball-screw and a PMSM, as shown in Fig. 2. The ball-screw is mounted to the machine base on rotary bearings and is driven by the PMSM through a flexible coupling, pulley, or reducer. The encoder is a sensor that gives feedback to the main controller about the speed and position of the motor. A rotary encoder is usually specified in pulses per revolution.
Fig. 2 Single-axis servo system with ball-screw
Because of the effect known as quadrature, the motion controller will actually read 4 times the resolution of the encoder. Therefore, if a 2500 pulses/revolution encoder is used, 1 revolution of the motor creates 10,000 encoder counts. A scaling factor determines the number of encoder counts that occur during the movement of one user unit. A simple example would be a motor equipped with an encoder producing 10,000 counts per revolution on an axis scaled in degrees: 1 revolution of the motor produces 10,000 encoder counts and moves 360°, so the scaling factor for a user unit of "degrees" would be 10,000/360 = 27.78 encoder counts per degree. A linear mechanism such as a ball-screw slide is usually specified in terms of distance. The pitch of the ball-screw here is 12 mm with an encoder of 10,000 counts per revolution. A linear translation therefore has 10,000 counts per 12 mm, so the scaling factor is set to 833.33 encoder counts per mm to scale the axis in mm of linear movement.
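To make the scaling arithmetic concrete, a minimal sketch is given below. The constants mirror the 2500 pulse/rev encoder and 12 mm pitch used in this example, and the function names (counts_per_mm, mm_to_counts) are illustrative only, not part of the driver firmware.

```c
#include <stdint.h>

/* Encoder counts per revolution after 4x quadrature decoding. */
#define ENCODER_PPR        2500U
#define QUADRATURE_FACTOR  4U
#define COUNTS_PER_REV     (ENCODER_PPR * QUADRATURE_FACTOR)   /* 10,000 counts/rev */

/* Assumed mechanics: 12 mm ball-screw pitch, direct (1:1) coupling. */
#define SCREW_PITCH_MM     12.0f

/* Scaling factor: encoder counts per user unit (here, per mm of travel). */
static inline float counts_per_mm(void)
{
    return (float)COUNTS_PER_REV / SCREW_PITCH_MM;   /* 10,000 / 12 = 833.33 */
}

/* Convert a target position in mm into an encoder-count set-point. */
static inline int32_t mm_to_counts(float pos_mm)
{
    float counts = pos_mm * counts_per_mm();
    return (int32_t)(counts + (counts >= 0.0f ? 0.5f : -0.5f));   /* round to nearest */
}
```

For example, mm_to_counts(120.0f) returns 100,000 counts for a 120 mm target, which is consistent with the 833.33 counts/mm factor above.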
3.3 Dynamic Model for PMSM According to the FOC control principle for the PMSM, the mathematical model in the d-q coordinate system can be written as:

$$u_d = L_d \frac{di_d}{dt} + R_s i_d - n_p \omega L_q i_q \quad (1)$$

$$u_q = L_q \frac{di_q}{dt} + R_s i_q + n_p \omega L_d i_d + n_p \omega \phi \quad (2)$$

$$J \frac{d\omega}{dt} = \tau_e - \tau_L - B \omega \quad (3)$$

$$\tau_e = n_p \left[ (L_d - L_q)\, i_d i_q + \phi\, i_q \right] \quad (4)$$

where L_d and L_q: stator inductances in the d-q frame; i_d, i_q, u_d, u_q: stator currents and voltages in the d-q frame; R_s: stator resistance; n_p: number of rotor permanent magnet pole pairs; ω: rotor mechanical rotational speed; φ: magnetic flux; J: rotor moment of inertia; τ_e: electrical rotor torque; τ_L: load torque; B: coefficient of friction.
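For readers who want to check Eqs. (1)-(4) numerically, a minimal forward-Euler simulation step is sketched below. It is only a verification aid under assumed parameter values, not part of the driver firmware, and the structure and field names are illustrative.

```c
typedef struct {
    float Ld, Lq;   /* stator inductances in the d-q frame [H]   */
    float Rs;       /* stator resistance [ohm]                   */
    float np;       /* number of rotor pole pairs                */
    float phi;      /* permanent-magnet flux linkage [Wb]        */
    float J;        /* rotor moment of inertia [kg m^2]          */
    float B;        /* coefficient of friction [N m s]           */
} pmsm_params_t;

typedef struct {
    float id, iq;   /* d- and q-axis stator currents [A] */
    float omega;    /* rotor mechanical speed [rad/s]    */
} pmsm_state_t;

/* One forward-Euler step of Eqs. (1)-(4) with inputs ud, uq and load torque tauL. */
static void pmsm_step(pmsm_state_t *x, const pmsm_params_t *p,
                      float ud, float uq, float tauL, float dt)
{
    /* Eq. (1) and Eq. (2) solved for the current derivatives. */
    float did = (ud - p->Rs * x->id + p->np * x->omega * p->Lq * x->iq) / p->Ld;
    float diq = (uq - p->Rs * x->iq - p->np * x->omega * p->Ld * x->id
                 - p->np * x->omega * p->phi) / p->Lq;

    /* Eq. (4): electrical torque; Eq. (3): mechanical dynamics. */
    float tau_e  = p->np * ((p->Ld - p->Lq) * x->id * x->iq + p->phi * x->iq);
    float domega = (tau_e - tauL - p->B * x->omega) / p->J;

    x->id    += did    * dt;
    x->iq    += diq    * dt;
    x->omega += domega * dt;
}
```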
3.4 Applying PID Controller to the PMSM Servo Driver The transfer function of a PID controller has the following form:

$$G_C(s) = K_p + \frac{K_i}{s} + K_d s \quad (5)$$
where K_p, K_i, and K_d are the proportional gain, integral gain, and derivative gain, respectively. Another form of the PID controller is:

$$G_C(s) = K_p \left( 1 + \frac{1}{T_i s} + T_d s \right) \quad (6)$$

where T_i = K_p / K_i and T_d = K_d / K_p are the integral constant and derivative constant, respectively. In some cases, a PI controller can be used instead of a PID controller, with the following form:

$$G_C(s) = K_p + \frac{K_i}{s} \quad (7)$$
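A minimal discrete-time realization of Eq. (5), of the kind that might run on the microcontroller once per sampling period, is sketched below. The backward-Euler discretization, the output clamp, and the anti-windup strategy are assumptions for illustration rather than the driver's exact implementation; with K_d = 0 the same routine reduces to the PI form of Eq. (7) used for the current loops.

```c
typedef struct {
    float kp, ki, kd;        /* controller gains                        */
    float ts;                /* sampling period [s] (100 us here)       */
    float integ;             /* integrator state                        */
    float prev_err;          /* previous error for the derivative term  */
    float out_min, out_max;  /* output saturation limits                */
} pid_ctrl_t;

/* One update of G_C(s) = Kp + Ki/s + Kd*s, discretized with backward Euler. */
static float pid_update(pid_ctrl_t *c, float ref, float meas)
{
    float err       = ref - meas;
    float deriv     = (err - c->prev_err) / c->ts;
    float integ_new = c->integ + c->ki * err * c->ts;
    float out       = c->kp * err + integ_new + c->kd * deriv;

    /* Saturate the output; freeze the integrator when clamped (anti-windup). */
    if (out > c->out_max) {
        out = c->out_max;
    } else if (out < c->out_min) {
        out = c->out_min;
    } else {
        c->integ = integ_new;
    }

    c->prev_err = err;
    return out;
}
```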
The block diagram of FOC control for the PMSM servo system using PID and PI controllers is shown in Fig. 3. In this block diagram, there are two closed loops using PID controllers for velocity and position, and two closed loops using PI controllers for the flux- and torque-producing currents i_d and i_q, respectively. Knowledge of the rotor flux position is the core of FOC control. In the PMSM, the rotor speed is the same as the rotor flux speed; therefore, the rotor flux position is directly determined by the encoder sensor. Theoretically, FOC control allows the motor torque of the PMSM to be controlled independently of the flux, as in the operation of a DC motor. Since the rotor flux of a PMSM is established by the permanent magnets, there is no need to create it. Therefore, when controlling a PMSM, the flux-producing reference current, i_d_ref, should be set to zero. The torque-producing reference current, i_q_ref, can be connected to the output of the speed regulator. The angular speed reference, ω_ref, can be connected to the output of the PID position regulator. The load torque is treated as a disturbance.
Fig. 3 Block diagram of FOC control for PMSM servo driver
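To make the cascade of Fig. 3 concrete, the sketch below shows one current-loop iteration: Clarke and Park transforms of the measured phase currents, the two PI current regulators (reusing pid_update() from the sketch above with K_d = 0), and the inverse Park transform feeding the PWM stage. The helpers read_phase_currents(), read_rotor_electrical_angle(), and svpwm_set() are hypothetical names for hardware access, not the actual firmware API.

```c
#include <math.h>

extern pid_ctrl_t pid_id, pid_iq;            /* PI regulators for i_d and i_q */

/* Hypothetical hardware-access helpers; names are illustrative only. */
void  read_phase_currents(float *ia, float *ib);
float read_rotor_electrical_angle(void);
void  svpwm_set(float v_alpha, float v_beta);

/* One FOC iteration: i_d reference is zero, i_q reference comes from the speed loop. */
void foc_current_loop(float iq_ref)
{
    float ia, ib;
    read_phase_currents(&ia, &ib);                 /* ic = -ia - ib for a balanced motor */
    float theta = read_rotor_electrical_angle();   /* derived from the encoder */

    /* Clarke transform (amplitude-invariant, two measured phases). */
    float i_alpha = ia;
    float i_beta  = (ia + 2.0f * ib) / sqrtf(3.0f);

    /* Park transform into the rotating d-q frame. */
    float s = sinf(theta), c = cosf(theta);
    float id =  i_alpha * c + i_beta * s;
    float iq = -i_alpha * s + i_beta * c;

    /* PI current regulators: flux-producing current regulated to zero. */
    float vd = pid_update(&pid_id, 0.0f,   id);
    float vq = pid_update(&pid_iq, iq_ref, iq);

    /* Inverse Park transform back to the stationary frame, then PWM generation. */
    float v_alpha = vd * c - vq * s;
    float v_beta  = vd * s + vq * c;
    svpwm_set(v_alpha, v_beta);
}
```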
3.5 Trapezoidal Motion Profile Calculation The motion of an axis moving from one point to another can be specified by a number of parameters which together define the motion profile. For a simple trapezoidal motion profile, these parameters are distance traveled (D), speed (Sp), acceleration (Accel), and deceleration (Decel); distance traveled is referred to simply as distance below. As shown in Fig. 4, the motion profile consists of three stages: acceleration, constant speed, and deceleration, and all of the relevant properties can be clearly identified as follows:

$$T_a = \frac{Sp}{Accel} \quad (8)$$

$$T_d = \frac{Sp}{Decel} \quad (9)$$

$$T_s = \frac{D_s}{Sp} \quad (10)$$
where T_a is the acceleration time period, T_s is the constant speed time period, and T_d is the deceleration time period. Since the distance traveled is equal to the area under the speed curve, the distances traveled during acceleration and deceleration can be calculated using the equation for the area of a triangle:

$$D_a = \frac{1}{2} T_a \, Sp \quad (11)$$

$$D_d = \frac{1}{2} T_d \, Sp \quad (12)$$

$$D_s = D - D_a - D_d \quad (13)$$

Fig. 4 Trapezoidal motion profile calculation for PMSM servo driver
where D_a is the distance traveled during acceleration, D_d is the distance traveled during deceleration, D_s is the distance traveled at constant speed, and D is the total distance traveled. After T_a, T_d, and T_s are determined, the motion profile can be calculated as follows.

First stage: acceleration.

$$\forall t \in [t_0, t_1]: \quad v_i(t) = Accel \times \tau_1, \qquad s_i(t) = \frac{1}{2} Accel \times \tau_1^2 \quad (14)$$

Second stage: constant speed.

$$\forall t \in [t_1, t_2]: \quad v_i(t) = Sp, \qquad s_i(t) = \frac{1}{2} Sp \times T_a + Sp \times \tau_2 \quad (15)$$

Third stage: deceleration.

$$\forall t \in [t_2, t_3]: \quad v_i(t) = Sp - Decel \times \tau_3, \qquad s_i(t) = \frac{1}{2} Sp \times T_a + Sp \times T_s + Sp \times \tau_3 - \frac{1}{2} Decel \times \tau_3^2 \quad (16)$$

where s_i: path length in the ith stage; t_0 ∼ t_3: points in time, where t_{i−1} is the start time and t_i is the end time of the ith stage; τ_1 ∼ τ_3: internal time of each stage, running from zero to T_i for the ith stage; T_1 ∼ T_3: time period of each stage (T_1 = T_a; T_2 = T_s; T_3 = T_d).
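The profile arithmetic of Eqs. (8)-(16) maps directly to a small planning and sampling routine, sketched below. It assumes the requested distance is long enough for a constant-speed plateau to exist (D_s ≥ 0); a complete implementation would fall back to a triangular profile otherwise.

```c
typedef struct {
    float sp, accel, decel;  /* commanded speed, acceleration, deceleration            */
    float Ta, Ts, Td;        /* durations of the three stages, Eqs. (8)-(10)           */
    float Da, Ds, Dd;        /* distances covered in the three stages, Eqs. (11)-(13)  */
} trap_profile_t;

/* Plan a trapezoidal move of total distance D (all quantities in user units). */
static void trap_plan(trap_profile_t *p, float D, float sp, float accel, float decel)
{
    p->sp = sp; p->accel = accel; p->decel = decel;
    p->Ta = sp / accel;            /* Eq. (8)  */
    p->Td = sp / decel;            /* Eq. (9)  */
    p->Da = 0.5f * p->Ta * sp;     /* Eq. (11) */
    p->Dd = 0.5f * p->Td * sp;     /* Eq. (12) */
    p->Ds = D - p->Da - p->Dd;     /* Eq. (13), assumed non-negative here */
    p->Ts = p->Ds / sp;            /* Eq. (10) */
}

/* Sample the planned profile at elapsed time t: Eqs. (14)-(16). */
static void trap_sample(const trap_profile_t *p, float t, float *v, float *s)
{
    if (t < p->Ta) {                                    /* acceleration stage   */
        *v = p->accel * t;
        *s = 0.5f * p->accel * t * t;
    } else if (t < p->Ta + p->Ts) {                     /* constant-speed stage */
        float tau = t - p->Ta;
        *v = p->sp;
        *s = 0.5f * p->sp * p->Ta + p->sp * tau;
    } else {                                            /* deceleration stage   */
        float tau = t - p->Ta - p->Ts;
        if (tau > p->Td) tau = p->Td;                   /* hold at the end point */
        *v = p->sp - p->decel * tau;
        *s = 0.5f * p->sp * p->Ta + p->sp * p->Ts
           + p->sp * tau - 0.5f * p->decel * tau * tau;
    }
}
```

Sampling this profile every control period and using the result as the position set-point supports the smooth, overshoot-free motion described above.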
3.6 Features for PMSM Servo Driver There are four basic features developed for the PMSM servo driver: home setting, jog running, position running, and Pulse/Dir running mode.
• Home setting Although the motion profile is calculated in relative coordinates, it is more convenient to use absolute coordinates in the operation of the PMSM servo system. Therefore, "home setting" is used to determine the zero point on the ball-screw slide. There are two ways to set the home position for the PMSM servo system on the ball-screw slide:
– Set home position by trigger signal: The current position of the motor becomes the home position. The home position is set manually.
– Set home position using a proximity sensor: The position of the proximity sensor is the home position. The home position is set automatically.
• Jog running "Jog run" means the PMSM servo driver is required to run at a specified constant speed while the "jog run" command is asserted. The acceleration and deceleration in jog running are predetermined based on the hardware and system requirements. Jog running is often performed with a press-and-hold type switch. There are two modes of jog running: "forward jog" and "backward jog". When "forward jog" is used, the servo motor runs in the positive direction, and vice versa for "backward jog". The jog running speed must be set before using "jog run".
• Position running "Position run" means the PMSM servo driver is required to run from one position to another at a predetermined constant speed. The acceleration and deceleration are predetermined based on hardware and system requirements. The target position is specified in absolute coordinates after the home position has been set.
• Pulse/Dir running mode Instead of running by commands from a PC or HMI control interface, the driver can run using input signals including a pulse input, a direction input, and a run enable input. This feature is commonly used in machine tools such as CNC machines, where the pulse generator is integrated in the CNC controller.
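A possible firmware-side organization of these four operating modes is a simple mode dispatcher that derives the position set-point handed to the PID cascade every control period, as sketched below. The enum, structure, and field names are illustrative assumptions, not the driver's actual interface.

```c
#include <stdint.h>

typedef enum {
    MODE_IDLE,        /* hold the current position                          */
    MODE_HOMING,      /* home setting: trigger- or proximity-sensor based   */
    MODE_JOG,         /* constant-speed jog, forward or backward            */
    MODE_POSITION,    /* point-to-point move along a trapezoidal profile    */
    MODE_PULSE_DIR    /* follow external Pulse/Dir inputs (CNC controller)  */
} run_mode_t;

typedef struct {
    run_mode_t mode;
    int   homed;            /* set once the zero point has been defined     */
    float jog_speed;        /* signed jog speed [user units/s]              */
    float units_per_count;  /* electronic-gear ratio for Pulse/Dir mode     */
    float pos_setpoint;     /* position set-point fed to the PID cascade    */
} servo_state_t;

/* Called every control period to advance the position set-point. */
static void update_setpoint(servo_state_t *s, float dt,
                            int32_t pulse_count, int dir_forward)
{
    switch (s->mode) {
    case MODE_JOG:
        /* Integrate the jog speed directly into the set-point. */
        s->pos_setpoint += s->jog_speed * dt;
        break;
    case MODE_POSITION:
        /* Valid only after homing; a planned trapezoidal profile (see the
           trap_sample() sketch above) would be advanced here.              */
        break;
    case MODE_PULSE_DIR:
        /* Each external pulse shifts the set-point by one scaled increment. */
        s->pos_setpoint += (dir_forward ? 1.0f : -1.0f)
                           * (float)pulse_count * s->units_per_count;
        break;
    case MODE_HOMING:
        /* Move slowly until the proximity sensor or manual trigger defines zero,
           then set s->homed = 1 and reset the position feedback.               */
        break;
    default:
        break;  /* MODE_IDLE: hold the current set-point */
    }
}
```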
3.7 PC-Based Software Design To realize the above features, a PC-based software that communicates with the driver through the COM port and allows easy control and monitoring of the PMSM servo system is designed. The design ideas for the PC-based software are given as shown in (Fig. 5). On the design in (Fig. 5), there are seven function blocks as follows: • The first block is used to select the COM port. A status lamp indicates whether the COM port is successfully connected to the driver. A button is used to enable or disable the COM port. • The second block is “servo on/off”. This block is used to enable or disable the driver. When the servo is turned on without a run command, the motor will stay fixed in one position, this state is called the “ready state”. There are two status lamps. One status lamp indicates the on/off status of the driver. Other status lamp indicates the driver failure status. • The third block is the jog instruction block. There is an input text box that allows us to enter the desired jog running speed. In addition, there are two push and hold buttons, the first one when pressed will make the motor run in the positive direction, the second one when pressed will make the motor run in the opposite direction.
Fig. 5 Design ideas for the PC-based software
• The fourth block is the “home setting” block. When the button of this block is pressed for 3 s, the controller will execute the command to set the current position as zero point for the driver. There is a status lamp to indicate whether the zero point has been set or not. • The fifth block has three green textboxes and two white textboxes. The three green textboxes show the current position, the current speed, and the current load of the motor. The two white input textboxes allow setting the acceleration and deceleration of the motor driver. • The sixth block allows you to set the moving position and the corresponding speed. There are seven position and speed pairs. The position is determined in absolute coordinate. There are seven command buttons; when the corresponding button is pressed, the motor will move to the point with the corresponding speed. In addition, there is an input textbox that allows the scaling factor to be set. • The seventh block, the “run cycle” block, is used in the experiment for the driver. When the cycle run is performed, the driver will control the motor to run sequentially through seven points with the corresponding speed that are set in the sixth block. There is a status lamp indicating that the system is in “run cycle” mode. There is also a blue textbox that shows how long the run cycle has been running.
4 Results
In this section, experimental results are given to demonstrate the effectiveness of the proposed system. The PMSM servo system, comprising the PMSM and the motor driver developed on the TMS320F28379D microcontroller, is set up as shown in Fig. 6. The PMSM specifications are given in Table 1: the motor has a power rating of 4.4 kW, a rated speed of 1500 rpm, a rated current of 32.8 A, a rated torque of 28.4 N m, and an encoder resolution of 2500 pulses/rev. The control algorithm is implemented with a sampling time of 100 µs. The PC-based software developed with the MATLAB GUI is shown in Fig. 7. With the help of the PC-based software, the PID gains for the flux- and torque-producing currents, velocity, and position are tuned, and an acceptable set of PID gains is given in Table 2. From Table 2, we can see that the derivative gain Kd for velocity is zero in this test system. To demonstrate the quality and reliability of the driver, continuous tracking points for the PMSM servo system are given in Fig. 8, where the slide driven by the PMSM servo motor starts moving from the home position (0 mm) to position 1 (120 mm) at 200 rpm. After reaching position 1, the slide continues
Fig. 6 Structure of the PMSM servo system: a laptop/PC running the MATLAB GUI exchanges data with the TMS320F28379D-based AC servo driver, which powers the 1-axis CNC machine and receives its position feedback
Table 1 PMSM motor specification used in the experiments
Mfg name             Yaskawa
Part number          SGMG44A2AB
Rated output         4.4 kW
Rated speed          1500 rpm
Rated current        32.8 A
Rated torque         28.4 N m
Encoder resolution   2500 pulses/rev
Fig. 7 PC-based software implemented using MATLAB
Table 2 Parameters of the PI/PID controllers in the experiments
Items            Kp         Ki         Kd
Position         2.000000   0.000010   0.000699
Speed            3.000000   0.000215   0.000000
Flux current     1.850000   0.000010   –
Torque current   2.200000   0.001500   –
to move to position 2 (0 mm) at −300 rpm. The movement is repeated until the slide reaches position 7 (0 mm). The experimental result for position and speed tracking of the AC servo motor system is shown in Fig. 9. The real speed tracks the reference speed with a bounded tracking error, while the position tracking is performed well; therefore, this result is acceptable. To examine the position tracking more closely, Fig. 10 shows the position tracking error, which is bounded within ± 0.7 mm. Figure 11 shows the flux-producing current Id and the torque-producing current Iq. Following the theory of FOC control for PMSM, the flux-producing reference current is zero. The upper plot in Fig. 11 shows the flux-producing current tracking; the experimental result shows that the real flux-producing current stays within ± 2 A around zero.
Fig. 8 Position moving with different speeds in the experiment: home (0 mm), position 1 (120 mm), position 2 (0 mm), position 3 (240 mm), position 4 (120 mm), position 5 (0 mm), position 6 (240 mm) and position 7 (0 mm), with segment speeds of ±200 rpm and ±300 rpm
Fig. 9 Position and speed tracking in the experiment
The lower plot in Fig. 11 shows the torque-producing current tracking. Throughout the movement of the slide, the real torque-producing current stays within ± 3 A of its reference value and within ± 6 A around zero.
Fig. 10 Position tracking error in the experiment
5 Discussion
The position tracking error is fairly large, bounded within ± 0.7 mm. However, this is the error of the PID position loop, which is large at the starting point of each move and rapidly converges to zero at the end point. The tracking error can be reduced by increasing the encoder resolution. By applying a trapezoidal motion profile, the system was prevented from drawing overcurrent. In the experiments, the system runs through the seven points at different speeds reliably, and the starting and stopping of the system are very smooth. A limitation of this study is that it focuses only on developing experimental solutions. In future work, the control algorithms will also be improved beyond the classical PID and PI controllers.
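As an illustration of the trapezoidal motion profile mentioned above, the following Python sketch generates a speed reference that ramps up, cruises, and ramps down; the distance, cruise speed, acceleration, and sample time are example values only, not the settings used in the experiments.

```python
# Sketch of a trapezoidal speed profile: accelerate, cruise, decelerate.
# Example parameters only; the experimental acceleration/deceleration
# settings are entered through the PC software, not hard-coded here.
def trapezoidal_profile(distance_mm, v_max, accel, dt=0.001):
    """Return a list of speed set points (mm/s) covering `distance_mm`."""
    t_acc = v_max / accel                  # time to reach cruise speed
    d_acc = 0.5 * accel * t_acc ** 2       # distance covered while ramping
    if 2 * d_acc > distance_mm:            # triangular profile: v_max never reached
        t_acc = (distance_mm / accel) ** 0.5
        v_max = accel * t_acc
        d_acc = distance_mm / 2.0
    d_cruise = distance_mm - 2 * d_acc
    t_cruise = d_cruise / v_max if v_max > 0 else 0.0

    speeds, t = [], 0.0
    while t < t_acc:                       # acceleration segment
        speeds.append(accel * t); t += dt
    t = 0.0
    while t < t_cruise:                    # constant-speed segment
        speeds.append(v_max); t += dt
    t = 0.0
    while t < t_acc:                       # deceleration segment
        speeds.append(max(v_max - accel * t, 0.0)); t += dt
    return speeds

profile = trapezoidal_profile(distance_mm=120.0, v_max=50.0, accel=200.0)
```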
6 Conclusions and Future Work
This paper has succeeded in developing a PMSM servo system for CNC machines using the TMS320F28379D microcontroller. The control method is based on FOC with PID regulating algorithms. Many useful features for the PMSM servo system are introduced and implemented in the PC-based software, such as home setting,
Fig. 11 Flux- and torque-producing tracking currents in the experiment
jog running, and position running. The scaling factor and the acceleration and deceleration settings are also integrated in the PC-based software. A trapezoidal motion profile is used to improve movement accuracy and prevent overcurrent. The Pulse/Dir run mode that is necessary for CNC machines is also integrated in the motor driver. Finally, the experimental results demonstrate the effectiveness of the proposed system.
Acknowledgements The research project is funded by the Department of Science and Technology of Nguyen Tat Thanh University.
Conflicts of Interest The authors have no conflicts of interest to declare.
References 1. Ding X, Cheng J, Zhao Z, Luk PCK (2021) A high-precision and high-efficiency PMSM driver based on power amplifiers and RTSPSs. IEEE Trans Power Electron 36(9):10470–10480 2. Liu CY, Laghrouche S, Depernet D, Djerdir A, Cirrincione M (2021) Disturbance-observerbased complementary sliding-mode speed control for PMSM drives: a super-twisting slidingmode observer-based approach. IEEE J Emerg Sel Top Power Electron 9(5):5416–5428 3. Zhang W, Cao B, Nan N, Li M, Chen YQ (2021) An adaptive PID-type sliding mode learning compensation of torque ripple in PMSM position servo systems towards energy efficiency. ISA Trans 110:258–270 4. Gao L, Zhang G, Yang H, Mei L (2021) A novel method of model predictive control on permanent magnet synchronous machine with Laguerre functions. Alex Eng J 60(6):5485–5494 5. Zhao Y, Yu H, Wang S (2021) Development of optimized cooperative control based on feedback linearization and error port-controlled Hamiltonian for permanent magnet synchronous motor. IEEE Access 9:1–15 6. Fu X, Xu Y, He H, Fu X (2021) Initial rotor position estimation by detecting vibration of permanent magnet synchronous machine. IEEE Trans Industr Electron 68(8):6595–6606 7. Favato A, Carlet PG, Toso F, Torchio R, Bolognani S (2021) Integral model predictive current control for synchronous motor drives. IEEE Trans Power Electron 36(11):13293–13303 8. Hussain HA (2021) Tuning and performance evaluation of 2DOF PI current controllers for PMSM drives. IEEE Trans Transp Electrification 7(3):1401–1414 9. Cho B-G, Hong C, Lee J, Lee W-J (2021) Simple position sensorless V/f scalar control method for permanent-magnet synchronous motor drives. J Power Electron 21:1020–1029 10. Wang S, Yu H, Yu J (2019) Robust adaptive tracking control for servo mechanisms with continuous friction compensation. Control Eng Pract 87:76–82 11. Li H, Song B, Chen T, Xie Y, Zhou XD (2021) Adaptive fuzzy PI controller for permanent magnet synchronous motor drive based on predictive functional control. J Franklin Inst 358(15):7333–7364 12. Ping Z, Li Y, Song Y, Huang Y, Wang H, Lu JG (2021) Nonlinear speed tracking control of PMSM servo system: a global robust output regulation approach. Control Eng Pract 112:104832 13. Li P, Zhu G (2019) IMC-based PID control of servo motors with extended state observer. Mechatronics 62:102252 14. Liu Y, Wang ZZ, Wang YF, Wang DH, Xu JF (2021) Cascade tracking control of servo motor with robust adaptive fuzzy compensation. Inf Sci 569:450–468 15. Fang Y, Hu J, Liu W, Shao Q, Qi J, Peng Y (2019) Smooth and time-optimal S-curve trajectory planning for automated robots and machines. Mech Mach Theory 137:127–153 16. Huang H (2018) An adjustable look-ahead acceleration/deceleration hybrid interpolation technique with variable maximum feedrate. Int J Adv Manuf Technol 95:1521–1538
A Probabilistic Method to Identify HTTP/1.1 Slow Rate DoS Attacks Nemalikanti Anand and M. A Saifulla
Abstract HTTP/1.1 is one of the most widely used application layer protocols on the Internet today. Over the last decade, security researchers have performed detailed vulnerability assessments of the protocol and explored potential vulnerabilities that can be exploited to launch a number of application layer DoS attacks against it. One popular class of such attacks is the Slow Rate HTTP/1.1 DoS attack. These attacks are highly stealthy and require minimal computational power from the attacker's perspective. Since these attacks generate very little traffic, the approaches known to detect traditional DoS attacks cannot detect HTTP/1.1 Slow Rate DoS attacks. The probabilistic anomaly detection method presented in this paper produces a probability distribution of typical HTTP/1.1 traffic, which is used as a benchmark for the detection of Slow Rate HTTP/1.1 DoS attacks. In a real network configuration, we test two different Slow Rate HTTP/1.1 DoS attack variants and report the findings. The findings indicate that the suggested detection method identifies attacks with a very high degree of accuracy. Keywords Application layer · DoS attacks · HTTP/1.1 · Slow rate attacks
1 Introduction
Denial of Service (DoS) attacks target the availability of a service by exhausting finite resources such as computational power, physical memory, and network bandwidth. Once these resources are consumed, legitimate clients cannot access the service running at the victim. Security experts have been aware of these attacks for a while; therefore, a number of defence strategies have been put forth in the literature to thwart them.
N. Anand (B) · M. A. Saifulla University of Hyderabad, Hyderabad, India e-mail: [email protected] M. A. Saifulla e-mail: [email protected] SCIS, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_2
Recently, application layer DoS attacks [1] have grown in popularity among many underground hacking communities. This is because these attacks are easier to deploy, require less computational power to execute, and are highly stealthy. Over the last decade, researchers have found several vulnerabilities in application layer protocols that are exploitable for launching such Denial of Service attacks. The Slow Rate HTTP/1.1 DoS attack is a well-known example of an application layer DoS attack [2]. Slow rate attacks are highly stealthy compared to traditional DoS attacks and are thus difficult to detect. Motivated by this, we present a probabilistic anomaly monitoring system in this paper to spot Slow Rate DoS attacks on HTTP/1.1 web servers. The rest of this paper is organised as follows. Section 2 introduces HTTP/1.1 and Slow Rate HTTP/1.1 DoS attacks. Section 3 discusses prior work on slow rate attacks. Section 4 describes the training phase, the testing phase, and the selection of the threshold value. Section 5 evaluates the detection performance for varied attack rates, in the absence of the attack, and in the presence of the attack at a fixed rate. Section 6 concludes the paper.
2 Background This section primarily focuses on how HTTP/1.1 works. Subsequently, we present the details of Slow Rate HTTP/1.1 DoS attacks.
2.1 HTTP/1.1
HTTP/1.1 was published in RFC 2616 [3] in June 1999. Since then, it has been one of the most popular application layer protocols on the Internet. HTTP/1.1 is based on the client–server architecture, and its transport layer protocol is TCP. Web requests are sent by HTTP/1.1 clients (e.g. browsers) to web servers. On receiving these requests, the HTTP/1.1-enabled web server processes them and returns the responses to the web browser. Several methods are used to access web resources present at a web server; the HTTP/1.1 GET and POST methods are the most widely used of all. The GET method is used by the client to fetch a web resource from the web server. On the other hand, the POST method is used by the client to either send a message body (e.g. HTML form values) or upload a file (e.g. a JPEG image) to the web server. Hackers in the underground community discovered a class of attacks, known as Slow Rate HTTP/1.1 DoS attacks, that tamper with the normal operation of these HTTP/1.1 methods. In the next subsection, we discuss the working of these attacks.
2.2 Slow Rate HTTP/1.1 DoS Attacks
Slow Rate HTTP/1.1 DoS attacks aim to consume the connection queue space available at a web server. This is a finite-capacity space used by the web server to store unserved web requests. To launch the attack, the attacker sends one incomplete web request to the target web server from each of the connections it has established. After receiving these incomplete requests, the server waits for the remaining data and places the requests in the connection queue [4] space. The attacker, however, never delivers the remaining data, making the web server wait for a lengthy time before it can close the connection. Once enough requests are sent, the server's connection queue space is consumed, due to which the requests sent by legitimate clients are not entertained by the server. This results in a DoS scenario. Slow Rate HTTP/1.1 DoS attacks come in a variety of flavours depending on the technique utilised, including the following:
2.2.1 Slow Header Attack
An attacker uses incomplete HTTP/1.1 GET request headers to perform a Slow Header attack. For example, the attacker can simply remove the last few characters of a request header in order to force the web server [5] to wait to receive the remaining portion of the header.
2.2.2 Slow Message Body Attack
To launch a Slow Message Body attack, an attacker sends a web request with an incomplete message body to the victim web server [6]. When the server receives such a request, it takes the value of the "Content-Length" field from the HTTP/1.1 header and compares it to the actual message body length. If the extracted value is larger than the actual length of the message body, the server assumes that the entire message body has not yet been received. Thus, it transfers the request data to the connection queue space and starts waiting to receive the remaining content of the message body.
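The detection approach described later distinguishes complete and incomplete GET and POST requests, so it is useful to see how such a classification could be made. The following Python sketch is a simplified illustration only (it assumes the full request bytes are available and ignores TCP reassembly and header edge cases); it is not the paper's implementation.

```python
# Sketch of classifying a raw HTTP/1.1 request into the four REQ types used
# later (complete/incomplete GET, complete/incomplete POST).
def classify_request(raw: bytes) -> str:
    head, sep, body = raw.partition(b"\r\n\r\n")
    method = raw.split(b" ", 1)[0].upper()
    if method == b"GET":
        # a GET is complete once the header block is terminated
        return "complete GET" if sep else "incomplete GET"
    if method == b"POST":
        if not sep:
            return "incomplete POST"
        declared = 0
        for line in head.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                declared = int(line.split(b":", 1)[1].strip())
        return "complete POST" if len(body) >= declared else "incomplete POST"
    return "other"

print(classify_request(b"GET /index.html HTTP/1.1\r\nHost: x\r\n\r\n"))
print(classify_request(b"POST /f HTTP/1.1\r\nContent-Length: 10\r\n\r\nabc"))
```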
3 Related Work
To identify Slow Rate HTTP/1.1 DoS attacks, the research community has put forth a lot of work. In this section, we go over the defence strategies against Slow Rate HTTP/1.1 DoS attacks that have been documented in the literature. The literature suggests a wide range of strategies to counter Slow Rate DoS attacks. However, the most easily deployable ones are the anomaly-based detection schemes [8] that detect web traffic anomalies. This is because such schemes
monitor the deviations based on the statistics of the normal flows [9, 10]. Due to their simplicity, they are also known to provide good detection rates [11]. In [12], the authors proposed a detection approach that monitors anomalies based on features such as the amount of time needed to craft an HTTP/1.1 request. Using this feature, the sampling distribution obtained from unknown traffic is compared with that of legitimate HTTP/1.1 traffic in order to detect anomalies. It was also assumed that if a complete request is contained within a single HTTP/1.1 packet, the time required to craft the request is 0. Attackers can get around this detection method, though, by sending an incomplete web request in a single HTTP/1.1 packet and not sending the rest of the request in subsequent HTTP/1.1 packets. This technique allows the attacker to get enough time (e.g. for the Apache web server, 300 s rather than 990 s) to launch the described attacks against a victim web server. In [13], the authors used three different types of analysis: (i) statistical analysis of the HTTP/1.1 flows, (ii) analysis of the users' access behaviour to construct server paths and determine the various costs of using the web server, and (iii) examination of how frequently the web server performs HTTP/1.1 operations. The disadvantage of this strategy is that it is unable to identify distributed Slow Rate HTTP/1.1 DoS attacks, i.e. when several computers simultaneously launch the attack such that each computer generates a very low amount of attack traffic. In [14], the authors proposed a method for keeping track of how many packets a web server receives in a particular time duration. Two successive temporal intervals of the packet count are monitored to detect anomalies. However, a drawback of this approach is the selection of packet count as the feature [15]: monitoring the packet count can result in a very large number of false positives, as a high burst of legitimate traffic will also be detected as an attack [16]. Another drawback is that these techniques cannot detect Slow HTTP DDoS attacks [17]. In [15], the authors proposed a statistical abnormality measurement technique wherein the Hellinger distance (HD) is used to compare the normal HTTP traffic profile generated during the training phase with the traffic profiles generated at various time intervals during the testing phase. The time period under consideration is identified as including attack traffic if the computed HD exceeds a predetermined threshold. The drawback of this approach is that its deployment can result in a large number of false positives, as the approach does not take the clock time into account while comparing the profiles generated during the training and testing phases. For the Apache web server software, a few server-side implementation patches [18–21] are also available to prevent Slow Rate HTTP/1.1 DoS attacks. These patches, however, have a propensity to deny legitimate web requests [15]. Additionally, Reduction of Quality (RoQ) attacks [22] can be launched against a web server that has these modules enabled. Thus, a large-scale DoS attack can certainly affect the victim web server's performance. In order to identify HTTP/1.1 DoS attacks, the authors of [12] presented a wide range of neural models to analyse HTTP/1.1 data. They demonstrated through experiments that the applied neural models were unable to distinguish between regular HTTP/1.1 traffic and anomalous HTTP/1.1 traffic.
The authors of [23] suggested a computationally simple method for spotting HTTP/1.1 DoS attacks using traffic flow features. To make processing of switch information easier, the authors used the NOX
platform [24]. Additionally, in order to identify HTTP/1.1 anomalies, the authors used Self-Organising Maps [25] to analyse flows. Recently, probabilistic anomaly detection schemes have gained popularity, and the work presented in [26] used such approaches to counter DHCP starvation attacks and DoS attacks against HTTP/2.
4 Proposed Detection Approach
We propose a detection technique for Slow Rate HTTP/1.1 DoS attacks in this section. The proposed system operates in two phases, designated as training and testing. The system builds a profile of typical HTTP/1.1 behaviour during the training phase; during the testing phase, it compares the actual HTTP/1.1 activity against the profile created during training in order to identify Slow Rate DoS attacks. The following subsections describe the training phase (Sect. 4.1), the testing phase (Sect. 4.2), and the selection of the threshold value (Sect. 4.3).
4.1 Training Phase
The typical operation of HTTP/1.1 is learned during the training phase. The procedure for producing the probability distribution of request type REQ in the training phase is described by Algorithms 1 and 2. The algorithm takes as inputs the number of days d and the window duration WD, where WD refers to the duration (in hours) of one window on a particular day, and outputs the probability distribution (PD) of a particular request type. The detection approach keeps track of the count of REQ occurrences that occur within each WD time interval, and this is repeated over the whole training duration. For example, if the window duration is WD = 30 min, there will be a total of 48 intervals per day and, assuming the training session lasts d = 2 days, 96 such intervals in total, each lasting 30 min. Over these 96 intervals, the occurrences of type REQ are counted. The overall count of type REQ occurrences for a particular time slot of the day (say, 3:00 AM to 3:30 AM) is then computed. The counts of type REQ at the various time intervals, together with the total number of intervals, are passed as arguments to the probability estimate module defined in Algorithm 2. The probability estimate module determines the likelihood that an event of type REQ will occur at various times by dividing the number of REQ occurrences that took place within a given time window by the sum of all such occurrences over all the time windows of the day. This generates a probability distribution of type REQ over the entire day, which is the typical profile used by our detection mechanism. This distribution profile represents the probability of receiving a particular type of HTTP/1.1 request at a specific hour of the day.
4.2 Testing Phase
Algorithm 3 specifies the steps followed during the testing phase in order to find anomalies over a range of time intervals. Throughout testing, the detection system keeps track of the occurrences of type REQ. When a time interval has expired, the detection system initiates a separate procedure to determine how many instances of type REQ occurred within the same time period of a day during the training phase. This procedure is described in Algorithm 4. It uses the probability distribution (profile) created during the training phase to extract the probability of a REQ type occurrence during the same time interval. The average number of REQ occurrences during that time period in the training phase is then calculated by multiplying the retrieved probability by the average number of REQ occurrences per day. Algorithm 3 receives this count as Train_number_of_request_count. Algorithm 3 then compares the numbers of REQ type occurrences observed during the training and testing phases. If the count difference exceeds a set threshold β, the detection mechanism notifies the network administrators.
4.3 Selecting the Threshold Value (β)
Network administrators should carefully consider their choice of the threshold β, because smaller β values will result in more false positives, whereas bigger β values will result in more false negatives. For example, on a highly dynamic network where the amount of incoming HTTP/1.1 traffic is high, the specified β value should be significantly higher, since the number of requests at a given moment varies greatly on a daily basis. On the other hand, a smaller β value is better suited to networks where the volume of HTTP/1.1 requests at a given time is essentially constant from day to day. Since the number of instances of type REQ in a particular testing phase time interval is always compared to the number of instances of type REQ in the corresponding training phase time interval, the value of β does not need to change for different time periods.
5 Experimental Results
Here, we go over the tests that were run to evaluate the effectiveness of the proposed detection method. Heavy-tailed distributions were used as the probability distribution [7]. The following subsections discuss the testbed configuration used to collect HTTP/1.1 traffic and the working of the training and testing phases.
Algorithm 1 Training
Input Parameters: HTTP/1.1 traffic over 'n' number of days, window duration (in hours) and training duration (in days).
Output Parameters: Probability Distribution.
Notations: Window duration—WD, Training duration—TD, interval—w, Probability Distribution—PD, Number of request count—RC, Start—SRT, Train number of request count—TRC and Total Count—TCNT
1: Total_intervals ← 24/WD
2: Initialize TCNT[1, 2, 3, ..., Total_intervals]
3: for day = 1 to n do
4:   for w = 1 to Total_intervals do
5:     RC = 0
6:     for x = x_SRT to x_SRT + WD do
7:       REQ ← Detected a recent event of REQ type
8:       RC ← RC + 1
9:     end for
10:    TCNT[w] ← TCNT[w] + RC
11:   end for
12: end for
13: PD ← Probability_Estimate(TCNT, Total_intervals)
Algorithm 2 Probability_Estimate(TCNT[1, 2, 3, ..., Total_intervals], Total_intervals)
1: Total = 0
2: for k = 1 to Total_intervals do
3:   Total ← Total + TCNT[k]
4: end for
5: for k = 1 to Total_intervals do
6:   PD[k] ← TCNT[k] / Total
7: end for
8: Return PD
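For readers who prefer an executable form, the following is a minimal Python sketch of the training phase (Algorithms 1 and 2); variable names mirror the pseudocode, and the per-interval counting from traffic captures is abstracted behind `count_req_in_interval`, which is a placeholder, not the paper's implementation.

```python
# Sketch of the training phase (Algorithms 1 and 2): accumulate per-interval
# counts of REQ-type events over the training days, then normalise them into
# a probability distribution PD. `count_req_in_interval` is a placeholder
# for the traffic-capture logic.
def train(count_req_in_interval, days, wd_hours):
    total_intervals = int(24 / wd_hours)          # intervals per day
    tcnt = [0] * total_intervals                  # REQ counts per interval slot
    for day in range(days):
        for w in range(total_intervals):
            tcnt[w] += count_req_in_interval(day, w)
    return probability_estimate(tcnt)

def probability_estimate(tcnt):
    total = sum(tcnt)
    return [c / total if total else 0.0 for c in tcnt]
```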
5.1 Testbed Architecture
We chose a public web server hosting an educational website as our source of training and testing data. To collect real HTTP/1.1 traffic over the Internet from clients at various geographic locations, we captured traces at this web server. The system specifications are: Red Hat Enterprise Linux 7 operating system (RHEL OS 7), Intel processor with 2.4 GHz speed, 8 GB RAM and Apache web server 2.4.54. From this setup, we collected HTTP/1.1 traffic with a total duration of 4 days.
Algorithm 3 Testing
Input Parameters: HTTP/1.1 traffic over 'n' number of days, window duration (in hours), training duration (in days) and total count of REQ occurrences.
Output Parameters: Slow Rate HTTP/1.1 DoS attack detection.
Notations: Window duration—WD, Training duration—TD, interval—w, Probability Distribution—PD, Total count of REQ occurrences—Total, Number of request count—RC, Start—SRT, Train number of request count—TRC and Total Count—TCNT
1: while There is no interruption do
2:   RC = 0
3:   for x = x_test_SRT to x_test_SRT + WD do
4:     REQ ← Detected a recent event of REQ type
5:     RC ← RC + 1
6:   end for
7:   TRC ← Get_Value(Time_test_SRT, Time_test_end, PD, Total, n)
8:   if RC ≥ TRC + β then
9:     DETECTED Slow Rate HTTP/1.1 DoS attack
10:  end if
11: end while
Algorithm 4 Get_Value(Time_test_SRT, Time_test_end, PD, Total, n)
1: Test_PD ← Fetched probability of REQ type during Time_test_SRT to Time_test_end from PD
2: Average_Total ← Total / n
3: TRC ← Test_PD ∗ Average_Total
4: Return TRC
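The testing-phase comparison (Algorithms 3 and 4) can likewise be sketched in a few lines of Python; the example numbers in the usage line below (a uniform profile and the 16,995 requests over 3 days mentioned later) are illustrative assumptions only.

```python
# Sketch of the testing phase (Algorithms 3 and 4): compare the observed REQ
# count in the current interval with the count expected from the training
# profile and flag an attack if the excess exceeds beta.
def expected_count(pd, interval_index, total_train_count, train_days):
    avg_per_day = total_train_count / train_days          # Algorithm 4
    return pd[interval_index] * avg_per_day

def detect(observed_count, pd, interval_index,
           total_train_count, train_days, beta):
    trc = expected_count(pd, interval_index, total_train_count, train_days)
    return observed_count >= trc + beta                    # True => alarm

# Example with the paper's 10-minute windows (144 intervals/day) and
# assumed counts: a uniform profile and 16,995 requests over 3 days.
pd = [1 / 144] * 144
print(detect(observed_count=120, pd=pd, interval_index=30,
             total_train_count=16995, train_days=3, beta=15))
```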
5.2 Data Collection for Training Phase
We used 3 days of typical HTTP/1.1 traffic for training purposes. For our experiments, we employed an interval duration WD of 10 min, which gives 144 intervals per day. Over the course of the three training days, 16,995 complete GET requests were made, resulting in an average of 5665 complete GET requests per day (AvgSum = 16995/3). Creating the profiles requires estimating the probability of requests occurring during a time window, which is the ratio of the average number of requests during that window to the average number of requests per day. Figure 1a displays the probability distribution of complete GET requests over the 144 intervals. In the same way, we generated the probability distributions of complete POST, incomplete GET, and incomplete POST requests, shown in Fig. 1b–d respectively. In these graphs, the X-axis depicts the 10-min time intervals, and the Y-axis represents the likelihood that a certain request type will be made within a given time period WD. We observe that, over a certain number of time periods, the probability is almost zero, causing the graphs to align with the X-axis. This is because no web requests of the corresponding type were received during those time intervals.
Fig. 1 Probability distribution for a complete GET requests, b complete POST requests, c incomplete GET requests, and d incomplete POST requests
5.3 Data Collection for Testing Phase
We used 1 day of normal HTTP/1.1 traffic for our testing phase, so there were a total of 144 intervals, each of 10 min duration. In each of these 144 intervals, we evaluated the accuracy of our detection method. For detecting the attacks, we made use of the probability distribution (profile) of complete GET requests (shown in Fig. 1a). Likewise, the probability distributions of complete POST, incomplete GET, or incomplete POST requests could also be used to detect the attacks. We ran three experiments to evaluate the performance of the proposed strategy, covered in the following subsections: Experiment-1 with varied attack rates, Experiment-2 in the absence of the attack, and Experiment-3 in the presence of the attack at a fixed rate.
Experiment-1: Detection performance v/s varied attack rate
In the first experiment, we injected Slow Rate HTTP/1.1 DoS attack traffic in all the intervals at different rates, with the value of the threshold β fixed at 15.
Table 1 Attack rate v/s detection performance
Attack rate (req/interval)   Threshold β   TPR (%)
20    15   40.28
40    15   84.72
60    15   94.44
80    15   98.61
100   15   100
Table 2 Detection performance in absence of attack
Threshold β   FPR (%)
5    20.83
10   14.58
15   08.33
20   05.55
25   03.47
In this way, we analysed the sensitivity of the proposed scheme's detection performance to different attack rates. The result of this sensitivity analysis is shown in Table 1.
Experiment-2: Detection performance in the absence of the attack
In the second experiment, we did not launch the attack in any of the 144 intervals and varied the value of the threshold β from 5 to 25. In this way, we analysed the sensitivity of the proposed scheme's detection performance to different values of the threshold β. The result of this sensitivity analysis is shown in Table 2.
Experiment-3: Detection performance in the presence of the attack at a fixed rate
In the third experiment, we launched the attack at a fixed rate of 6 attack requests per minute in all of the 144 intervals and varied the value of the threshold β from 5 to 25. In this way, we analysed the sensitivity of the proposed scheme's detection performance to different values of the threshold β in the presence of the attack at a fixed rate. The result of this sensitivity analysis is shown in Table 3.
6 Conclusion
Slow Rate HTTP/1.1 DoS attacks try to exhaust the web server's connection queue space so that the victim web server is unable to fulfil legitimate web requests. These attacks are highly stealthy, and launching them requires very little computational power and bandwidth. To identify these attacks, we proposed a probabilistic anomaly detection technique in this work. In order to detect Slow Rate HTTP/1.1 DoS attacks, the
Table 3 Detection performance in presence of attack (6 attack req/min)
Threshold β   Attack rate (req/interval)   TPR (%)
5    60   97.22
10   60   95.83
15   60   94.44
20   60   92.36
25   60   91.66
proposed scheme generates a probability distribution of normal HTTP/1.1 traffic as a standard profile and then compares the current activity to this profile. Through a variety of experiments, we tested the detection performance of the proposed technique and showed that it is capable of fairly accurate attack detection.
References 1. Tripathi N (2022) Delays have dangerous ends: slow HTTP/2 DoS attacks into the wild and their real-time detection using event sequence analysis. https://doi.org/10.48550/arXiv.2203. 16796 2. Tripathi N, Hubballi N (2018) Slow rate denial of service attacks against HTTP/2 and detection analysis 72:255–272. https://doi.org/10.1016/j.cose.2017.09.009 3. Fielding R, Gettys J, Mogul J, Frystyk H, Masinter L, Leach P, Berners-Lee T (1999) RFC2616: hypertext transfer protocol—HTTP/1.1. https://dl.acm.org/doi/pdf/10.17487/RFC2616 4. Tripathi N, Hubballi N (2021) Application layer denial-of-service attacks and defense mechanisms: a survey 54:1–33. https://doi.org/10.1145/3448291 5. Mongelli M, Aiello M, Cambiaso E, Papaleo G (2015) Detection of DoS attacks through Fourier transform and mutual information: 7204–7209. https://doi.org/10.1109/ICC.2015.7249476 6. Hong K, Kim Y, Choi H, Park J (2017) SDN-assisted slow HTTP DDoS attack defense method 22:688–691. https://doi.org/10.1109/LCOMM.2017.2766636 7. Kouvatsos DD, Assi SA (2010) On the analysis of queues with heavy tails: a non-extensive maximum entropy formalism and a generalisation of the Zipf-Mandelbrot distribution: 99–111. https://doi.org/10.1007/978-3-642-25575-5_9 8. Ye N, Chen Q (2001) On the analysis of queues with heavy tails: an anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems 17:105–112. https://doi.org/10.1007/s11276-009-0221-y 9. Benmusa T, Parish DJ, Sandford M (2005) Detecting and classifying delay data exceptions on communication networks using rule based algorithms 18:159–177. https://doi.org/10.1002/ dac.694 10. Sperotto A, Schaffrath G, Sadre R, Morariu C, Pras A, Stiller B (2010) An overview of IP flowbased intrusion detection 12:343–356. https://doi.org/10.1109/SURV.2010.032210.00054 ˙ 11. Ellens W, Zuraniewski P, Sperotto A, Schotanus H, Mandjes M, Meeuwissen E (2013) Flowbased detection of DNS tunnels and detection: 124–135. https://doi.org/10.1007/978-3-64238998-6_16 12. Aiello M, Cambiaso E, Scaglione S, Papaleo G (2013) A similarity based approach for application DoS attacks detection: 000430–000435. https://doi.org/10.1109/ISCC.2013.6754984 13. Giralte LC, Conde C, Diego D, Martin I, Cabello E (2013) Detecting denial of service by modelling web-server behaviour 39: 2252–2262. https://doi.org/10.1016/j.compeleceng.2012. 07.004
14. Aiello M, Cambiaso E, Mongelli M, Papaleo G (2014) An on-line intrusion detection approach to identify low-rate DoS attacks: 1–6. https://doi.org/10.1109/CCST.2014.6987039 15. Tripathi N, Hubballi N, Singh Y (2016) How secure are web servers? An empirical study of slow HTTP DoS attacks and detection: 454–463. https://doi.org/10.1109/ARES.2016.20 16. Bhatia S, Mohay G, Tickle A, Ahmed E (2011) Parametric differences between a real-world distributed denial-of-service attack and a flash event: 210–217. https://doi.org/10.1109/ARES. 2011.39 17. Farina P, Cambiaso E, Papaleo G, Aiello M (2016) Are mobile botnets a possible threat? The case of SlowBot Net: 268–283. https://doi.org/10.1016/j.cose.2016.02.005 18. Configures optimizations for a protocol’s listener sockets. https://httpd.apache.org/docs/2.4/ mod/core.html 19. Mod_antiloris. https://sourceforge.net/projects/mod-antiloris/ 20. Mod reqtimeout. https://httpd.apache.org/docs/trunk/mod/mod-reqtimeout.html 21. Mod limitipconn. http://dominia.org/djao/limitipconn.html 22. Guirguis M, Bestavros A, Matta I, Zhang Y (2005) Reduction of quality (RoQ) attacks on internet end-systems: 1362–1372. https://doi.org/10.1109/INFCOM.2005.1498361 23. Braga R, Mota E, Passito A (2010) Lightweight DDoS flooding attack detection using NOX/OpenFlow: 408–415. https://doi.org/10.1109/LCN.2010.5735752 24. Gude N, Koponen T, Pettit J, Pfaff B, Casado M, McKeown N, Shenker S (2008) NOX: towards an operating system for networks: 105–110. https://dl.acm.org/doi/pdf/10.1145/1384609. 1384625 25. Kohonen T (1990) The self-organizing map 1464–1480. https://doi.org/10.1109/5.58325 26. Tripathi N, Hubballi N (2016) A probabilistic anomaly detection scheme to detect DHCP starvation attacks: 1–6. https://doi.org/10.1109/ANTS.2016.7947848
Transmission Pricing Using MW Mile Method in Deregulated Environment Gummadi Srinivasa Rao , V. Hari Vamsi, and B. Venkateswararao
Abstract Reformation of the power system network has been undertaken to introduce competition into the power sector. After restructuring of the existing power system network, the involvement of independent power producers (IPPs) at the generation stage has increased, the existing transmission system can be changed to open access, and the involvement of private entities at the distribution level has increased. Hence, when the existing transmission network is changed to open access, transmission fee allocation is a very important problem. In the literature, different researchers have offered different schemes for sharing the transmission price. This manuscript presents the allocation of transmission pricing when two transactions are contracted between buyer and seller using the MW mile method. The IEEE 5-bus system has been considered for calculating the transmission pricing. In this work, two transactions are assumed for the transmission pricing allocation calculation: a genco at node 3 has come into an agreement with a buyer at node 2 for a power trade of 50 MW (Transaction T1), and a genco at node 4 has come into an agreement with a buyer at node 5 for a power trade of 100 MW (Transaction T2). Reasonable transmission price distribution requires identifying the quantity of power flowing in each line due to each power producer and consumer before and after the transactions take place. For this, a novel optimization technique, the BAT algorithm, has been implemented for the calculation of the power flow in each line. This paper presents the calculation of transmission pricing using the MW mile method when two transactions take place in a standard IEEE 5-bus arrangement. The outcomes show that the proposed technique is more advantageous than the postage stamp method available in the literature. Keywords MW mile method · Transmission line pricing · Distribution system · Deregulated power system
G. S. Rao (B) · V. Hari Vamsi · B. Venkateswararao Department of EEE, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, A.P. 520007, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_3
1 Introduction
Reformation of the power system network has been taking place around the world. The main intention behind this reformation is to avoid monopoly at the stages of generation, transmission, and distribution in the power system network and to improve the efficiency and quality of services in the electricity delivery system. At the start, power sectors ran as vertically integrated utilities in which all the tasks were controlled by individual states of the country. Present-day reformation and open access have been initiated in the power industry. The regulated structure of the power sector is transformed into a restructured network in which generation, transmission, and distribution are separated into three sectors and competition is involved in the complete system. After restructuring of the existing power system network, the involvement of independent power producers (IPPs) at the generation stage has increased, the existing transmission system can be changed to open access, and the involvement of private entities at the distribution level has increased. However, introducing competition in the transmission division is very difficult, because a fresh transmission line cannot be built for each generator. For this reason, transmission price sharing is an extremely important task within the reformed structure [1]. Several authors have presented methodologies to solve this difficulty. The objective of transmission price sharing is to bring in healthy competition in the electricity sector [2–6].
2 Problem Statement
Two transactions take place in the IEEE 5-bus system along with the existing loads, which the transmission company has to satisfy. The two transactions are:
(A) A generation company at node 3 has come into an agreement with a client at node 2, for a power sale of 50 MW (Transaction T1).
(B) A generation company at node 4 has come into an agreement with a client at node 5, for a power sale of 100 MW (Transaction T2).
In the above two cases, the sharing of the transmission charge has been calculated based on the power flow changes that occur when the new contracts are entered.
3 Five-Bus Test System and Relevant Data
The one-line diagram of the IEEE 5-bus system is shown in Fig. 1. The line parameters, generation, and loads are given in p.u. values. The network particulars are: number of lines = 7, number of nodes = 5, number of generators = 2, and number of loads = 4. The parameters of the seven lines (line resistance, line reactance, and line susceptance) of the IEEE 5-bus system are listed in Table 1.
Fig. 1 Five-bus system one line diagram
Table 1 Transmission line parameters of the IEEE-5 bus system
From node   To node   Resistance of line (p.u.)   Reactance of line (p.u.)   Susceptance of line (p.u.)
1   2   0.02   0.06   0.06
1   3   0.08   0.24   0.05
2   3   0.06   0.18   0.04
2   4   0.06   0.18   0.04
2   5   0.04   0.12   0.03
3   4   0.01   0.03   0.02
4   5   0.08   0.24   0.05
The values of the positive-sequence and zero-sequence resistance, inductance, capacitance, and transmission line length are calculated by considering the base values of 100 MVA (Sb) and 230 kV (Vb), and are tabulated in Table 2. For example, the length of the transmission line between bus 1 and bus 2 is calculated as follows: the length of a transmission line is obtained from the formula length = (√(X · B)/(2πf)) × (velocity of light in km/s), which gives a length of 47 km for the first row in Table 2. Likewise, all the line lengths are calculated and tabulated in Table 2. The generation and load data of the assumed test structure are given in Table 3, and the corresponding actual values are listed in Table 4. The node category, initial voltage magnitude and angle of the IEEE 5-bus network, the cost coefficients of the generators, and the minimum and maximum limits of the generators are listed in Tables 5 and 6.
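A short numerical check of the line-length formula above, using the per-unit reactance and susceptance from Table 1 (their product is independent of the base), is given below; f = 60 Hz is inferred from the tabulated inductances and is an assumption, since the frequency is not stated explicitly in the paper.

```python
# Numerical check of: length = sqrt(X*B)/(2*pi*f) * c, with X and B in p.u.
# f = 60 Hz is an inferred assumption; c is the velocity of light in km/s.
import math

def line_length_km(x_pu, b_pu, f_hz=60.0, c_km_s=3.0e5):
    return math.sqrt(x_pu * b_pu) / (2 * math.pi * f_hz) * c_km_s

print(round(line_length_km(0.06, 0.06)))   # line 1-2 -> ~48 km (≈ 47 km in Table 2)
print(round(line_length_km(0.24, 0.05)))   # line 1-3 -> ~87 km
```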
32
G. S. Rao et al.
Table 2 Line lengths and sequence impedances of the transmission lines
Bus no.   Line length (km)   Positive sequence resistance (Ω/km)   Zero sequence resistance (Ω/km)   Positive sequence inductance (H/km)   Zero sequence inductance (H/km)
1–2   47   0.22   0.66    0.0017   0.0052
1–3   87   0.48   1.45    0.0038   0.0115
2–3   67   0.46   1.40    0.0037   0.0112
2–4   67   0.46   1.40    0.0037   0.0112
2–5   47   0.44   1.32    0.0035   0.0105
3–4   19   0.27   0.813   0.0021   0.0064
4–5   87   0.48   1.455   0.0038   0.0115
Table 3 p.u. values of generation and load at the nodes (buses)
Bus no.   PG    QG   Q-Max   Q-Min   PL     QL
1         0     0    5       −5      0      0
2         0.4   0    3       −3      0.2    0.1
3         0     0    0       0       0.45   0.15
4         0     0    0       0       0.8    0.1
5         0     0    0       0       1      0.1
Table 4 Actual values of generation and load at the nodes (buses)
Bus no.   PG          QG   Reactive-Max   Reactive-Min    PL           QL
1         0           0    500 × 10^6     −500 × 10^6     0            0
2         40 × 10^6   0    300 × 10^6     −300 × 10^6     20 × 10^6    10 × 10^6
3         0           0    0              0               45 × 10^6    15 × 10^6
4         0           0    0              0               80 × 10^6    10 × 10^6
5         0           0    0              0               100 × 10^6   10 × 10^6
Table 5 Bus type, initial voltage, and angle of the IEEE-5 bus network
Node no.                    1       2                3           4           5
Node category               Slack   Generator (PV)   Load (PQ)   Load (PQ)   Load (PQ)
Initial voltage magnitude   1.06    1                1           1           1
Initial voltage angle       0       0                0           0           0
Table 6 Cost coefficients of the generators and min and max values of the generators
Generator   a         b      c   PGi min   PGi max
G1          0.00375   2      0   50        300
G2          0.0175    1.75   0   20        80
The generation cost of generator i is Ci = ai·PGi² + bi·PGi + ci, and the total cost = C1 + C2.
The base case power flows: number of generators = 2, number of lines = 7; total generation: G1 = 250.63 MW, G2 = 10 MW, total = 260.63 MW; total demand = 245 MW.
4 Methodology
4.1 MW Mile Method
In the proposed methodology, the power-flow-mile contribution of a transaction on each transmission line is calculated by multiplying the power flow caused by that transaction by the length of the line. All the power-flow miles are then summed to obtain the total transmission system usage, which provides a measure of how much each transaction uses the grid. The price charged to a transaction is then proportional to its transmission usage [7–11].
Algorithm
Step-A: Multiply the unit cost of each line (100 $/MW-mile for all lines) by the known line lengths to find the total cost of each line.
Step-B: Run an Optimal Power Flow (OPF) using the BAT algorithm to find the initial power flow on all the lines without any transaction.
Step-C: Apply transaction T1 in the network; the power flows on each line will change according to the laws of physics. Find the fresh power flows using the OPF solution.
Step-D: Repeat Step-C for transaction T2.
Step-E: Work out the change in power flow on each line caused by transaction T1.
Step-F: Work out the change in power flow on each line caused by transaction T2.
Step-G: Find the total network usage by T1 by calculating each line usage due to transaction T1.
Step-H: Find the total network usage by T2 by calculating each line usage due to transaction T2.
Step-I: Find the total network usage by T1 and T2 together for comparative distribution of the costs.
Step-J: Compute the proportional distribution of cost due to transaction T1.
Step-K: Compute the proportional distribution of cost due to transaction T2.
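The cost-allocation arithmetic of steps E–K can be written compactly; the Python sketch below reproduces it using the absolute flow changes caused by each transaction. It is only a sketch of the allocation step: the line flows themselves would come from the BAT-based OPF of steps B–D, which is not reproduced here.

```python
# Sketch of the MW-mile cost allocation (steps E-K). Inputs are the per-line
# costs (unit cost x length, step A) and the line flows before and after each
# transaction; the flows come from the BAT-based OPF solution (steps B-D).
def mw_mile_allocation(line_cost, p_base, p_t1, p_t2):
    usage_t1 = [abs(a - b) for a, b in zip(p_t1, p_base)]       # step E
    usage_t2 = [abs(a - b) for a, b in zip(p_t2, p_base)]       # step F
    cost_t1 = sum(c * u for c, u in zip(line_cost, usage_t1))   # step G
    cost_t2 = sum(c * u for c, u in zip(line_cost, usage_t2))   # step H
    total = cost_t1 + cost_t2                                   # step I
    return cost_t1 / total, cost_t2 / total                     # steps J, K
```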
5 Results and Discussion
The base case power flows (without any buy–sell transaction) are tabulated in Table 7.
Transmission transactions
The following transactions are considered in the network for the calculation of transmission pricing:
(A) A generation company (Genco) at node 3 has come into an agreement with a client at node 2, for a power trade of 50 MW (transaction T1).
Table 7 Base case power flows
Length of trans. line km
Active power flow (MW)
Reactive power flow (MVAr)
Line losses (MW)
1–2
47
180.30
50.02
6.29
1–3
87
70.33
17.67
3.82
2–3
67
32.45
4.01
0.65
2–4
67
41.55
5.09
1.06
2–5
47
90.01
14
3.34
3–4
19
53.30
2.32
0.3
4–5
87
13.49
– 0.96
0.16
Total losses = 15.62 MW; total cost of generation = 754.8 $/h.
The power flows with BAT algorithm optimization
Total generation: G1 = 202.8729 MW, G2 = 55.2664 MW; total demand = 245 MW. The power flows obtained with the BAT algorithm optimization (without any buy–sell transaction) are tabulated in Table 8 [12–14].
Table 8 Optimal power flows after applying the BAT algorithm
Bus from–to   Length of trans. line (km)   Active power flow (MW)   Reactive power flow (MVAr)   Line losses (MW)
1–2   47   139.9    59.99   4.2
1–3   87   62.982   18.79   3.16
2–3   67   35.63    2.86    0.77
2–4   67   44.08    4.23    1.19
2–5   47   91.25    13.69   3.43
3–4   19   49.68    3.92    0.27
4–5   87   12.31    −0.47   0.13
Total losses = 13.15 MW; total cost of generation = 710.2541 $/h
(B) A generation company (Genco) at node 4 has come into an agreement with a client at node 5, for a power trade of 100 MW (transaction T2).
Table 9 shows the power flows in the network when the 50 MW power transaction (transaction T1) takes place. The following observations are made from the data in Table 9: total losses = 11.04 MW; total cost of generation = 830.369 $/h; G1 = 200.8627 MW, G2 = 55.1793 MW, and G3 = 50 MW.
Table 10 shows the power flows in the network when the 100 MW power transaction (transaction T2) takes place. The following observations are made from the data in Table 10: total losses = 20.24 MW; G1 = 208.83 MW, G2 = 56.40 MW, and G3 = 100 MW; total cost of generation = 1195.6 $/h.
Using the data provided in Table 11 and a unit cost of the transmission lines of 100 $/MW-mile, the transmission price allocation for the assumed transactions is calculated, and the corresponding calculations are tabulated in Table 12.
Table 9 Power flows when power is transacted (transaction T1)
Bus from–to   Length of trans. line (km)   Real power flow (MW)   Imaginary power flow (MVAr)   Line loss (MW)
1–2   47   149.4   57.55   4.63
1–3   87   51.46   9.10    1.99
2–3   67   17.39   7.46    0.2
2–4   67   29.24   4.40    0.52
2–5   47   83.32   8.62    2.82
3–4   19   71.66   14.9    0.54
4–5   87   19.84   3.26    0.34
Table 10 Power flows when power is transacted (transaction T2)
Bus from–to   Length of trans. line (km)   Real power flow (MW)   Imaginary power flow (MVAr)   Line loss (MW)
1–2   47   156.42   55.71   4.98
1–3   87   52.41    9.15    2.05
2–3   67   16.42    6.85    0.18
2–4   67   20.06    8.25    0.26
2–5   47   151.37   26.83   9.48
3–4   19   21.6     10.10   0.06
4–5   87   61.34    13.92   3.23
Table 11 Transmission line lengths in miles
Bus from–to   Length of trans. line (km)   Length of trans. line (miles)
1–2   47   29.2044
1–3   87   54.0593
2–3   67   35.4182
2–4   67   35.4182
2–5   47   29.2044
3–4   19   11.8061
4–5   87   54.0593
Table 12 Line cost sharing using the MW mile method
Steps   Line i-j          1–2     1–3       2–3       2–4       2–5       3–4     4–5       Total
A       Lij * Cij         2920    5405      3541      3541      2920      1180    5405      24,912
B       Pij base (p.u.)   1.399   0.629     0.356     0.440     0.912     0.496   0.123
C       Pij due to T1     1.494   0.514     0.173     0.292     0.833     0.716   0.198
D       Pij due to T2     1.564   0.524     0.164     0.200     1.513     0.216   0.613
E       Fi due to T1      0.095   0.115     0.183     0.148     0.079     0.22    0.075
F       Fi due to T2      0.165   0.105     0.192     0.24      0.601     0.28    0.49
G       A*E               277.4   621.575   648.003   524.068   230.68    259.6   405.375   2966.701
H       A*F               481.8   567.525   679.872   849.84    1754.92   330.4   2648.5    7312.857
I       GTotal + HTotal                                                                     10,279.558
J       GTotal/I                                                                            0.2886
K       HTotal/I                                                                            0.7113
A unit cost of 100 $/MW-mile for all lines is considered for cost allocation to the transmission lines when the two transactions take place in the considered test system. Pij denotes the power flow from the ith bus to the jth bus; a negative sign indicates the reverse direction of power flow.
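The final shares in steps J and K of Table 12 can be reproduced directly from the per-line figures in rows A, E, and F, as the short check below shows.

```python
# Reproducing steps G-K of Table 12 from the per-line costs (step A) and the
# per-line flow changes caused by each transaction (steps E and F).
line_cost = [2920, 5405, 3541, 3541, 2920, 1180, 5405]        # $ per line (A)
df_t1     = [0.095, 0.115, 0.183, 0.148, 0.079, 0.22, 0.075]  # |dP| due to T1 (E)
df_t2     = [0.165, 0.105, 0.192, 0.24, 0.601, 0.28, 0.49]    # |dP| due to T2 (F)

g_total = sum(c * f for c, f in zip(line_cost, df_t1))   # ~2966.7  (step G)
h_total = sum(c * f for c, f in zip(line_cost, df_t2))   # ~7312.9  (step H)
total   = g_total + h_total                              # ~10279.6 (step I)
print(round(100 * g_total / total, 2), round(100 * h_total / total, 2))
# -> roughly 28.86 and 71.14, matching the shares reported for T1 and T2
```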
6 Conclusion
By considering the power flows on specific lines and the distances over which the transactions are carried, the transmission pricing has been calculated for the two transactions in the considered network by applying the MW mile method. In this method, the shortcomings of the postage stamp-based transmission-pricing scheme are overcome. The percentage of total cost allocated to the two transactions is 28.86% for transaction 1 and 71.13% for transaction 2. To identify the quantum of power flowing in each line due to each generator and load before and after the transactions take place, a novel optimization technique, the BAT algorithm, has been implemented for the calculation of the power flow in each line. This paper presented the calculation of transmission pricing using the MW mile method when two transactions take place in the IEEE 5-bus test system. The results show that the proposed method is more advantageous than the postage stamp method available in the literature.
References 1. Olmos L, Arriaga IJP (2009) A comprehensive approach for computation and implementation of efficient electricity transmission network charges. Energy Policy 37(12):5285–5295 2. Galiana FD, Conejo AJ, Gil HA (2003) Transmission network cost allocation based on equivalent bilateral exchanges. IEEE Trans Power Syst 18(4):1425–1431 3. Conejo AJ, Contreras J, Lima DA, Feltrin AP (2007) Zbus transmission network cost allocation. IEEE Trans Power Syst 22(1):342–349 4. Oloomi-Buygi M, Salehizadeh M (2008) Considering system non-linearity in transmission pricing. Int J Elect Power Energy Syst 30(8):455–461 5. Abou El Elaa AA, El-Sehiemyb RA (2009) Transmission usage cost allocation schemes. Elect Power Syst Res 79(6):926–936 6. Bhakar R, Sriram VS, Padhy NP, Gupta HO (2009) Transmission embedded cost allocation in restructured environment: a game-theoretic approach. Elect Power Compon Syst 37(9):970– 981 7. Lima DA, Feltrin AP, Contreras J (2009) An overview on network cost allocation methods. Elect Power Syst Res 79(5):750–758 8. Arriaga IJP, Rubio FJ, Puerta JF, Arceluz J, Martín J (1995) Marginal pricing of transmission services: an analysis of cost recovery. IEEE Trans Power Syst 10(1):546–553 9. Rubio-Oderiz FJ, Arriaga IJP (2000) Marginal pricing of transmission services: a comparative analysis of network cost allocation methods. IEEE Trans Power Syst 15(1):448–454 10. Sedaghati A (2006) Cost of transmission system usage based on an economic measure. IEEE Trans Power Syst 21(2):446–473 11. Shirmohammadi D, Filho V, Gorenstin B, Pereira MVP (1996) Some fundamental technical concepts about cost based transmission pricing. IEEE Trans Power Syst 11:1002–1008 12. Abhyankar AR, Soman SA, Khaparde SA (2006) Optimization approach to real power tracing: an application to transmission fixed cost allocation. IEEE Trans Power Syst 21(3) 13. Steven S (2002) Power system economics: designing markets for electricity. IEEE Press & Willey-Interscience Publications 14. Rubio-Oderiz J, Perez-Arriaga IJ. Marginal pricing of transmission services: a comparative analysis of network cost allocation methods. IEEE Trans Power Syst 15(1)
Performance Analysis of Nonlinear Companding Techniques for PAPR Mitigation in 5G GFDM Systems Neethu Radha Gopan and S. Sujatha
Abstract Generalized Frequency Division Multiplexing (GFDM) is a 5G waveform contender that offers asynchronous and non-orthogonal data transmission, featuring several advantages, some of them being low latency, reduced out-of-band (OOB) radiation and low adjacent channel leakage ratio. GFDM is a non-orthogonal multicarrier waveform which enables data transmission on a time frequency grid. However, like orthogonal frequency division multiplexing and many other multicarrier systems, high peak-to-average power ratio (PAPR) is one of the main problems in GFDM, which degrades the high-power amplifier (HPA) efficiency and distorts the transmitted signal, thereby affecting the bit error rate (BER) performance of the system. Hence, PAPR reduction is essential for improved system performance and enhanced efficiency. Nonlinear companding techniques are known to be one of the effective low complexity PAPR reduction techniques for multicarrier systems. In this paper, a GFDM system is evaluated using mu law companding, root companding and exponential companding techniques for efficient PAPR reduction. The PAPR and BER graphs are used to evaluate the proposed methods in the presence of an HPA. Simulations show that, out of these three techniques, exponential companding was found to provide a trade-off between the PAPR reduction and BER performance. Keywords PAPR reduction · GFDM · Root companding · Exponential companding · Mu law companding · High-power amplifier
N. R. Gopan (B) Department of ECE, Rajagiri School of Engineering and Technology, Kochi, India e-mail: [email protected] S. Sujatha Department of ECE, Christ (Deemed to be University), Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_4
1 Introduction
Upcoming wireless communication technologies will be crucial for the functioning of modern society. Among multicarrier modulation methods, GFDM is one of the most promising options for fifth-generation wireless communication systems. It is a filter bank-based multicarrier system which makes use of a pulse shaping filter for achieving time–frequency localization of the transmit signals. GFDM systems provide improved spectral efficiency and low out-of-band emissions when compared to OFDM, as shown by Michailow et al. [1] and Matthe et al. [2]. However, like other multicarrier systems, GFDM suffers from high PAPR. The techniques discussed by Rahmatallah and Mohan [3] that are adopted for PAPR reduction in OFDM systems include signal distortion techniques, such as amplitude clipping and filtering and companding, and signal scrambling techniques, such as partial transmit sequence, selective mapping, interleaving, etc. Different techniques for PAPR mitigation in GFDM systems have been discussed in the literature. A clipping and mu law companding technique for PAPR reduction in GFDM was proposed by Tiwari and Paulus [4]; it was observed that clipping and µ-law companding combined reduce the PAPR more significantly than either method used alone. Li and Deng [5] suggested an effective nonlinear companding transform for PAPR mitigation together with an iterative receiver to enhance the system's BER performance; proper selection of the cutoff point and inflexion point helps to provide a trade-off between PAPR and BER quality. A companding method based on polynomial compression along with an iterative expander at the receiver was proposed by Sharifian et al. [6]; studies show that increasing the polynomial order reduces the PAPR but has an adverse effect on the complexity of the system and the BER performance. In another work, Mishra and Kulat [7] derived a theoretical expression, based on order statistics, to evaluate the PAPR of a GFDM system; a square root companding technique was evaluated using the derived expression as well as simulations. An efficient nonlinear companding scheme based on the arc-cosine function was proposed by Liu et al. [8] for PAPR reduction, along with an iterative reception scheme to improve the BER of the system; simulation results demonstrate that, in comparison to traditional companding and filtering strategies, the suggested method offers better PAPR and BER performance and greater design flexibility. Mary and Vimala [9] analysed a piecewise nonlinear companding transform to mitigate the high PAPR of GFDM systems and to improve the BER performance; according to their simulation results, companding transforms and the roll-off factor of the root raised cosine filter both work well to lower the high PAPR of GFDM.
The structure of the work is as follows. Section 2 discusses the conventional GFDM system model. In Sect. 3, the PAPR problem and HPA nonlinearity in GFDM systems are discussed. In Sect. 4, mu law, root companding and exponential companding techniques are discussed for PAPR reduction. The BER and PAPR performances of all three techniques are compared in Sect. 5. Finally, the conclusion of this work is presented in Sect. 6.
2 GFDM System Model
GFDM is a non-orthogonal block-based multi-carrier transmission scheme in which M sub-symbols are superimposed over K subcarriers. Hence, the total number of symbols that can be transmitted in a single data block is N = KM. The input binary data are first passed through a symbol mapper, e.g., QAM, which converts the bits into symbols from a complex constellation (μ is taken as the modulation order). The mapped data vector d, having N × 1 elements, is given as an input to the GFDM modulator. The data block is decomposed into M sub-symbols, each of which has K samples, given as d_m = [d_{0,m} d_{1,m} ... d_{K-1,m}]^T, where m indexes the different sub-symbols. The data symbol on the kth subcarrier in the mth time slot is denoted by d_{k,m}. The pulse shaping filter used for transmitting each d_{k,m} is given by

p_{k,m}(n) = p((n - mK) \bmod N) \, e^{j 2\pi k n / N},  (1)
where n is the sampling index (0 ≤ n ≤ N − 1), p(n) is the pulse shaping filter used and p_{k,m}(n) is the pulse shaping filter after shifting circularly in both time and frequency domains. All the data symbols d_{k,m} filtered by p_{k,m}(n) are finally superimposed to obtain the GFDM symbol, given by

x(n) = \sum_{k=0}^{K-1} \sum_{m=0}^{M-1} d_{k,m} \, p_{k,m}(n).  (2)
Finally, a cyclic prefix is added to the GFDM symbol to produce \tilde{x}(n). The GFDM signal is then passed through a nonlinear high-power amplifier to the AWGN channel. The general block description of a GFDM system is represented in Fig. 1. At the receiver, the GFDM symbol is given by

y(n) = h(n) * \tilde{x}(n) + w(n),  (3)
where h(n) is the impulse response of the channel and w(n) is the complex additive white Gaussian noise (AWGN) with variance N0. The symbols are passed through a zero-forcing receiver at the receiver side after the cyclic prefix has been eliminated. The final step is to demap the demodulated symbols to create a series of bits.
Fig. 1 Block description of a GFDM system
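To make the modulation of Eqs. (1)–(2) concrete, the following is a minimal NumPy sketch of a GFDM modulator. It is not the authors' implementation; the flat prototype pulse, the normalization and all variable names are illustrative assumptions (the paper uses a raised cosine prototype filter).

import numpy as np

def gfdm_modulate(d, K, M, p):
    """Superimpose the K x M mapped symbols d[k, m] on circularly shifted pulses (Eqs. 1-2)."""
    N = K * M
    n = np.arange(N)
    x = np.zeros(N, dtype=complex)
    for m in range(M):
        for k in range(K):
            # p_{k,m}(n) = p((n - mK) mod N) * exp(j 2 pi k n / N)   -- Eq. (1)
            p_km = p[(n - m * K) % N] * np.exp(1j * 2 * np.pi * k * n / N)
            x += d[k, m] * p_km                                      # superposition, Eq. (2)
    return x

# illustrative use with a flat stand-in prototype pulse
K, M = 128, 5
rng = np.random.default_rng(0)
qam = (rng.choice([-1, 1], size=(K, M)) + 1j * rng.choice([-1, 1], size=(K, M))) / np.sqrt(2)
prototype = np.ones(K * M) / np.sqrt(K * M)
x = gfdm_modulate(qam, K, M, prototype)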
3 PAPR and HPA Nonlinearity in GFDM
Several subcarriers in GFDM are out of phase with one another. When all of them reach their maximum value at the same time, the signal envelope exhibits a spike, causing output peaks. The GFDM signal peak value becomes significantly higher than the average power of the entire system due to the large number of modulated subcarriers. This ratio of the peak to the average power value is termed the peak-to-average power ratio:

PAPR = 10 \log_{10}(P_{max} / P_{avg}).  (4)
The PAPR of the system is measured in terms of the complementary cumulative distribution function (CCDF), defined as the probability that the PAPR of a GFDM symbol is larger than a particular threshold ξ_0. The CCDF is mathematically represented as

CCDF = \Pr\{PAPR > \xi_0\}.  (5)
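A minimal sketch of how Eqs. (4)–(5) can be evaluated empirically over a batch of generated GFDM symbols is given below; it is an illustrative aid, not code from the paper.

import numpy as np

def papr_db(x):
    """PAPR of one GFDM symbol, Eq. (4): peak power over average power, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

def ccdf(papr_values, thresholds):
    """Empirical CCDF, Eq. (5): Pr{PAPR > threshold} estimated over many symbols."""
    papr_values = np.asarray(papr_values)
    return np.array([(papr_values > t).mean() for t in thresholds])

# e.g. ccdf(list_of_paprs, np.arange(4, 12, 0.25)) gives the curve plotted in the CCDF figures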
High-power amplifiers at the transmitter are easily saturated by large PAPR signals, resulting in degradation of BER performance along with OOB radiation. Using amplifiers with a large input back-off (IBO) is necessary to mitigate such distortion effects. This in turn increases the energy consumption in mobile devices and also decreases the power amplifier efficiency. Hence, reduction of PAPR in GFDM systems is essential. Figure 2 depicts the input–output characteristics of a power amplifier. The Rapp model is a widely used high-power amplifier model and is discussed by Paredes et al. [10]. This HPA model allows the input signal to pass through until the input amplitude approaches saturation. The output of the HPA is given as

R[x(n)] = R_a[x(n)] \, e^{j R_p[x(n)]},  (6)

Fig. 2 High-power amplifier transfer characteristics
where R_a[x(n)] is the AM/AM characteristic function of the signal amplitude, and R_p[x(n)] is the AM/PM characteristic function of the signal phase. The AM/AM and AM/PM equations for the Rapp model HPA are given by Eqs. (7) and (8), respectively:

R_a[x(n)] = \frac{a \, x(n)}{\left(1 + (a \, x(n)/V_{sat})^{2p}\right)^{1/2p}},  (7)

R_p[x(n)] = 0,  (8)
where x(n) is the GFDM signal passed through the HPA, a is the gain of the HPA, V_sat is the saturation voltage above which the nonlinear region occurs and p is the roll-off (smoothness) factor of the HPA.
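For illustration, the Rapp AM/AM response of Eqs. (7)–(8) can be sketched as below; the default parameter values follow those quoted later in Sect. 5 (a = 1, V_sat = 1.5, p = 3), and applying the response to a complex signal by preserving its phase is an assumption consistent with the zero AM/PM conversion of Eq. (8).

import numpy as np

def rapp_am_am(x, gain=1.0, v_sat=1.5, p=3):
    """Rapp AM/AM response of Eq. (7); AM/PM conversion is zero (Eq. 8), so the phase is kept."""
    a = np.abs(x)
    out_amp = gain * a / (1.0 + (gain * a / v_sat) ** (2 * p)) ** (1.0 / (2 * p))
    return out_amp * np.exp(1j * np.angle(x))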
4 PAPR Reduction Techniques Companding is the most commonly used nonlinear technique for reduction of PAPR in multicarrier systems. With the objective of lowering the system’s PAPR, companding transforms convert the probability density function of the GFDM signal.
4.1 Mu Law Companding
Wang et al. [11] proposed the mu law companding technique, in which low-amplitude signals are enlarged while higher-amplitude signals remain unaltered. Hence, although the PAPR of the system decreases, the system's average power increases with this technique. The μ-law companding function is represented as

y(n) = F\{x(n)\} = V_{max} \, \frac{\log\left(1 + \mu |x(n)| / V_{max}\right)}{\log(1 + \mu)} \, \mathrm{sgn}(x(n)),  (9)
where V_{max} is the maximum amplitude of x(n) and μ is the companding factor. The inverse companding function is represented as

x(n) = F^{-1}\{y(n)\} = \frac{V_{max}}{\mu} \left[ (1 + \mu)^{|y(n)|/V_{max}} - 1 \right] \mathrm{sgn}(y(n)).  (10)
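A minimal sketch of the μ-law compander and expander of Eqs. (9)–(10) is shown below. Applying the transform to the signal envelope while preserving the phase is an assumption made for complex baseband signals; the paper's equations are written with sgn(·) for real-valued signals.

import numpy as np

def mu_law_compand(x, mu=50):
    """Mu-law compander of Eq. (9), applied to the envelope of the GFDM signal."""
    v_max = np.abs(x).max()
    mag = v_max * np.log(1 + mu * np.abs(x) / v_max) / np.log(1 + mu)
    return mag * np.exp(1j * np.angle(x)), v_max      # v_max is needed again at the receiver

def mu_law_expand(y, v_max, mu=50):
    """Inverse transform of Eq. (10), applied at the receiver before demodulation."""
    mag = (v_max / mu) * ((1 + mu) ** (np.abs(y) / v_max) - 1)
    return mag * np.exp(1j * np.angle(y))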
4.2 Root Companding
The root companding technique is based on the order of the root, as discussed by Chang et al. [12]. The root companding function is expressed as

y(n) = F\{x(n)\} = |x(n)|^r \, \mathrm{sgn}(x(n)).  (11)
The inverse companding function is given by

x(n) = F^{-1}\{y(n)\} = |y(n)|^{1/r} \, \mathrm{sgn}(y(n)),  (12)
where r is the root order. When r is less than 1, the transformed signal y(n) becomes larger than the input signal when |x(n)| < 1, and smaller than the input signal when |x(n)| > 1. As the order is increased beyond 1, the transformed signal y(n) is smaller than the input signal when |x(n)| < 1 and larger than x(n) when |x(n)| > 1.
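The root companding pair of Eqs. (11)–(12) admits an equally small sketch, under the same envelope/phase assumption as above:

import numpy as np

def root_compand(x, r=0.5):
    """Root compander of Eq. (11): envelope raised to the power r, phase preserved."""
    return np.abs(x) ** r * np.exp(1j * np.angle(x))

def root_expand(y, r=0.5):
    """Inverse transform of Eq. (12): envelope raised to the power 1/r, phase preserved."""
    return np.abs(y) ** (1 / r) * np.exp(1j * np.angle(y))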
4.3 Exponential Companding In exponential companding proposed by Jiang et al. [13], the PAPR of the GFDM signal can be reduced if the PDF of the signals can be transformed to uniform distribution. This companding technique also ensures that the average power level is maintained constant. The companding operation is as follows:
y(n) = F\{x(n)\} = \mathrm{sgn}(x(n)) \left\{ \alpha \left[ 1 - \exp\left( -\frac{|x(n)|^2}{\sigma^2} \right) \right] \right\}^{1/d}.  (13)

Here, the parameter d (d > 0) is called the companding factor and α is a positive constant which maintains the same average input and output power:

\alpha = \left( \frac{E\left[ |x(n)|^2 \right]}{E\left[ \left( 1 - \exp\left( -\frac{|x(n)|^2}{\sigma^2} \right) \right)^{2/d} \right]} \right)^{d/2}.  (14)
The reverse operation is given by

x(n) = F^{-1}\{y(n)\} = \mathrm{sgn}(y(n)) \sqrt{ -\sigma^2 \log\left( 1 - \frac{|y(n)|^d}{\alpha} \right) }.  (15)
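The following is a minimal sketch of the exponential companding pair of Eqs. (13)–(15). Taking σ² as the mean power of the GFDM signal and the envelope/phase handling are assumptions; α follows Eq. (14).

import numpy as np

def exp_compand(x, d=1.0):
    """Exponential compander of Eqs. (13)-(14); alpha keeps the average power unchanged."""
    mag2 = np.abs(x) ** 2
    sigma2 = mag2.mean()                       # sigma^2 taken as the mean signal power (assumption)
    shaped = 1.0 - np.exp(-mag2 / sigma2)
    alpha = (mag2.mean() / (shaped ** (2.0 / d)).mean()) ** (d / 2.0)   # Eq. (14)
    mag = (alpha * shaped) ** (1.0 / d)                                  # Eq. (13)
    return mag * np.exp(1j * np.angle(x)), alpha, sigma2

def exp_expand(y, alpha, sigma2, d=1.0):
    """Inverse transform of Eq. (15), applied at the receiver."""
    mag = np.sqrt(-sigma2 * np.log(1.0 - np.abs(y) ** d / alpha))
    return mag * np.exp(1j * np.angle(y))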
5 Performance Evaluation
The simulations for the GFDM system are performed using Quadrature Amplitude Modulation (QAM) with K = 128 subcarriers and M = 5 sub-symbols. To model the high-power amplifier, the Rapp model is chosen with a saturation voltage of 1.5 V, a = 1 (normalized gain) and a roll-off factor of p = 3. The graph showing the transfer characteristics of the Rapp HPA model for different V_sat values is shown in Fig. 3. An AWGN channel model is taken for the GFDM system. The parameters selected for system simulation are given in Table 1.
Fig. 3 Transfer characteristics for Rapp model HPA for different IBO values
Table 1 GFDM system parameters

System parameters          Values
K-no. of subcarriers       128
M-no. of sub-symbols       5
No. of GFDM symbols        500
Pulse shaping filter       Raised cosine filter
Roll-off factor (α)        0.5
Modulation                 4 QAM
High-power amplifier       RAPP model, p = 3, IBO = 3.5 dB
Channel                    AWGN
Receiver                   Zero forcing
Fig. 4 CCDF Plot for mu law companding
The CCDF plot and the BER plot for mu law companding are shown in Figs. 4 and 5, respectively. In mu law companding, mu values were taken as 10, 50 and 150. For μ = 10, the PAPR was reduced to 6.8 dB from 10.8 dB (the PAPR without companding). The bit error rate of the system for the same mu value is around 10^−3 at an SNR of 18 dB. For mu values of 50 and 150, the PAPR is further reduced to 5.2 dB and 4.8 dB, respectively, as shown in Fig. 4. It can be seen that, for μ = 50 and 150, the BER degrades to around 10^−2 and 10^−1.5, respectively, at an SNR of 18 dB, as shown in Fig. 5. Thus, in the case of mu law companding, a high signal-to-noise ratio (SNR) is required to achieve tolerable BER values. It is observed that in mu law companding, as mu is increased, the PAPR decreases, but the BER becomes worse than that of the original GFDM system. The PAPR and BER performances of root companding for r = 0.2, 0.5 and 0.8 are studied in Figs. 6 and 7. It can be seen in Fig. 6 that, for a CCDF of ≈ 10^−3, the PAPR is minimum (2.6 dB) when r = 0.2. However, root companding with r = 0.8 yields the minimum BER among the considered values of r, as shown in Fig. 7. In root companding, the PAPR reduction improves as r is decreased, with a compromise in BER.
Fig. 5 BER plot for mu law companding
Fig. 6 CCDF plot for root companding
Fig. 7 BER plot for root companding
In exponential companding, Figs. 8 and 9 depict that, as the companding factor d is increased toward 1, the PAPR decreases and the BER performance improves. As d is increased beyond 1, the PAPR decreases further, which is an advantage, but at the expense of degraded BER performance. Hence, we can conclude that, for exponential companding, d = 1 is an optimum value for good PAPR reduction and BER performance. In Fig. 10, a comparative study of all three companding techniques is carried out, and it can be seen that r = 0.2, d = 1.3 and d = 1 offer the best PAPR reduction among all the considered values. However, the BER performance is best for r = 0.8 and r = 0.5, followed by d = 1. Hence, we can conclude that exponential companding with d = 1 is a better choice, as it provides a comparable performance for both PAPR and BER. Table 2 gives a comparison of all three companding techniques for different parameter values and their effect on BER performance and PAPR reduction (Figs. 10 and 11).
Fig. 8 CCDF plot for exponential companding
Fig. 9 BER plot for exponential companding
Fig. 10 CCDF plot for comparison of mu law, root and exponential companding
Table 2 Comparison of PAPR and BER values for different companding techniques

Companding technique     Parameter values   PAPR at CCDF ≈ 10^−3 (dB)   BER at SNR = 14 dB
Mu law companding        μ = 10             6.8                          ≈ 10^−2
                         μ = 50             5.2                          ≈ 10^−1.5
                         μ = 150            4.8                          ≈ 10^−1
Root companding          r = 0.2            2.6                          ≈ 10^−1
                         r = 0.5            6                            ≈ 10^−3
                         r = 0.8            9.6                          ≈ 10^−4
Exponential companding   d = 0.5            4.2                          ≈ 10^−2
                         d = 1              3                            ≈ 10^−2.5
                         d = 1.3            2.6                          ≈ 10^−2
Fig. 11 BER plot for comparison of mu law, root and exponential companding
6 Conclusion
The implementation and evaluation of companding techniques for PAPR mitigation in GFDM were studied. A GFDM system was implemented with K = 128 subcarriers and M = 5 sub-symbols over an AWGN channel. The exponential companding technique had not previously been studied and implemented in GFDM systems. Hence, this work evaluates the PAPR performance of a GFDM system with an HPA for exponential companding and compares it with the existing mu law and root companding techniques. Simulation results show that the exponential companding technique provides comparable PAPR and BER performance to the mu law and root companding techniques.
References 1. Michailow N, Matthé M, Gaspar IS, Caldevilla AN, Mendes LL, Festag A, Fettweis G (2014) Generalized frequency division multiplexing for 5th generation cellular networks. IEEE Trans Commun 62(9):3045–3061 2. Matthé M, Gaspar IS, Mendes LL, Zhang D, Danneberg M, Michailow N, Fettweis GP (2017) Generalized frequency division multiplexing: a flexible multi-carrier waveform for 5G. 5G mobile communications. Springer, Cham, pp 223–259 3. Rahmatallah Y, Mohan S (2013) Peak-to-average power ratio reduction in OFDM systems: a survey and taxonomy. IEEE Commun Surv Tutor 15(4), Fourth Quarter
4. Tiwari S, Paulus R (2020) Non-linear companding scheme for peak-to-average power ratio (PAPR) reduction in generalized frequency division multiplexing. J Opt Commun 5. Li Y, Deng W (2018) A novel piecewise nonlinear companding transform for PAPR reduction in GFDM. In: 10th international conference on wireless communications and signal processing (WCSP), pp 1–5 6. Sharifian Z, Omidi MJ, Farhang A, Saeedi-Sourck H (2015) Polynomial-based compressing and iterative expanding for PAPR reduction in GFDM. In: 23rd Iranian conference on electrical engineering, pp 518–523 7. Mishra SK, Kulat KD (2019) Approximation of peak-to-average power ratio of generalized frequency division multiplexing. AEU Int J Electron Commun 99:247–257 8. Liu K, Deng W, Liu Y (2018) An efficient nonlinear companding transform method for PAPR reduction of GFDM signals. In: IEEE/CIC international conference on communications in China (ICCC), pp 460–464 9. Celina MF, Vimala P (2020) Generalized frequency division multiplexing system performance with companding transform and pulse shaping filter. Material Today Proc 10. Paredes MCP, Grijalva F, Carvajal-Rodríguez J, Sarzosa F (2017) Performance analysis of the effects caused by HPA models on an OFDM signal with high PAPR. In: IEEE second Ecuador technical chapters meeting (ETCM), pp 1–5 11. Wang X, Tjhung TT, Ng CS (1999) Reduction of peak-to-average power ratio of OFDM system using a companding technique. IEEE Trans Broadcast 45(3):303–307 12. Chang P-H, Jeng S-S, Chen J-M (2010) Utilizing a novel root companding transform technique to reduce PAPR in OFDM systems. Int J Commun Syst 23:447–461 13. Jiang T, Yang Y, Song Y-H (2005) Exponential companding technique for PAPR reduction in OFDM systems. IEEE Trans Broadcast 51(2):244–248
Tensor Completion-Based Data Imputation Framework for IoT-Based Underwater Sensor Network Govind P. Gupta
and Prince Rajak
Abstract In the IoT-based Underwater Sensor Network (IoT-USN), a set of underwater sensor nodes is deployed for monitoring of marine ecosystems. These deployed nodes perform sensing tasks and periodically send the observations to the base station for additional data analytics and processing. In IoT data analytics over marine data, it is observed that many observations collected at the base station are missing due to malfunctions of sensor nodes and packet loss during wireless communication. Data imputation of the missing data entries for efficient data analytics is a fundamental research issue in the real-time monitoring of marine ecosystems. Thus, to address the missing data problem, a data imputation framework using tensor completion techniques is proposed in this paper for an IoT-based Underwater Sensor Network system. In the proposed framework, three well-known tensor completion algorithms, namely Bayesian Gaussian CP decomposition (BGCP), Bayesian Augmented Tensor Factorization (BATF), and HaLRTC, are used for data imputation. In the proposed framework, a 3D tensor of shape 19 × 720 × 5 is used for employing the data imputation technique. Result evaluation of the proposed framework is done using a real-time marine ecosystem dataset in terms of MAPE and RMSE. Keywords Data imputation · Tensor completion · Spatiotemporal data · Underwater sensor
1 Introduction For the monitoring of the marine ecosystems, IoT-based Underwater Sensor Network (IoT-USN) is employed in which a set of underwater sensor nodes and one or more sink nodes are deployed for collection of the observed sensing data. Nature of these observed data is of big data and most of the data analytics tools require rich and full data for efficient prediction and processing [1]. For analytics over marine ecosystem G. P. Gupta (B) · P. Rajak Department of Information Technology, National Institute of Technology, Raipur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_5
systems, rich and complete data are required to get different analytics with high precision and accuracy. However, it is observed that collected data suffer from missing data problems as a result of different issues like interference in the environment, data loss during transmission of data packets, failure of intermediate forwarding nodes, or malfunctioning of the data generation sources themselves, such as failure of a sensor. Imputation of the missing data is a fundamental research issue due to its big data nature and the requirement of high-precision and high-accuracy analytics [2–5]. Missing data can be recovered using a variety of approaches. Imputing values based on nearby values, such as using closest neighbor algorithms, or data imputation based on the mean of numerous values are two examples. However, for spatiotemporal datasets, these strategies are ineffective. Spatiotemporal refers to both time and space, implying that the data fluctuate in both space and time. Generally, traditional schemes can perform better for a small percentage of missing data, such as 5% or 10%, but when the proportion of missing data increases beyond these levels, the results tend to degrade. As a result, typical data imputation techniques cannot be utilized on this sort of big data since they would exacerbate rather than solve the problem [3–6]. Tensor completion (TC) techniques can be effective for imputing a large set of missing data in a big data analytics problem. As a result, this paper proposes a data imputation framework for IoT-based Underwater Sensor Networks designed for monitoring marine ecosystems, in which TC techniques like BGCP, BATF, and HaLRTC are employed for data imputation tasks. Here, HaLRTC stands for high-accuracy low-rank tensor completion. In the proposed framework, a 3D tensor of shape 19 × 720 × 5 is used for employing the data imputation technique. Result analysis and performance comparison of the proposed framework are done using a benchmark marine ecosystem dataset in terms of MAPE and RMSE under random and non-random settings. This paper is structured as follows. Section 2 discusses a brief review of the related works on data imputation of missing observations. Section 3 presents a brief description of the missing data problem and the representation of the dataset as a 3D tensor. Section 4 gives a detailed description of the proposed framework. Section 5 presents result evaluation and analysis. Finally, Sect. 6 concludes the paper with some future work.
2 Related Work In [1], the author has extended the probabilistic matrix factorization. Bayesian Gaussian CANDECOMP/PARAFAC is the name of the proposed model. They employed traffic data acquired from Guangzhou, China, to apply it to tensors of high rank. The authors compare their suggested work to existing methodologies. For all methods, the fraction of missing data has been changed between 10 and 50%. Frequent missing data and random missing data are both utilized. In all instances, BGCP is proven to
provide adequate performance. In this instance, it was also discovered that BGCP performs well with the third-order tensor. In [2], the Bayesian Augmented Tensor Factorization model is discussed. This model is evaluated using a traffic dataset of Guangzhou city, China. This approach is said to be more capable of imputation than Bayesian tensor factorization. For 10, 30, and 50% missing data, the missing situations evaluated are both random and non-random. BATF's RMSE and MAPE results are compared to those of other approaches like BGCP, BCPF, and STD. In the case of random missing data, BATF outperforms STD and is less affected by a spike in the missing rate. BCPF has a minor edge over BATF and BGCP. BATF still outperforms the other two approaches in non-random missing. As a result, the authors argue that the suggested BATF technique may predict values by leveraging the spatiotemporal correlation. In [3], the working of the low-rank tensor completion with truncated nuclear norm model is discussed. These methods were tested using a traffic dataset. The LRTC-TNN model used a comprehensive rate parameter to control the degree of truncation on all tensor modes. Experiments were conducted on four sets of spatiotemporal traffic data. This research work considered a high missing percentage, such as 50–70%, under random and non-random situations. LRTC-TNN methods perform well compared to the remaining methods such as BGCP, BTMF, and HaLRTC. In [4], a review of tensor completion techniques for recovery within visual data is discussed. In [5], a detailed description of the BATF model is discussed. In [6], Chen et al. discussed a Bayesian temporal factorization model for time series prediction for imputing the missing values. A low-rank tensor learning-based model is discussed in [7] for traffic data recovery.
3 Description of the Missing Data Problem and Representation of Dataset in a 3D Tensor Many times, data are lost during transmission from sender to receiver owing to issues with the underlying infrastructure. Data loss or distortion can also be caused by noise in the media or interference. Missing data can also be caused by malfunctioning data-gathering technology, such as sensors. Multiple redundant sensors are installed near each other to deal with this problem. It is now improbable that all of them will start malfunctioning at the same time. In this research work, we will study the performance of several tensor completion strategies to deal with missing data in these cases. The sensor readings will be organized into three-dimensional tensors. This research work has considered random and non-random missing settings for analysis of the performance. • Random Missing (RM) Setup: In the IoT-USN, many sensor nodes are deployed for observing the monitoring of marine ecosystems. In this scenario, any set of these sensor nodes might lose data at any point in time; as a result, there will
be sporadic missing data. This type of missing value has no discernible pattern. Figure 1 shows the random missing observation scenario. • Non-Random Missing (NRM) Setup: In this scenario, a sensor node may suffer the loss of data on specified days or times in the non-random missing situation. In other words, a pattern can be detected in the moment when a sensor truncates the values. Figure 2 shows the non-random missing scenario. For imputation of the missing observation in the IoT-USN system, first steps should be representation of the observed dataset into a 3D tensor. Figure 3 illustrates the representation of the water temperature dataset into a 3D tensor where it is assumed that there are n sensor nodes sending their observed data at the base station at which a 3D tensor is defined for storing and processing of the received data frame. This tensor representation is used as input to tensor completion techniques for imputation of the missing observations.
Fig. 1 Random missing observation scenario in the IoT-USN system
Fig. 2 Non-random missing value scenario in the IoT-USN system
Fig. 3 Representation of the water temperature data into a 3D tensor
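As an illustration of this tensor layout and of the two missing-data settings, a minimal NumPy sketch is given below. The stand-in temperature values, the 40% missing ratio and the masking details are assumptions for demonstration only; only the 19 × 720 × 5 shape is taken from the paper.

import numpy as np

rng = np.random.default_rng(42)
dense = rng.normal(26.0, 1.5, size=(19, 720, 5))        # stand-in water temperature tensor

# Random missing (RM): every entry is dropped independently with the same probability.
rm_mask = rng.random(dense.shape) < 0.4                  # 40% missing at random

# Non-random missing (NRM): a sensor loses its whole series on selected slices of the
# first dimension, mimicking a sensor that fails for entire periods.
failed = rng.random((dense.shape[0], dense.shape[2])) < 0.4
nrm_mask = np.repeat(failed[:, None, :], dense.shape[1], axis=1)

observed_rm = np.where(rm_mask, np.nan, dense)
observed_nrm = np.where(nrm_mask, np.nan, dense)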
4 Tensor Completion-Based Data Imputation Framework for the IoT-USN System In the IoT-based USN system for the monitoring of marine ecosystems, nearby sink nodes are responsible for accumulating the observed data, arriving from deployed IoT nodes as sensing observation. At each sink node, a data imputation module is installed for performing the imputation of the missing fields in the observation matrix. For imputation of the missing observation, this research work employs tensor completion-based techniques. Figure 4 illustrates the block diagram of the proposed tensor completion-based data imputation framework for the IoT-USN system. In the proposed framework, the first step is to represent the collected observations from different sensor nodes into a series of the 3D tensor so that tensor completion techniques can be easily employed for imputing the missing observation by exploiting the spatial and temporal correlations. The proposed framework has employed three recognized tensor completion techniques, for instance, BGCP, BATF, and HaLRTC for data imputation tasks. Explanations of these techniques are discussed as follows.
Fig. 4 Tensor completion-based data imputation framework
• BGCP: This is a completely Bayesian tensor factorization model that learns the latent factor matrices (i.e., the low-rank structure) using Markov chain Monte Carlo (MCMC) [1]. The first component of this approach is CP decomposition, the second part is Gaussian assumptions, and the third part is the Bayesian framework. In the Canonical Polyadic (CP) decomposition, tensor rank decomposition is used, in which the number of rank-one tensors in the aggregate, rank CP(B) = S, is the rank of the CP representation. The following formula is used for CP decomposition (a code sketch of this reconstruction step is given after this list):

\hat{Y} = \sum_{s=1}^{r} u_s \circ v_s \circ x_s, \qquad \hat{y}_{ijt} = \sum_{s=1}^{r} u_{is} v_{js} x_{ts}, \; \forall (i, j, t),  (1)
Here, the vectors u_s ∈ R^m, v_s ∈ R^n, x_s ∈ R^f are the columns of the factor matrices, and the vector outer product is represented by the ∘ symbol. The Gaussian assumption is the second step that is employed. In this step, assume that a given tensor y ∈ R^{m×n×f} has many missing values. The following factorization expression is used to reconstruct the missing values within y, as shown in Eq. (2):

y_{ijt} \sim \mathcal{N}\!\left( \sum_{s=1}^{r} u_{is} v_{js} x_{ts}, \; \tau^{-1} \right), \; \forall (i, j, t).  (2)
Here, the columns of the latent factor matrices are denoted by the vectors u_s ∈ R^m, v_s ∈ R^n, x_s ∈ R^f, respectively. In a Bayesian environment, the basic notion of Bayesian inference [5] is used to calculate the posterior distribution from the prior distribution and the likelihood function. Equation (3) is used to figure out the model parameters, as given below:

y_{ijt} \sim \mathcal{N}\!\left( \mu + \phi_i + \theta_j + \eta_t + \sum_{k=1}^{r} u_{ik} v_{jk} x_{tk}, \; \tau^{-1} \right), \; \forall (i, j, t).  (3)
Here, \mathcal{N} denotes the Gaussian distribution and τ represents the precision. • BATF: This is a complete Bayesian model which uses a concept of a unique tensor factorization, with explicit and latent factors. These factors are generally low-rank factors. In this technique, features of the dataset are learnt using variational Bayes. The main goal of this approach is to solve the problem of missing values in multivariate time series data by folding the data along the day dimension using a low-rank tensor structure. CP decomposition, tensor unfolding, Gibbs sampling, and vector combination are used in BATF [2]. • HaLRTC: This model uses nuclear norm (NN) minimization to find an appropriate estimation for the missing fields [3]. In the HaLRTC model, a temporal dimension is included in the initial time series matrix, to study the underlying patterns and seasonality
factors. This technique first represents the incomplete dataset into a tensor structure and converts the imputation problems into a low-rank tensor completion problem. In addition to diminishing the tensor rank, an autoregressive norm on the original matrix is used.
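A minimal sketch of the CP reconstruction and imputation step shared by the factorization-based models (Eq. (1)) is given below. Estimating the factor matrices themselves (by MCMC in BGCP or variational inference in BATF) is not shown; U, V, X are assumed to be already fitted.

import numpy as np

def cp_reconstruct(U, V, X):
    """Rank-r CP reconstruction of Eq. (1): y_hat[i, j, t] = sum_s U[i, s] * V[j, s] * X[t, s]."""
    return np.einsum('is,js,ts->ijt', U, V, X)

def impute(observed, U, V, X):
    """Keep the observed entries and fill the NaN positions from the low-rank reconstruction."""
    return np.where(np.isnan(observed), cp_reconstruct(U, V, X), observed)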
5 Result Evaluation and Analysis
This section presents the experimental results and a comparative study of the proposed framework using three recent TC-based schemes, namely BGCP, BATF, and HaLRTC. Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) are used for performance comparison. The next subsection describes the water sensor dataset collected for monitoring of the marine ecosystem.
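Both metrics are computed only over the entries that were artificially removed. A minimal sketch using their standard definitions (the exact form is an assumption, as the paper does not spell it out) is:

import numpy as np

def rmse(truth, imputed, mask):
    """Root mean squared error over the artificially removed entries."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def mape(truth, imputed, mask):
    """Mean absolute percentage error over the removed entries (ground truth assumed non-zero)."""
    return float(np.mean(np.abs((truth[mask] - imputed[mask]) / truth[mask])))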
5.1 Description of Water Sensor Dataset The data from the water sensor were obtained from Australia’s tropical marine research agency, the Australian Institute of Marine Science (AIMS) [8]. We have chosen ocean underwater sensor data from the Australian maritime area. AIMS has a variety of data, such as data from temperature loggers, data from corals throughout the world, data from ocean monitoring surveys, and so on, but we have chosen temperature as the most important data we have acquired from numerous sensors. In this research work, we have selected five sensors from the Australian region, namely CORALBAY 2, ELLES MA, TANTABIDDI SL1, 19131FL1, and 19138FL1, and picked water temperature data from these sensors, which deliver hourly measurements, to generate our dataset, using water temperature as the primary variable. The data were collected for research purposes between September 2 and October 20, 2021. The produced dataset contains the readings of five sensors over a certain period (hourly reading). The dataset is described in Fig. 5, the time series charting of water temperature and date is described in Fig. 6, and the sensor position in the Australian region is described in Fig. 7.
5.2 Result Analysis in Terms of RMSE and MAPE
Figure 8 shows the experimental results in terms of MAPE for the random missing scenario. It can be observed from Fig. 8 that, as the percentage of missing data increases from 20 to 60%, the value of the MAPE increases. However, the rate of growth is low for the BGCP-based scheme and high for the HaLRTC-based scheme. The BGCP-based scheme performs better than the other two schemes. Similarly, Fig. 9 shows the experimental results in terms of MAPE for the non-random missing scenario.
Fig. 5 Description of the dataset used in the experiment
Fig. 6 Plot between water temperature and date [8]
In this experiment, the missing ratio varied from 20 to 60% to observe the behavior of the proposed framework using three different tensor completion techniques. From the figure, it can be observed that in a non-random scenario, a BATF-based approach performs better than the remaining techniques. As the percentage of the missing ratio increases for BGCP and HaLRTC schemes, MAPE also increases. However, the rate of growth in MAPE is low in case of BATF and high in case of HaLRTC. Figure 10 shows the result analysis of the proposed framework for data imputation task in terms of RMSE by altering the missing percentage from 20 to 60% and considering random missing scenarios. It can be observed from Fig. 10 that as the missing ratio increases, RMSE also increases. However, the rate of growth in RMSE
Fig. 7 Yellow box shows the water sensor planted in Australian region [8]
Fig. 8 Result analysis in terms of MAPE for water temperature data for random scenario
Fig. 9 Result analysis in terms of MAPE for water temperature data for non-random scenario
Fig. 10 Result analysis in terms of RMSE for random missing scenario
Fig. 11 Result analysis in terms of RMSE for non-random scenario
is low in both BGCP and BATF, and for HaLRTC, it is high. Performance of the HaLRTC is poor due to high RMSE. Similarly, Fig. 11 shows the result in terms of RMSE in a non-random missing scenario. It can be seen from Fig. 11 that performance of the BATF is best with low RMSE compared to others. However, RMSE for the HaLRTC is high.
6 Conclusion
This paper has discussed a data imputation model using TC-based techniques for the IoT-USN. In this research work, imputation of the missing observations is solved using TC-based techniques. For analysis of the proposed framework, three TC-based techniques are utilized, namely BGCP, BATF, and HaLRTC, and experiments are completed under random and non-random missing settings using a water quality dataset. From the result analysis, it is observed that the BATF-based model performs better compared to the others in both missing data scenarios. We can state that, for the water quality dataset, BATF performs somewhat better than BGCP in the random missing situation and much better in the non-random missing case in terms of RMSE.
When compared to the HaLRTC, the BGCP performs better in both random and nonrandom settings. Even in non-random missing cases, the method BATF can function effectively for real-world water quality data since it has very low error values. In future, we plan to extend this work for a high dimensional dataset of the smart cities’ project. Acknowledgements This work was supported by the MATRICS Project funded by the Science and Engineering Research Board (SERB), India, under the Grant MTR/2019/001285.
References 1. Chen X, He Z, Sun L (2019) A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transp Res Part C Emerg Technol 98:73–84 2. Chen, X, He Z, Chen Y, Lu Y, Wang J (2019) “Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transp Res Part C Emerg Technol 104:66–77 3. Chen X, Yang J, Sun L (2020) A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation. Transp Res Part C Emerg Technol 117:102673 4. Long Z, Liu Y, Chen L, Zhu C. Low rank tensor completion for multiway visual data. Signal Process 155:301–316 5. Chen X, He Z, Chen Y, Lu Y, Wang J. Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transp Res Part C Emerg Technol 104:66–77 6. Chen X, Sun L (2021) Bayesian temporal factorization for multidimensional time series prediction. IEEE Trans Pattern Anal Mach Intell 7. Chen X, Chen Y, Saunier N, Sun L (2021) Scalable low-rank tensor learning for spatiotemporal traffic data imputation. Transp Res Part C Emerg Technol 129:103226 8. Water temperature dataset: online access at https://www.aims.gov.au/
Pre-training Classification and Clustering Models for Vietnamese Automatic Text Summarization Ti-Hon Nguyen and Thanh-Nghi Do
Abstract Our investigation aimed to propose a new single-document extractive text summarization model, which consists of a classifier and a summary component based on pre-trained clustering models. First, we train the classifier on the training data set, then we train the clustering models on the subtraining data sets, and split from the entire training data set. In the summary process, the summary model uses the classifier to predict the input text label and then uses this label to choose the corresponding clustering model for summarizing. The model’s numerical test results on the Vietnamese data set based on ROUGE-1, ROUGE-2, and ROUGE-L are 51.50%, 16.26%, and 29.25%, respectively. In addition, our model can perform well on cost-effective resources like an ARM CPU to summarize large amounts of documents. Keywords Text summarization · Extractive · Stochastic gradient descent · Clustering
1 Introduction H. P. Luhn introduced an automatic text summary in 1958 [11]. That is creating a compressed version of one or one set of input text while keeping the primary ideas of the input content [1]. Current research focuses on two main approaches to producing the result: extractive and abstractive [1]. Abstractive methods create the summary by paraphrasing, and extractive methods make the summary by selecting sentences from the input. As a result, the summary given by an extractive model is more grammatical and fluent than the other. T.-H. Nguyen (B) · T.-N. Do Can Tho University, Can Tho, Vietnam e-mail: [email protected] T.-N. Do e-mail: [email protected] T.-N. Do UMI UMMISCO 209, IRD/UPMC, Paris, France © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_6
In the extractive technique, the summary model using the pre-training clustering model [16, 17], trained on the training data set, has some advantages such as being effective in time and having a high ROUGE score [8]. However, the previous work introduced the clustering models on the training data set while it included many topics, which can reduce the quality of the output. Therefore, we propose solving this problem by training the clustering models on the subdata training data set and split from the whole training data set based on the topic of its records. In addition, we implement a classifier to obtain the input text label before passing it into the summary model. The contribution of this work is the introduction of the new extractive text summary model for single-document based on the combination of classification and clustering models. This model also performs well on a large-scale data set, saves time, and can be deployed on cost-saving resources such as ARM CPUs. The remainder of this paper is as follows: Sect. 2 contains the related work, Sect. 3 is about the summarization model, Sect. 4 is the experiment and results, and Sect. 5 is the conclusion and further work.
2 Related Work Nguyen proposed the summary model [16] based on the clustering model and presented the results of the numerical test on the Vietnamese data set. Our model inherited Nguyen’s model using the pre-trained clustering model as the standard for scoring the candidate sentence in the summary model. However, our model differs from Nguyen’s in the pre-trained models and word representation. We added the classifier layer before the summary component and trained the clustering models on the subtraining data set instead of the entire training data set. We also used three kinds of word meaning representation, including word-to-vector (W2V) [14], Glove [19], and Fasttext [6], while Nguyen’s model used only bag-of-word [4] as the frequency word representation. Our summary model used the centroids to extract the summary sentences, related to the extractive model proposed by Radev et al. [20, 21]. However, these summary models have differences in the way of choosing the centroids and the number of input documents. First, D. R. Radev uses the TF-IDF [10] score of words in the corpus to determine the centroids. In contrast, we use pre-trained clustering models to define the centroids. The next difference is D. R. Radev proposed a multi-document summarization model, but we introduce a single-document one. Our model used word meaning representation and pre-trained the clustering model like the summary model proposed by Rossiello [22]. However, we trained the clustering models on the subtraining data set, but G. Rossiello trained on the entire training data set. Moreover, our summary model is a single-document instead of a multi-document summary of G. Rossiello.
Fig. 1 Overall summary model
3 Model 3.1 Proposed Summary Model The proposed summary model has two main components, a stochastic gradient descent (SGD) classifier and a summary component based on kmeans clustering model [9, 13] as illustrated in Fig. 1. Algorithm 1 [16, 17] describes the detail of implementation in the summary component. The training data set is used to train the word representation model, including word-to-vector (W2V), Glove, Fasttext, and TF-IDF weights. Then, we prepare the SGD classifier on the training data set using the TF-IDF word representation. Next, we split the training data set into subtraining sets based on its sample topic. Then, using the three trained word meaning representation models, we trained the separated k means models on these subsets. One subset and one word embedding model will contribute to a kmeans model. In the summary process, the summary model uses the classifier to predict the label of the input text to choose the corresponding kmeans model for summarizing.
Algorithm 1 The summary algorithm for an input text d
1: Given: clustering model from the training step clustering_model, document d, number of output sentences n
2: Initialize an empty set of sentences S, an empty set of vectors X, an empty set of vectors V, a zero-length text summary
3: S ← s sentences split from d
4: for s ∈ S do
5:   X ← x, the vector of s
6: end for
7: C ← c, the cluster centroids in clustering_model
8: for x ∈ X do
9:   c ← f(x, C)          (f(x, C) returns the c ∈ C closest to x)
10:  V ← (x_idx, x_dis)   (x_idx is the index of x in X, x_dis is the cosine distance of x and c)
11: end for
12: sort V by increasing x_dis
13: for i = 1 to n do
14:   summary = summary + S[V[i].x_idx] + ". "
15: end for
16: return summary
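A minimal Python sketch of Algorithm 1 is given below. The sentence splitter, the sentence vectorizer and the fitted centroid matrix are assumed to be supplied; the names are illustrative, not taken from the authors' code.

import numpy as np

def summarize(document, split_sentences, sentence_vector, centroids, n):
    """Return the n sentences of `document` whose vectors lie closest to any cluster centroid."""
    sentences = split_sentences(document)                     # step 3
    scored = []
    for idx, sentence in enumerate(sentences):                # steps 4-11
        x = sentence_vector(sentence)
        sims = centroids @ x / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(x) + 1e-12)
        scored.append((idx, 1.0 - sims.max()))                # cosine distance to the closest centroid
    scored.sort(key=lambda pair: pair[1])                     # step 12
    return ". ".join(sentences[idx] for idx, _ in scored[:n]) + ". "   # steps 13-15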
3.2 The kmeans Clustering
The kmeans algorithm is widely applied in clustering due to its strength and simplicity. We use Mini-batch kmeans [9, 12, 23], an improved version of kmeans to deal with a large-scale data set, to implement the summary component. The detail of Mini-batch kmeans is indicated in Algorithm 2.

Algorithm 2 Mini-batch kmeans
1: Given: k, mini-batch size b, iterations t, dataset X
2: Initialize each c ∈ C with an x picked randomly from X
3: v ← 0
4: for i = 1 to t do
5:   M ← b examples picked randomly from X
6:   for x ∈ M do
7:     d[x] ← f(C, x)        (Cache the center nearest to x)
8:   end for
9:   for x ∈ M do
10:    c ← d[x]              (Get cached center for this x)
11:    v[c] ← v[c] + 1       (Update per-center counts)
12:    η ← 1/v[c]            (Get per-center learning rate)
13:    c ← (1 − η)c + ηx     (Take gradient step)
14:  end for
15: end for
In Algorithm 2, k is the number of clusters, C is the set of cluster centers with each cluster center c ∈ R^m and |C| = k, X is the collection of vectors x, and f(C, x) is the function that finds the closest cluster center c ∈ C to x based on the Euclidean distance.
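In practice, this mini-batch procedure is available off the shelf; a usage sketch with scikit-learn's MiniBatchKMeans is shown below. The 1000 clusters follow Table 2, while the batch size and the random stand-in sentence vectors are assumptions.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# stand-in for the W2V/Glove/Fasttext sentence vectors of one topic subset
sentence_vectors = np.random.default_rng(0).normal(size=(20_000, 100))

kmeans = MiniBatchKMeans(n_clusters=1000, batch_size=4096, max_iter=100, random_state=0)
kmeans.fit(sentence_vectors)
centroids = kmeans.cluster_centers_      # the centroid set C used by Algorithm 1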
3.3 Stochastic Gradient Descent
Stochastic gradient descent (SGD) is a simple and efficient algorithm for large-scale learning because its computational complexity scales linearly with the number of points in the training data [3]. Therefore, we use the SGD algorithm to implement the classification model. Given a classification task with the data set D = [X, Y] consisting of m datapoints X = {x_1, x_2, ..., x_m} in the n-dimensional input space R^n, with corresponding labels Y = {y_1, y_2, ..., y_m} taking values in {cl_1, cl_2, ..., cl_p}, the SGD algorithm tries to find p separating planes (w_1, w_2, ..., w_p ∈ R^n) for the p classes (cl_1, cl_2, ..., cl_p). This can be done by Eq. (1):

\min \Psi(w_p, [X, Y]) = \frac{\lambda}{2} \|w_p\|^2 + \frac{1}{m} \sum_{i=1}^{m} L(w_p, [x_i, y_i])  (1)
where the errors are measured by L(w_p, [x_i, y_i]) = max{0, 1 − y_i (w_p · x_i)} and a positive constant λ controls the strength of the regularization term \|w_p\|^2. Studies in [2, 24] show that the SGD algorithm solves (1) by using a learning rate η to update w over T epochs. In every epoch t, SGD uses one randomly picked datapoint (x_i, y_i) from the mini-batch B_i to compute the subgradient ∇_t Ψ(w_p, [x_i, y_i]) and update w_p as follows:

w_p = w_p − η ∇_t \Psi(w_p, [x_i, y_i])  (2)
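A usage sketch of the SGD classifier with TF-IDF features is shown below, using scikit-learn. The hinge loss and the 20 epochs follow Table 2; the toy Vietnamese snippets, the regularization constant and the pipeline structure are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

texts = ["gia vang tang manh", "doi tuyen bong da thang dam", "chung khoan giam diem"]
labels = ["kinh-te", "the-thao", "kinh-te"]

classifier = make_pipeline(
    TfidfVectorizer(),
    SGDClassifier(loss="hinge", alpha=1e-5, max_iter=20, random_state=0),
)
classifier.fit(texts, labels)
topic = classifier.predict(["gia vang hom nay"])[0]   # selects which kmeans model to summarize with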
3.4 Word Representation
TF-IDF: Jones introduced tf-idf [5] in 1972 to increase the weight of rare words and eliminate the dependence on stop-word lists to represent text data. The weighted value tf-idf w_{t,d} for word t in document d can be calculated by (3):

w_{t,d} = \log_{10}(\mathrm{count}(t, d) + 1) \times \log_{10}\left(\frac{N}{df_t}\right)  (3)

where N is the total number of records in the data set, and df_t is the number of the samples in which term t occurs.
Word Embeddings: Word embeddings are short, dense vectors, with the number of dimensions d ranging from 50 to 1000, rather than the much more extensive vocabulary size in the term frequency model. As a result, word embedding can better capture synonymy and save computer resources for training machine learning models. We used word embeddings for the clustering models, including word-to-vector, Glove, and Fasttext. The word-to-vector [15] methods are fast and efficient to train. Glove [18]
is based on ratios of probabilities from the word–word co-occurrence matrix, combining intuitions of count-based models. Fasttext [7] deals with unknown words and sparsity in languages with rich morphology using subword models.
3.5 Cosine Distance
Given vector a and vector b with n dimensions, the cosine distance d_{a,b} between them can be calculated by (4):

d_{a,b} = 1 - \cos(\theta) = 1 - \frac{\vec{a} \cdot \vec{b}}{|\vec{a}||\vec{b}|} = 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}  (4)

where a_i, b_i are the elements of a and b, and θ is the angle formed by a and b.
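Equation (4) translates directly into code; a small sketch for illustration:

import numpy as np

def cosine_distance(a, b):
    """Eq. (4): one minus the cosine of the angle between vectors a and b."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))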
3.6 ROUGE
ROUGE [8] is the metric for automatically evaluating the output of the text summarization model. ROUGE calculates the recall, precision, and F1 score by comparing the system summary (the summary created by the model) with the reference summary (the gold standard summary). In ROUGE, the "overlapping word" is a word visible in both the system and the reference summaries. The "overlapping word" is determined by the n-gram model in ROUGE-N and the longest common subsequence in ROUGE-L. The ROUGE recall, precision, and F measure can be calculated by (5):

Recall = \frac{S_O}{S_R}, \qquad Precision = \frac{S_O}{S_S}, \qquad F1 = \frac{2PR}{P + R}  (5)

where S_O, S_R, and S_S are the total of "overlapping words", the number of words in the reference summary, and the number of words in the system summary, respectively.
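A minimal sketch of the ROUGE-N scores of Eq. (5) based on overlapping n-gram counts is shown below; real evaluations typically use the standard ROUGE toolkit, and clipping the overlap counts is the usual convention assumed here.

from collections import Counter

def rouge_n(system_summary, reference_summary, n=1):
    """Recall, precision and F1 of Eq. (5) from clipped overlapping n-gram counts."""
    def ngrams(text):
        tokens = text.split()
        return Counter(zip(*[tokens[i:] for i in range(n)]))
    sys_counts, ref_counts = ngrams(system_summary), ngrams(reference_summary)
    overlap = sum((sys_counts & ref_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(sys_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1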
4 Experiment
4.1 Data Set
We evaluate our model on the VNText [16] data set. This data set has 1,101,101 samples, each with title, subtitle, and primary content fields. The main content is used as the input of the summary model, and the subtitle is used as the reference summary in the evaluation. The VNText is divided into three subsets: the training set with 880,895 samples, the validation set with 110,103 records, and the testing set with 110,103 documents. Table 1 shows the details of the VNText data set. In Table 1, W_sent is the average number of words in one sentence, W_record is the average number of words per record, S_record is the average number of sentences per record, Test_ref is the reference summary in the testing data set, and Val_ref is the reference summary in the validation data set.
4.2 Evaluating Computer
We use one computer configuration for all processes: an ARM Neoverse-N1 CPU at 2.8 GHz with four cores and four threads, 24 GB of RAM, and 150.34 MB/s HDD speed.
4.3 Training, Summarizing, and Parameters
We trained the W2V, Glove, Fasttext, and SGD models and calculated TF-IDF weights on the training data set, tuning parameters on the validation set and evaluating the summary model on the testing set. In addition, we split the training set into 16 subtraining data sets based on the class of their samples for training the kmeans models. The word representations used for training kmeans are W2V, Glove, and Fasttext, respectively. The parameter values of these pre-trained models are shown in Table 2.
Table 1 VNText detail

Information   Train            Test            Val             Test_ref       Val_ref
Record        880,895.00       110,103.00      110,103.00      110,103.00     110,103.00
Sents         18,738,333.00    2,342,296.00    2,333,519.00    155,573.00     155,496.00
Words         452,686,377.00   56,563,630.00   56,417,036.00   4,163,723.00   4,170,809.00
W_sent        24.16            24.15           24.18           26.76          26.82
W_record      513.89           513.73          512.40          37.82          37.88
S_record      21.27            21.27           21.19           1.41           1.41
Table 2 Parameter values

Model/object       Parameters                    Value/range
W2V and Fasttext   Context_windows_size          10
                   Related_word_algorithm        Skip_gram
                   Epochs                        5
                   Embedding_dimension           100
Glove              Context_windows_size          15
                   Related_word_algorithm        Skip_gram
                   Epochs                        15
                   Embedding_dimension           300
SGD                Epochs                        20
                   Loss                          Hinge
Data set           Number_of_subtraining         16
kmeans             Number_of_cluster             1000
Summary            Number_of_output_sentences    1, 2, 3, 4, 5
4.4 Results We illustrate the numerical testing score F1 for the experimentation with ROUGE1, ROUGE-2, and ROUGE-L. In the results tables, column Model is the model name and column n is the sentence number of the output summary. The name of the models consisting of the word embedding (w2v, glove, and fasttext—ft) name and the classifier name (sgd), the _tr suffix indicates that the model uses the actual class of the input text. In parallel, we also present the processing time to measure how efficient the summary model is. ROUGE. The F1 score based on ROUGE is presented in Tables 3, (4), (5) and Figs. 2, 3, 4, which included ROUGE-1, ROUGE-2, and ROUGE-L. In ROUGE-1, when n = 1 the model using the Glove embedding model produces impressive results glove_tr with 51.52% and glove_sgd with 51.50%. The F1 score of fasttext_sgd and w2v_sgd are also high with 51.34% and 51.24%, it is much lower than when using the true input class with 0.01% and 0.02%. When n ∈ {2, 3, 4, 5}, the results of w2v_sgd are better than those of the other models. The F1 will decrease when more sentences are produced in the output summary. In ROUGE-2 and ROUGE-L, when n = 1, the Glove embedding summary models are better than others, glove_tr with 16.38% and glove_sgd with 16.36%. However, when n from 2 to 5, the Fasttext summary models are better in ROUGE-2, and the W2V summary models are better in ROUGE-L (Tables 4 and 5). Times. The spending time in our experiment is presented in Table 6 and Fig. 5, includes the time for pre-training models, running summarize the testing data set, and evaluating ROUGE score. In training processes, the model using Fasttext embedding requires the highest time, with 7.41 h because of the training Fasttext model. On the other hand, in the
Pre-training Classification and Clustering Models … Table 3 F1 score on ROUGE-1 Model n=1 n=2
F1 score (%)
ft_tr glove_tr w2v_tr ft_sgd glove_sgd w2v_sgd
51.35 51.52 51.26 51.34 51.50 51.24
52 48 44 40 36 32 28 24 20
n=3
42.31 41.91 42.44 42.28 41.88 42.42
ft tr
73
34.02 33.51 34.21 34.00 33.47 34.19
glove tr
w2v tr
ft sgd
n=4 28.33 27.81 28.52 28.31 27.78 28.51
n=5 24.38 23.89 24.56 24.36 23.87 24.55
glove sgd w2v sgd
models name n=1
n=2
n=3
n=4
n=5
Fig. 2 Line chart of F1 score on ROUGE-1
F1 score (%)
16
12
8 ft tr
glove tr
w2v tr
ft sgd
glove sgd w2v sgd
models name n=1
n=2
Fig. 3 Line chart of F1 score on ROUGE-2
n=3
n=4
n=5
74
T.-H. Nguyen and T.-N. Do
F1 score (%)
32 28 24 20 16
ft tr
glove tr
w2v tr
ft sgd
glove sgd w2v sgd
models name n=1
n=2
n=3
n=4
n=5
Fig. 4 Line chart of F1 score on ROUGE-L Table 4 F1 score on ROUGE-2 Model n=1 n=2 ft_tr glove_tr w2v_tr ft_sgd glove_sgd w2v_sgd
16.37 16.38 16.25 16.34 16.36 16.23
13.20 12.92 13.16 13.18 12.89 13.14
Table 5 F1 score on ROUGE-L Model n=1 n=2 ft_tr glove_tr w2v_tr ft_sgd glove_sgd w2v_sgd
29.25 29.26 29.22 29.23 29.25 29.20
26.72 26.60 26.76 26.71 26.58 26.75
n=3 10.72 10.36 10.72 10.71 10.34 10.71
n=3 23.56 23.38 23.64 23.55 23.37 23.63
n=4 9.07 8.70 9.09 9.06 8.68 9.07
n=4 20.96 20.76 21.05 20.95 20.74 21.04
n=5 7.93 7.57 7.96 7.92 7.55 7.94
n=5 18.91 18.69 19.00 18.89 18.68 18.99
summarizing process, the Glove embedding model needs more time than the others, glove_sgd with 1.14 h, because the dimensions of the Glove vector are 300, in contrast to the dimensions 100 of the W2V and Fasttext vectors. However, in general, the time to calculate the ROUGE score is not different between each model.
Pre-training Classification and Clustering Models … Table 6 Time (in hours) Model Total ft_tr glove_tr w2v_tr ft_sgd glove_sgd w2v_sgd
8.41 2.70 5.90 9.06 3.34 6.56
75
Pre-train
Summary
Rouge
7.31 1.45 4.80 7.41 1.49 4.90
0.40 0.54 0.40 0.96 1.14 0.98
0.70 0.71 0.70 0.70 0.71 0.69
8.41
ft tr glove tr
2.70
w2v tr
5.90
ft sgd
9.06
glove sgd
3.34
w2v sgd
6.56 0
1.5
3
4.5 6 total of hour
7.5
9
Fig. 5 Total time (in hours)
5 Conclusion and Future Work We presented the proposition of a new extractive text summarization model, which adds a classifier before the summary component. The classifier predicts the label of the input text, and then the summary model uses the corresponding clustering model to make the summary. We used TF-IDF word representations for the classification model and Fasttext, Glove, and word-to-vector embedding for clustering models. Experiments showed that this model has the advantage of producing a high ROUGE score and can perform fast on a cost-efficient resource such as an ARM CPU. Further work could consider implementing a local classification model to improve the classifier’s accuracy, which could give a higher ROUGE score of the summary.
References 1. Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) Text summarization techniques: a brief survey 2. Bottou L, Bousquet O (2007) The tradeoffs of large scale learning. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems. Curran Associates, Inc. 3. Do TN (2022) Imagenet challenging classification with the raspberry pi: an incremental local stochastic gradient descent algorithm 4. Harris ZS (1954) Distributional structure. Word 10(2–3):146–162 5. Jones KS (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 6. Joulin A, Grave E, Bojanowski P, Mikolov T (2016) Bag of tricks for efficient text classification 7. Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics: volume 2, short papers, Valencia, Spain, Association for Computational Linguistics, 427–431 8. Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches out, 74–81 9. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theor 28(2):129–137 10. Luhn HP (1957) A statistical approach to mechanized encoding and searching of literary information. IBM J Res Dev 1(4):309–317 11. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165 12. MacQueen J (1967) Classification and analysis of multivariate observations. In: 5th Berkeley symposium mathematical statistic probability, 281–297 13. MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Cam LML, Neyman J (eds) Proc. of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, 281–297 14. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781 15. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26 16. Nguyen TH, Do TN (2022) Extractive text summarization on large-scale dataset using k-means clustering. In: Fujita H, Fournier-Viger P, Ali M, Wang Y (eds) Advances and trends in artificial intelligence. Theory and practices in artificial intelligence. Springer International Publishing, Cham, pp 737–746 17. Nguyen TH, Do TN (2023) Extractive text summarization on large-scale dataset using k-means clustering and word embedding. In: Smys S, Lafata P, Palanisamy R, Kamel KA (eds) Computer networks and inventive communication technologies. Singapore, Springer Nature, pp 489–501 18. Pennington J, Socher R, Manning C (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, Oct 2014, 1532–1543 19. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543 20. Radev DR, Jing, H, Budzikowska M (2000) Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In: Proceedings of the 2000 NAACL-ANLP workshop on automatic summarization, vol 4. 
NAACL-ANLPAutoSum’00, USA, Association for Computational Linguistics, 21–30 21. Radev DR, Jing H, Styundefined M, Tam D (2004) Centroid-based summarization of multiple documents. Inf Process Manage 40(6):919–938 22. Rossiello G, Basile P, Semeraro G (2017) Centroid-based text summarization through compositionality of word embeddings. In: Proceedings of the MultiLing 2017 workshop on summarization and summary evaluation across source types and genres. Association for Computational Linguistics, Valencia, Spain, Apr 2017, 12–21
23. Sculley D (2010) Web-scale k-means clustering. In: Proceedings of the 19th international conference on World Wide Web. WWW’10. Association for Computing Machinery, New York, NY, USA, 1177–1178 24. Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th international conference on Machine learning, 807–814
Identifying Critical Transition in Bitcoin Market Using Topological Data Analysis and Clustering Anusha Bansal, Aakanksha Singh, Sakshi Vats, and Khyati Ahlawat
Abstract The advent of Bitcoin marked the beginning of decentralized digital currency. It has witnessed dramatic growth in prices and undergone huge volatility swings since its inception. The absence of institutional restrictions and regulations makes it susceptible to sudden crashes. For the time series of Bitcoin from 2016 to 2018, predictive analysis is performed using a topological data analysis tool, persistent homology, which is used to generate topological features from the Bitcoin time series. To characterize the bubbles approaching critical transitions, clustering algorithms (k-means, fuzzy c-means, k-medoids, and hierarchical methods) are applied to the statistical values of the features. The results reveal strong warning signs before an impending crash. Analysis of the cluster visualization also shows that fuzzy c-means and k-medoids provide the best clusters among all. Internal validation indices, namely average silhouette width, Dunn index, Davies–Bouldin index, Calinski–Harabasz pseudo-F-statistic, similarity index, and Rand index, corroborate the above determinations. Keywords Bitcoin · Bubble · Topological data analysis · Critical transitions · L1-norms · Clustering · Dunn index
Anusha Bansal and Aakanksha Singh have contributed equally to this paper as co-first authors.
A. Bansal · A. Singh (B) · S. Vats · K. Ahlawat Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Kashmere Gate, Delhi, India e-mail: [email protected]
1 Introduction
Cryptocurrency, a form of digital currency, is based upon blockchain technology that keeps an online ledger of transactions and is highly secured Rehman et al. [1]. Nowadays, a large number of cryptocurrencies have emerged in the market. Of the major cryptocurrencies Forbes Advisor [2], Bitcoin is the focus of this paper. Cryptocurrencies are neither recognized as a mature asset class nor
subject to institutional restrictions, and are largely unregulated. This regulatory void leaves the cryptocurrency market prone to volatility Gerlach et al. [3]. As the cryptocurrency market share increases, it becomes more important to develop effective strategies to extract information and identify incoming critical transitions. There are various machine learning strategies to handle the modeling and prediction of Bitcoin behavior. However, no consensus has been reached on the most effective information or features to be used as the input for those machine learning methods. A relatively newer field, topological data analysis, has been proven effective in analyzing bubble formation in financial markets Gidea et al. [4]. The idea behind TDA is that the shape formed due to adjacency in a dataset contains topological features that supply qualitative and sometimes quantitative information Conti et al. [5]. An increase in turbulence can be associated with a rising number of holes in the point cloud data. As the data in the point cloud become scattered, the accumulation of data points can increase near some specific values and decrease near others, creating gaps. These kinds of changes are easily visible using the L1-norms of persistence landscapes. As a critical transition approaches, the L1-norms oscillate with increasing amplitude and reach higher values Gidea et al. [6]. Therefore, the growth of the L1-norms can serve as a leading indicator of a critical transition. Patterns of critical transitions are identified by comparing the derived time series of L1-norms of persistence landscapes with the time series of the daily log prices of Bitcoin. Gidea [7] uses the k-means algorithm, but several clustering issues have been identified. Firstly, if the two time series are highly similar, the clusters will overlap and consequently intersect; this case is poorly handled by the k-means algorithm. Secondly, cluster validation and outlier analysis remain difficult. If the datasets are contained in only one cluster or the data are low dimensional and well separated, a clustering method like k-means gives good results. However, the same result may not be produced in each run, and outliers cannot be handled. Taking these shortcomings and the volatile behavior of Bitcoin as the motivation, analysis is done to predict incoming Bitcoin crashes using four different methodologies described in Sect. 3. These approaches are compared based on their results and accuracy.
2 Literature Survey
The cryptocurrency market's behavior has been categorized as inherently complex, noisy, evolutionary, nonlinear, and deterministically chaotic in the comprehensive survey of existing work Fang et al. [8]. The literature Gunay and Kaşkaloğlu [9] indicates the existence of long-range dependence in the time series, confirming that the cryptocurrency market follows a latent pattern in the context of chaos theory. Also, in the study Partida et al. [10], the time series of cryptocurrencies like Bitcoin and Ethereum show characteristics of chaos, self-similarity, and correlated randomness. Henrique et al. [11] illustrated that machine learning strategies have great potential to handle the
modeling and prediction of nonlinear and multivariate monetary time series. Some works have used features extracted from various financial and economic data sources based on text, sentiment, or social media analysis, conjugated with epidemic modeling, to forecast the time-series state Phillips and Gorse [12]. The field of persistent homology has proposed the idea that invariant topological features of high-dimensional and complex data can supply relevant and useful patterns on the behavior of the underlying metric space represented by the data Gidea et al. [4]. It is used in the literature Azamir et al. [13] to spot early warning signs of abnormal changes in dynamic networks of data. Topological data analysis (TDA) is a field that started as a discipline of persistent homology with the pioneering works of Edelsbrunner et al. [14] and Zomorodian and Carlsson [15]. They introduced an efficient algorithm based on persistent homology and its visualization using persistence diagrams and persistence barcodes, respectively. These methodologies were established as an important discipline when illustrated in the seminal paper of Carlsson [16]. TDA has been combined with different techniques to study financial markets in recent years. Gidea and Katz [3] were able to provide early warnings for the internet bubble, commonly known as the “dot-com crash”, and detect warnings for the “subprime mortgage crisis” of 2007–2008. Similarly, a correlation network from DJIA stock data is used with TDA Gidea [7] to study the latter financial crisis. The literature suggests that relationships between a pair of time series can be framed by using clustering as a measure to find dependence between two time series Gerlach et al. [3]. In Davies et al. [17], the ubiquitous fuzzy c-means (FCM) clustering algorithm is extended to the space of persistence diagrams, enabling unsupervised learning that automatically captures the topological structure of data without the prior topological knowledge or additional processing of persistence diagrams that many other techniques require. Some of the relevant literature is summarized in Table 1.
3 Experimental Overview
3.1 Dataset
The Bitcoin closing price data are obtained for the 2016–2018 time period from Yahoo Finance, as Bitcoin experienced a boom in 2016 that lasted until the major crash of early 2018. In January–February 2018, Bitcoin prices fell by 65%, followed by other cryptocurrencies, due to a systematic crash. Thus, this work considers time-series data just before the peak, and the window of 14-09-2017 to 07-01-2018 is recognized as the most significant peak.
Table 1 Summary of different works of literature
| Title | Data source | Methodology | Findings | Publication year |
|---|---|---|---|---|
| Topological data analysis of financial time series: landscapes of crashes, Gidea and Katz [3] | Time series of four major US stock market indices, Yahoo Finance | TDA-based method, time-delay coordinate embedding | Analysis of the time series of the L1- and L2-norms proves strong development during a bubble peak | 2018 |
| Dissection of bitcoin's multiscale bubble history January 2012–February 2018, Gerlach et al. [3] | Bitstamp, Thomson Reuters Datastream | Log-periodic power law singularity (LPPLS) model | Quantitative bubble analysis concludes Bitcoin prices are volatile and drop rather sharply, but still have the potential to come back strongly | 2019 |
| Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering, Zhou and Yang [18] | Real-world data sets, UCI machine learning repository | K-means clustering algorithm, fuzzy c-means clustering algorithm | Demonstrates that FCM has a more grounded uniform impact than k-means | 2020 |
| Topological recognition of critical transitions in time series of cryptocurrencies, Gidea et al. [4] | Four cryptocurrencies' prices (Bitcoin, Ethereum, Litecoin, and Ripple), Yahoo Finance | TDA-based algorithm along with k-means clustering | TDA and k-means clustering are proven to potentially identify critical transitions in a non-stationary and erratic time series such as that of the cryptocurrency markets | 2020 |
| A time-series clustering methodology for knowledge extraction in energy consumption data, Ruiz et al. [19] | Energy consumption information, buildings of the University of Granada, Spain | K-means, k-medoids, PAM, hierarchical method, Gaussian mixture | K-medoids using the PAM algorithm with Euclidean distance has given the best results (silhouette index, Dunn index) | 2020 |
| COVID-19, cryptocurrencies bubbles and digital market efficiency: sensitivity and similarity analysis, Montasse et al. [20] | Major eighteen cryptocurrencies from CoinMarketCap | Dynamic time warping distance-based clustering | The results indicate that the 2020 pandemic has the highest impact on cryptocurrency market efficiency | 2021 |
3.2 Algorithms Used
The approach used in this paper comprises the following phases:
• Log returns are calculated on the collected input.
• Time-delay coordinate embedding is carried out on the data, with the output being a dataset of point clouds.
• Next, persistence landscapes are constructed, and the L1-norms are calculated by using simplicial complexes.
• Finally, various machine learning clustering algorithms are applied to find the dependency between the L1-norms and the log prices of Bitcoin.
The basic flow of the methodology used is represented below in the flow chart (Fig. 1). The following sections discuss each phase in detail.
Time-delay coordinate embedding. The time-delay coordinate embedding method is the first step for performing topological data analysis. This method is mainly used for dynamic systems which are nonlinear Gidea and Katz [3]. In time-delay coordinate embedding, the phase space of the system under consideration is reconstructed from its given time series. Extending the applications of Takens' theorem to an invariant set, the dimension (d) of the delay coordinate vector used is:
d > 2m + 1,   (1)
where m is the system dimension Gidea et al. [4]. To find the cryptocurrency price crashes using the available time-series data, sliding windows with persistent homology are used. The time-delay embedding method is applied on every window and converts the time series into four-dimensional time-delay coordinate vectors.
Topological Data Analysis (TDA). The next step uses the persistent homology method. It is used to compute topological features of data from its point cloud. For building simplicial complexes, persistent homology uses n-tuples of data points present in the set of point clouds. For this paper, only simplices of dimension 1 are of interest, as they represent connections between vertices whose interpoint distance is less than the given scaling parameter. Using the simplicial complex, certain topological features of the point cloud data are outlined. These new features
Fig. 1 Flowchart
are collected using persistent homology and then distinguished by two parameters, namely, birth value and death value. The birth and death values are used to characterize a particular feature in one of three categories: significant, insignificant, or noisy. Significant topological features are those for which parameter values are long-lasting, whereas insignificant or noisy refers to those for which parameter values are temporary. Finally, the scaling parameter values corresponding to the birth and death values of the n-dimensional holes are represented using coordinates of the n-dimensional persistence diagram. Using persistent homology, the change in the shape of the point cloud over time is observed and visualized using persistence diagrams, which is used to detect critical transitions at an early stage. The constructed topological signals have very few chances of generating false positives as they are robust against noise Prabowo et al. [21]. The diagrams are quantized as norms of persistence landscapes, which tend to oscillate or swing with increasing amplitude due to loss of attractivity, signifying approaching crashes.
Clustering. The association between the given price time series and the L1-norms before an approaching critical transition is analyzed to obtain the final result, a visualization of the bubble crash. Since the popular correlation method is not suitable for nonlinear systems Zhang [22], clustering is adopted for the given time-series data as no statistical assumptions need to be fulfilled in advance. A cluster is considered a warning signal of an impending crash if the points (x, y) in the cluster have a high value of the L1-norm (y > 0.1) and the dates form an almost adjacent interval of time; x is the log price of the asset and y refers to the L1-norm of the persistence landscape in relation to the log returns. With the help of the optimized number of clusters, hard clustering algorithms (k-means, k-medoids, hierarchical) and a soft clustering algorithm (fuzzy c-means) are implemented. After applying the clustering algorithms, data points are grouped in different clusters according to similarity. To validate the clusters, the following metrics are adopted:
Validity Metrics. Dunn index. It is measured as the ratio of the smallest distance among the data points in different clusters to the largest intracluster distance. Its range is [0, infinity), and a higher value suggests better clustering.
D = \frac{\min_{1 \le i < j \le n} d(i, j)}{\max_{1 \le k \le n} d'(k)},   (2)
where d(i, j) is the distance between clusters i and j and d'(k) is the intracluster distance (diameter) of cluster k.
DB index. Davies–Bouldin (DB) index is measured as the ratio of within-cluster scatter to the separation between two clusters. Ideally, the scatter within a cluster should be minimum, and the separation between the two considered clusters should be maximum. Better clustering is symbolized by a lower value of the DB index.
DB = \frac{1}{n_c} \sum_{i=1}^{n_c} R_i, \quad \text{where } R_i = \max_{j=1,\dots,n_c,\ j \ne i} R_{ij}, \quad i = 1, \dots, n_c.   (3)
Average Silhouette Width. The silhouette coefficient is used to visualize the similarity between a data point and its associated cluster compared to all other available clusters. The silhouette coefficient, s(i), for the ith point is given by:
s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},   (4)
where a(i) = mean intracluster distance and b(i) = mean nearest-cluster distance. A higher silhouette value indicates that the data point has more similarity with its own cluster than with the data points in other clusters.
Calinski–Harabasz pseudo-F-statistic. The pseudo-F-statistic measures the ratio of between-cluster variance to within-cluster variance. Well-separated and tightly condensed clusters are depicted by a high value of this index.
Rand index. The Rand index is used to calculate how similar the results of two different clustering techniques are. A higher value indicates similarity of the clustering methods.
RI = \frac{a + b}{\binom{n}{2}} = \frac{\text{correct similar pairs} + \text{correct dissimilar pairs}}{\text{total possible pairs}}.   (5)
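The indices above can be computed directly from the cluster assignments. The snippet below is a minimal sketch, assuming a recent scikit-learn (rand_score was added in version 0.24) and SciPy; the Dunn index of Eq. (2) is implemented by hand since scikit-learn does not provide it, and the `X` array of (log price, L1-norm) pairs is a random placeholder rather than the paper's data.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score, rand_score)

def dunn_index(X, labels):
    """Smallest inter-cluster distance divided by largest cluster diameter (Eq. 2)."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    min_inter = min(cdist(a, b).min()
                    for i, a in enumerate(clusters) for b in clusters[i + 1:])
    max_intra = max(cdist(c, c).max() for c in clusters)
    return min_inter / max_intra

# Placeholder: one (log price, L1-norm) pair per day in the analysis window.
rng = np.random.default_rng(0)
X = rng.random((116, 2))

labels_a = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
labels_b = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

print("Dunn index            :", dunn_index(X, labels_a))
print("Davies-Bouldin index  :", davies_bouldin_score(X, labels_a))
print("Avg. silhouette width :", silhouette_score(X, labels_a))
print("Calinski-Harabasz     :", calinski_harabasz_score(X, labels_a))
print("Rand index (a vs. b)  :", rand_score(labels_a, labels_b))   # Eq. (5)
```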
3.3 Experimental Values Used
Point Cloud Dataset. In the sliding window technique, the point cloud includes one point and excludes another while transitioning from one window to the next at each instant of time. Thus, a sufficiently large window should be taken, as a small window size may lead to negligible changes. Considering four-dimensional vectors and a total time-series length of approximately 2200, the window length comes out to be 50. By setting the sliding step to one day, a total of 2150 time-ordered point cloud datasets are obtained.
Intermediate Data. The log of the Bitcoin price is visualized in Fig. 2a. Figure 2b shows the graph containing the log returns. The L1-norm of the persistence landscape is shown in Fig. 2c. It indicates that the L1-norm grows toward a higher value at the end of the considered time period, which is at the peak.
Finding Optimal Number of Clusters. The optimal number of clusters is crucial in cluster analysis as it controls the granularity. Using the elbow method (Fig. 2d), silhouette method (Fig. 2e), and gap statistics method (Fig. 2f), the optimal numbers of clusters obtained are 4, 4, and 3, respectively.
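For illustration, the sliding-window pipeline of Sects. 3.2 and 3.3 could be sketched as follows. This is only a simplified sketch: it assumes the ripser package for persistent homology, uses a random placeholder series instead of the Bitcoin log returns, and summarizes each window's H1 diagram by its total persistence as a crude stand-in for the L1-norm of the persistence landscape used in the paper.

```python
import numpy as np
from ripser import ripser  # assumed dependency: pip install ripser

def delay_embedding(series, dim=4, delay=1):
    """Stack delayed copies of a 1-D series into dim-dimensional vectors."""
    n = len(series) - (dim - 1) * delay
    return np.column_stack([series[i * delay: i * delay + n] for i in range(dim)])

def persistence_summary(cloud):
    """Total persistence of the H1 diagram, a simple proxy for the
    landscape L1-norm (not the exact quantity used in the paper)."""
    dgm_h1 = ripser(cloud, maxdim=1)['dgms'][1]
    finite = dgm_h1[np.isfinite(dgm_h1[:, 1])]
    return float(np.sum(finite[:, 1] - finite[:, 0]))

log_returns = np.random.default_rng(0).normal(0, 0.01, 2200)  # placeholder series
window, step = 50, 1  # window length 50, slide of one day (Sect. 3.3)

norms = [persistence_summary(delay_embedding(log_returns[s:s + window], dim=4))
         for s in range(0, len(log_returns) - window + 1, step)]
norms = np.array(norms)  # roughly 2150 values, one per sliding window
```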
Fig. 2 a Top left: log of Bitcoin price 2016–2018, b top right: log return of Bitcoin prices from 2016 to 2018, c middle left: L1-norm for Bitcoin from 2016 to 2018, d middle right: optimal number of clusters using elbow method, e bottom left: optimal number of clusters using silhouette method, f bottom right: optimal number of clusters using gap statistics method
4 Results
The clusters obtained for Bitcoin during the period from 14-09-2017 to 07-01-2018 are shown in Fig. 3. Cluster 1 formed using k-means (Fig. 3a), cluster 4 using k-medoids (Fig. 3b), cluster 4 using hierarchical clustering (Fig. 3c), and cluster 3 using fuzzy c-means (Fig. 3d) contain points with higher values of y, where y > 0.1, and there is an almost contiguous interval between 2017-12-21 and 2018-01-07, which is just before the Bitcoin crisis (Table 2). These clusters are classified as carrying a strong warning signal. Cluster 2 of FCM (Fig. 3d) contains values near 0.10, classified as a weak warning sign. This cluster has an interval longer than cluster 2 of k-means (Fig. 3a), even though they mark the same transition in the Bitcoin time series. Hence, FCM provides a weak warning signal earlier than k-means.
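As a rough sketch of how such clusterings of the (log price, L1-norm) points could be reproduced, the snippet below uses scikit-learn's k-means and agglomerative (hierarchical) clustering on placeholder data; k-medoids and fuzzy c-means are only indicated in comments because they come from separate packages (for example scikit-learn-extra and scikit-fuzzy), and the warning-signal rule follows the y > 0.1 criterion from Sect. 3.2.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Placeholder: daily (log price, L1-norm) pairs for 14-09-2017 to 07-01-2018.
rng = np.random.default_rng(42)
X = np.column_stack([rng.random(116), rng.random(116) * 0.2])

k = 4  # chosen via the elbow / silhouette analysis of Sect. 3.3

kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
# k-medoids could be obtained analogously with sklearn_extra.cluster.KMedoids,
# and fuzzy c-means with skfuzzy.cluster.cmeans; both are omitted to keep this
# sketch dependent on scikit-learn only.

# A cluster carries a strong warning signal if all of its points have
# L1-norms above 0.1 (contiguity of the dates would be checked separately).
for label in range(k):
    members = X[kmeans_labels == label]
    if len(members) and members[:, 1].min() > 0.1:
        print(f"k-means cluster {label} carries a strong warning signal")
```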
4.1 Cluster Validation
Internal validation metrics are used to evaluate and compare how the clustering algorithms behaved. For an upcoming crash, the L1-norm values would be far apart for the peak and the date when the Bitcoin price decreases significantly; hence, those dates will fall in different clusters. In contrast, the rising slope (bubble formation) would have similar values with a small gap and would be present in the same cluster.
Fig. 3 Bitcoin from 14-09-2017 to 07-01-2018 before the major crash a top left: k-means clusters, b top right: k-medoids clusters, c bottom left: hierarchical clusters, d bottom right: fuzzy c-means clusters
Table 2 Bitcoin market crash—significant cluster
| Date | Log price | L1 |
|---|---|---|
| 2017-12-21 | 0.9763362 | 0.1445945 |
| 2017-12-22 | 0.9613303 | 0.1329647 |
| 2017-12-23 | 0.9681812 | 0.1474732 |
| 2017-12-24 | 0.9620932 | 0.1555071 |
| 2017-12-25 | 0.9629056 | 0.1587484 |
| 2017-12-26 | 0.9784328 | 0.1572839 |
| 2017-12-27 | 0.9765897 | 0.1645552 |
| 2017-12-28 | 0.9674687 | 0.1949869 |
| 2017-12-29 | 0.9678513 | 0.1542944 |
| 2017-12-30 | 0.9539295 | 0.1411884 |
| 2017-12-31 | 0.9639431 | 0.1411164 |
| 2018-01-01 | 0.9598994 | 0.1422988 |
| 2018-01-02 | 0.9703285 | 0.1382438 |
| 2018-01-03 | 0.9719623 | 0.1321623 |
| 2018-01-04 | 0.9748749 | 0.1531596 |
| 2018-01-05 | 0.9873714 | 0.1493963 |
| 2018-01-06 | 0.9879998 | 0.1493963 |
| 2018-01-07 | 0.9810454 | 0.1493963 |
Table 3 shows the values of the validity indices for each clustering method. A lower value of the DB index and higher values of the Dunn index, average silhouette width, and pseudo-F-statistic depict better clustering. For k-means and hierarchical clustering, these values are the same, concluding that the clusters from these methods are the same. The same holds for fuzzy c-means and k-medoids clustering. This conclusion is supported by the Rand index value of 1 for these pairs in Table 4. It is observed that the TDA-based approach with k-medoids and FCM performs better than with k-means and hierarchical clustering. One possible reason for the better placement of the cluster centers is that the k-medoids technique takes actual data points as centers instead of the mean of the values taken in k-means. Outliers highly influence the mean of the data; in comparison, medoids are more robust to outliers and noise, as they use actual points to represent a cluster Cilluffo et al. [23]. The financial time series is very rocky, which contributes to the number of outliers that affect the result of the k-means algorithm.
Table 3 Value of validity indices obtained for the clustering algorithms
| Clustering algorithm | Dunn index | Davies–Bouldin index | Average silhouette width | Calinski–Harabasz pseudo-F-statistic |
|---|---|---|---|---|
| K-means clustering | 0.1127695 | 0.6200807 | 0.5880788 | 218.7269 |
| K-medoids clustering | 0.3026788 | 0.4882835 | 0.6783715 | 267.5489 |
| Hierarchical clustering | 0.1127695 | 0.6200807 | 0.5880788 | 218.7269 |
| Fuzzy c-means clustering | 0.3026788 | 0.4882835 | 0.6783715 | 267.5489 |
Table 4 Value of Rand index obtained when different clustering algorithms are compared
| Clustering algorithms | K-means | K-medoids | Hierarchical | Fuzzy c-means |
|---|---|---|---|---|
| K-means clustering | – | 0.8869048 | 1 | 0.8869048 |
| K-medoids clustering | 0.8869048 | – | 0.8869048 | 1 |
| Hierarchical clustering | 1 | 0.8869048 | – | 0.8869048 |
| Fuzzy c-means clustering | 0.8869048 | 1 | 0.8869048 | – |
The time-series dataset generally has enormous volume, and hierarchical clustering is unsuitable due to its high quadratic computational complexity. Fuzzy c-means works better than other clustering algorithms on time series because it allows a data point to belong to multiple clusters Reddy et al. [24]. The FCM algorithm is an important vehicle for coping with overlapping clusters.
5 Conclusion and Future Work
In this research work, changes in the topological properties of the Bitcoin time series are explored to detect bubbles and incoming crashes. The TDA method is combined with the k-means, k-medoids, hierarchical, and fuzzy c-means clustering techniques to discover patterns that are topologically different in the original and generated time series. The prominent finding is that the generated series grows before the bubble bursts, yielding warning signals consistent with the observed major crash of early 2018. In addition, both observation and the internal validation indices used on the clustering algorithms indicate that fuzzy c-means and k-medoids generate better clusters. It is of interest that these two techniques produce identical results, as verified by the Rand index values. The analysis of these four methods adds to the standard set of available machine learning methods applied in time-series analysis. This methodology is free from traditional opinion and contributes to the little-known field of TDA-based approaches in financial markets. The information extracted from the clustering can be used to detect impending crashes and provide predictive patterns. This research has several limitations that could lead to future research opportunities. First, this research only focused on the Bitcoin time series from 2016 to 2018, and a more extensive time period can be considered. Second, the performance analysis of clustering is heavily dependent on the nature of the data; hence, the conclusion is specific to this study only. Third, an alternative approach can be explored, as the background of some validity indices used might have introduced bias. In addition, as suggested in Wu et al. [25], with pre-defined accurate distance metrics between the time series of prices and by sampling the clusters with suitable optimization techniques and heuristics, an optimal tracking portfolio can be constructed.
References 1. Rehman MHU, Salah K, Damiani E, Svetinovic D (2020) Trust in blockchain cryptocurrency ecosystem. IEEE Trans Eng Manage 67(4):1196–1212 2. Forbes Advisor (2022) Top 10 cryptocurrencies in April 2022. https://www.forbes.com/adv isor/investing/top-10-cryptocurrencies 3. Gerlach JC, Demos G, Sornette D (2018) Dissection of bitcoin’s multiscale bubble history January 2012 to February 2018. SSRN Electron J 4. Gidea M, Goldsmith D, Katz Y et al (2020) Topological recognition of critical transitions in time series of cryptocurrencies. Physica A Stat Mechan Appl 548:123843
5. Conti F, Moroni D, Pascali MA (2022) A topological machine learning pipeline for classification. Mathematics 10:3086 6. Gidea M (2017) Topological data analysis of critical transitions in financial networks. In: International conference and school on network science. Springer, Cham, pp 47–59 7. Fang F, Ventre C, Basios M, Kanthan L, Martinez-Rego D, Wu F, Li L (2022) Cryptocurrency trading: a comprehensive survey. Fin Innov 8(1):13 8. Gunay S, Kaşkaloğlu K (2019) Seeking a chaotic order in the cryptocurrency market. Math Comput Appl 24(2):36 9. Partida A, Gerassis S, Criado R, Romance M, Giráldez E, Taboada J (2022) The chaotic, self-similar and hierarchical patterns in bitcoin and ethereum price series. Chaos Solitons Fractals 165(2) 10. Henrique BM, Sobreiro VA, Kimura H (2019) Literature review: machine learning techniques applied to financial market prediction. Expert Syst Appl 124:226–251 11. Phillips RC, Gorse D (2017) Predicting cryptocurrency price bubbles using social media data and epidemic modelling. In: 2017 IEEE symposium series on computational intelligence (SSCI). Honolulu, HI, USA, pp 1–7 12. Azamir B, Bennis D, Michel B (2022) A simplified algorithm for identifying abnormal changes in dynamic networks. Physica A Stat Mechan Appl 607 13. Edelsbrunner H, Letscher D, Zomorodian A (2002) Topological persistence and simplification. Discret Comput Geom 28(4):511–533 14. Zomorodian A, Carlsson G (2004) Computing persistent homology. Ann Symp Comput Geometry 274:347–356 15. Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308 16. Gidea M, Katz Y (2018) Topological data analysis of financial time series: landscapes of crashes. Physica A 491:820–834 17. Davies T, Aspinall J, Wilder B, Tran-Thanh L (2020) Fuzzy c-means clustering for persistence diagrams 18. Zhou K, Yang S (2020) Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering. Pattern Anal Appl 23(1):455–466 19. Montasser GE, Charfeddine L, Benhamed A (2021) COVID-19, cryptocurrencies bubbles and digital market efficiency: sensitivity and similarity analysis. Fin Res Lett Part A 46:102362 20. Ruiz LGB, Pegalajar MC, Arcucci R, Molina-Solana M (2020) A time-series clustering methodology for knowledge extraction in energy consumption data. Expert Syst Appl 160:113731 21. Prabowo N, Widyanto RA, Hanafi M, Pujiarto B, Avizenna M (2021) With topological data analysis, predicting stock market crashes. Int J Inform Inf Syst 4(1):63–70 22. Zhang GP (2003) Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50:159–175 23. Cilluffo G, Fasola S, Ferrante G, Malizia V, Montalbano L, La Grutta S (2021) Machine learning: an overview and applications in pharmacogenetics. Genes 12(10):1511 24. Reddy BR, Kumar YV, Prabhakar M (2019) Clustering large amounts of healthcare datasets using fuzzy c-means algorithm. In: 2019 5th international conference on advanced computing & communication systems (ICACCS), pp 93–97 25. Wu D, Wang X, Wu S (2022) Construction of stock portfolios based on k-means clustering of continuous trend features. Knowl-Based Syst 252
Healthcare Information Exchange Using Blockchain Technology Aman Ramani, Dhairya Chhabra, Varun Manik, Gautam Dayama, and Amol Dhumane
Abstract Today, both medical facilities and individuals produce a significant amount of healthcare data every day. The medical industry has significantly benefited from healthcare information exchange (HIE), as has been demonstrated. It is crucial but difficult to store and share a vast amount of healthcare data. Healthcare Information Exchange (HIE) offers outstanding benefits for patient care, such as improving the standard of health care and aiding in the organization of healthcare data. With the aid of blockchain technology, users can confirm, safeguard, and synchronize the information contained in a data sheet (also known as a transaction ledger) that is cloned by numerous users. For enabling better services, blockchain technology has offered significant benefits and incentives to industries. This analysis aims to examine the advantages, difficulties, and utilities that influence the applications of blockchain in the medical sector. Keywords Healthcare information exchange (HIE) · Blockchain · Data · Transaction · Ledger · Hash · Block
A. Ramani (B) · D. Chhabra · V. Manik · G. Dayama · A. Dhumane Department of Computer Science, Pimpri-Chinchwad College of Engineering, Pune, India e-mail: [email protected] D. Chhabra e-mail: [email protected] V. Manik e-mail: [email protected] G. Dayama e-mail: [email protected] A. Dhumane e-mail: [email protected]
1 Introduction
Why Blockchain? The main reason to use blockchain is to avoid single-party, centralized power over personal data, because every individual should have the right to their own medical data so that no one can misuse it. To overcome this problem, we use blockchain, which is a decentralized network and is well suited to this situation. Bender and Sartipi [1] suggested that even medical records or information related to patients should be carefully protected and secured by storing them using blockchain. Blockchain offers several advantages for medical information exchange. Hence, we analyze flaws in present-day healthcare information exchange (HIE) and address them using blockchain. Wilcox et al. [2] noted that this will help to recover all past medical records, including all types of scans, lab tests, treatment details, and more, which will help the individual keep track of medical details that can be very useful in medical emergencies. The main motivation behind this work is to keep data secure and respect patient privacy. Jiang et al. [3] analyzed how a blockchain works and how secure it is. We can use it to change the present-day healthcare information exchange used in hospitals. This would help patients as well as organizations in several different ways. Blockchain will surely change the way data are stored and accessed and make them more secure. Patients could use their data not only in one hospital but in any hospital in the world through this technology. Vest and Gamm [4] added that even personal physicians would ask for permission through a decentralized app and would get patient data only if the patient authorizes it. All of this makes healthcare information exchange using blockchain technology a new way to transmit data in the future. Kabir et al. [5] also suggested an IoT approach for healthcare data transfer, and Zhou et al. [6] gave various encryption methods for data security. Xia et al. [7] gave a cloud-blockchain hybrid approach for HIE. Murugan et al. [8] commented that blockchain, being the new technology of the twenty-first century, is nowadays used and heard of more because of its great benefits and the applications that exist and can be created in the future. At its core, blockchain is a computer-generated, distributed, non-centralized shared ledger that exists over a network. The present healthcare industry stores its patients' medical data on a centralized network, which is not secure. Ahmadi et al. [9] observed that tampering and vulnerabilities are possible in such a kind of storage. Therefore, in order to store data, we can use blockchain, where its functionality helps us secure data and protect patients' privacy along with many other advantages. Blockchain, being a new technology, is covered and introduced further in this paper.
2 Literature Review
2.1 Hash Function
A hash is a mathematical function which converts an input into a unique output of fixed length. It is mainly used in blockchain for computation and for protecting data integrity. The same input always produces the same output, while a change in even a single character or in the size of the input produces a different, unique output. In a blockchain, a block's hash is computed on the basis of the information present in the block header and is used to protect data integrity. In blockchain, the hash function is used for various purposes, such as digital signatures, Proof of Work (PoW), Merkle trees, and linking the blocks of the chain. The hash function also provides security in a blockchain; the most used hash function to date is SHA-256 (Secure Hash Algorithm), which produces a 256-bit hash value (commonly shown as 64 hexadecimal characters) for the input data provided. Properties of a hash function:
• Collision resistance: no two different inputs should produce the same output hash.
• Hiding: it should be challenging to determine a hash function's input value from its output.
• Puzzle friendliness: they ought to be suitable for puzzles.
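A minimal illustration of the determinism and avalanche behavior described above, using Python's standard hashlib implementation of SHA-256; the record string is an invented example.

```python
import hashlib

def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a string as a 64-character hex value."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

record = "patient:1001|report:blood-test|date:2023-01-15"   # invented example record

# The same input always yields the same 256-bit digest.
assert sha256_hex(record) == sha256_hex(record)

# Changing a single character produces a completely different digest.
print(sha256_hex(record))
print(sha256_hex(record.replace("1001", "1002")))
```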
2.2 Merkle Tree
In a blockchain, the data are secured using hash values: each data item's hash is hashed together with the hash value of its sibling node to produce the parent node, and this bottom-up construction for a block is called a Merkle tree. Here in Fig. 1, Hash-A, Hash-B, Hash-C, and Hash-D are the hashes of the data items that need to be included in a block. These are the leaf nodes of the tree and, following the bottom-up approach, their hash values are hashed pairwise to get Hash-AB and Hash-CD, which further give Hash-ABCD. Hash-ABCD is stored in the header of the next block, which is treated as the address of the previous block. Some applications of Merkle trees are:
• In distributed systems where the same data has to be present in various locations, Merkle trees are helpful.
• Merkle trees are useful for detecting inconsistencies.
• Merkle trees are used by Apache Cassandra to find discrepancies across database replicas as a whole.
• Both blockchain and Bitcoin utilize them.
Fig. 1 Merkle tree
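The bottom-up hashing of Fig. 1 can be sketched in a few lines; the leaf payloads are invented placeholders, and an odd number of leaves is handled by duplicating the last node, one common convention.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash the leaves pairwise, level by level, until a single root remains."""
    level = [h(leaf.encode()) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                       # duplicate the last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

# Placeholder transaction payloads corresponding to Hash-A ... Hash-D in Fig. 1.
print(merkle_root(["tx-A", "tx-B", "tx-C", "tx-D"]))
```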
2.3 Block of Blockchain and Its Working
Blocks are data structures found in the blockchain database and are used to store transaction data permanently in a cryptocurrency blockchain. A block stores all or a portion of the most recent transactions that the network has not yet verified. The block is closed when the data are verified; then a fresh block is made so that fresh transactions may be added to it and verified. As a result, a block is an everlasting collection of records that, once recorded, cannot be changed or deleted. A block of a blockchain is a part which contains secured data in the form of its hash value, as well as a header which contains the former block's hash value on the chain and the current block's time. In this manner, a blockchain is formed. The genesis block is the very first block in a blockchain. Figure 2 shows how a blockchain works and how data or transactions are stored on it. Before adding the data to the blockchain, the data are stored in a pool, and the data from the pool are brought to the Merkle tree to form a block in the blockchain. After the block is created, it is committed to the blockchain and a new block will be made. In this way, a blockchain network is secure and difficult to tamper with.
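A toy sketch of the chaining just described: each block records the previous block's hash and a timestamp, so altering any earlier block invalidates every later link. All fields are simplified placeholders rather than a real block format.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Deterministically hash a block's header fields and data."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def new_block(prev_hash: str, data: str) -> dict:
    return {"prev_hash": prev_hash, "timestamp": time.time(), "data": data}

# Genesis block followed by two further blocks.
chain = [new_block("0" * 64, "genesis")]
for payload in ["record batch 1", "record batch 2"]:
    chain.append(new_block(block_hash(chain[-1]), payload))

# Tampering with block 1 breaks the link stored in block 2.
chain[1]["data"] = "tampered"
print(block_hash(chain[1]) == chain[2]["prev_hash"])   # prints False
```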
2.4 Application of Blockchain in Health Care
Blockchain is a cutting-edge technology that has applications in the healthcare industry. The promotion and application of blockchain technology in the healthcare industry face a number of hurdles despite the many advantages it offers, including safety, trust, and transparent transactions, which speed up transaction processing and data transmission. Nair and Bhagat [10] suggested that the variety and isolation of healthcare records and data continue to cause the medical sector to consider that it
Fig. 2 Transaction works on a chain
must be independent of blockchain technology. The healthcare industry's key challenges are the privacy and security of existing data, and it is necessary to monitor and safeguard protected health information from hacking attempts. The interoperability of medical data, i.e., the ability of various software systems to connect, exchange, and apply information, is another difficulty for medical professionals. Nair and Bhagat [10] found that accessibility is another issue with regard to the operational viability of blockchain. The diversity issue with blockchain, which affects accessibility further, is a problem that affects all disruptive technologies. Dixon and Cusack [11] measured the importance of blockchain in healthcare information transfer. According to Vest and Gamm [4], the growing volume of contracts and information permanently saved on each block is to blame for the scalability problem. Interoperability and data veracity are two difficulties currently faced by blockchain. These issues arise from incompatibility among the systems that blockchain technology is supposed to improve in the medical industry and may be linked to limited data standards. Kotarkar et al. [12] used this approach to help the healthcare insurance industry. Heekin et al. [13] used a decision support method on this system to help in diagnosis. We discovered that blockchain technology's influence on health care is still developing, despite the enormous potential and intense interest in it. Edwards [14] realized that the healthcare industry is expanding quickly, and we anticipate that blockchain will have a significant positive impact there shortly. To address the various obstacles
and issues that blockchain technology is experiencing, such as interoperability, integration with present systems, uncertain costs, technology selection criteria, and scale, more research and studies need to be undertaken.
2.5 Reasons for Using Blockchain
Regrettably, the healthcare sector has not kept up with user expectations. Conventional systems are frequently vulnerable to attack, slow, and play very little role in the patient's care. Edwards et al. [15] found that due to differing standards and data formats, it is exceedingly difficult to share health data held in conventional systems, which means that the current healthcare ecosystem is inadequate for the immediate needs of modern users. The main goal of an HIE system is to communicate health information across institutional and geographic borders in order to provide a reliable and secure delivery method. Regarding this sharing system, a few aspects need to be taken into account.
• Maintaining user data privacy is crucial, and failing to do so will have legal and financial consequences.
• Traditional data-sharing systems require a single trusted central authority and a centralized source of data, both of which raise data security risks. Failure of the central storage would put the medical records of many patients at risk in such a centralized system.
3 Technological Details To build a blockchain-based technology, there are different tools we need to study. This tool simplifies the process of implementing blockchain technology.
3.1 Here Are Some Tools Which Are Used to Implement Technology Solidity: It is an object-oriented high-level language used to implement smart contracts. Blockchain engineers can develop an application that can provide
Fig. 3 Benefits of using blockchain
self-executing business logic merged in smart contracts, thereby leaving an immutable and authoritative ledger of transactions. Smart contracts for fungible and non-fungible tokens are written in Solidity. Solidity allows different use cases to be built for people who use blockchain; these contracts and their functionality can be used to create different use cases between patients and health organizations.
Solc: Solc is a Solidity compiler which is written in C++. Its main purpose is to transform Solidity contracts into a structure that is easy for the Ethereum Virtual Machine (EVM) to understand. Solidity's syntax is influenced by JavaScript. Nair and Bhagat [10] found that in order to make them easy to understand and decode by the EVM, the smart contract formats need to be changed; this is where Solc is introduced. There are two flavors of Solc: Solc (written in C++) and Solc-js. Most Ethereum nodes include Solc, and it is also useful for offline compiling.
Remix: Remix is an IDE that is also available as a free desktop and web app. It encourages a faster development cycle and has plenty of extensions with built-in graphical user interfaces. Remix serves as a learning and teaching tool for Ethereum as well as a platform for the whole Solidity contract writing process. Remix is written in JavaScript; therefore, it is compatible with any browser. Solidity-based smart contracts can be examined, developed, tested, and deployed using Remix, which also has excellent documentation.
Truffle: The Truffle framework is widely used to create a development environment for blockchain-based applications. It comes with a vast variety of libraries with custom deployment for drafting new smart contracts and making complex decentralized applications, and it helps handle other problematic conditions when developing a blockchain. Automated contract testing can be done in Truffle using Mocha and Chai. Smart contract development can also be enabled, including
compilation, linking, and deployment. It also provides an organized build pipeline for carrying out custom procedures.
Blockchain-as-a-Service (BaaS): BaaS was created because it is not realistic (or economically feasible) for every organization to develop a complete end-to-end blockchain solution. The functionality of BaaS is modeled to be similar to that of SaaS. Cloud-based platforms let you create, host, and use your own blockchain applications, smart contracts, and other blockchain-based features, with the cloud service provider handling and managing all necessary tasks and operations to uphold the flexibility and functionality of the blockchain infrastructure. For individuals or businesses that want to adopt blockchain technology but have been held back by operational costs and software difficulties, BaaS can be a useful tool. For example, Microsoft (Azure) and Amazon (AWS) provide BaaS services.
4 Current Trends in HIE
Currently, most healthcare information exchange platforms are cloud based. One of them is Azal Health, a paid cloud-based platform that gives users a clear perspective on their practice, with the ability to analyze data and create reports and charts for analysis purposes; it offers many services such as hospital and clinical EHR (electronic health records). Edwards [14] said that consulting, revenue cycle management, and clinical and financial analytics can be applications of HIE. So, for analysis, Payne et al. [16] used the user's data, which can be accessed by a healthcare organization; this is a direct attack on and violation of the privacy of the user's data. Another platform, HealthIT.gov, is a government website that provides information and details about EHR. It offers different approaches and technologies to perform HIE in a smooth manner; nowadays, there are also efforts to create AI algorithms that access the user's data and try to predict future illness so that the customer is aware of diseases that may occur in the future. This is somewhat good practice, but the question arises: what about privacy and data integration? It also describes new blockchain-based approaches to HIE networks. Various users of HIE can be seen in Fig. 4. Kimura et al. [17] started a ministry project for the promotion of standardized healthcare information exchange. India is working on the Ayushman Bharat Digital Mission (ABDM), which comes under Digital India; its vision is to create a national digital health ecosystem that supports universal health coverage, and the mission tries to create a digital platform for the evolving health ecosystem through a wide range of data, information, and infrastructure services while ensuring security, confidentiality, and privacy. Basically, the
Fig. 4 Users of HIE
central and state governments and the national health authorities have full control of the data.
5 Working of HIE
The above diagram shows how blockchain works in healthcare information exchange. Below is the detailed workflow of HIE using blockchain.
• Using the public hash key to replace a patient's identity. A hash value is a distinctive numerical value. To replace the patient's identity, this hash is used. Attackers cannot decode the identity or the protected health information (PHI) when a hash replaces the user's identity. The patient's name, address, financial information, and social security number are removed as a result of de-identification.
• Make sure the data are HIPAA compliant. Making the data compliant is always necessary before storing it on the blockchain. Patient data are safeguarded by Health Insurance Portability and Accountability Act (HIPAA) compliance. Only identifiable data are accessible to and disclosed by the involved members, in accordance with HIPAA privacy regulations.
• Keep data on the blockchain, and use a unique identifier to execute transactions. When healthcare organizations or other service providers help patients, all of the client's data are saved on the blockchain with a public key. The patient's information may include things like diagnosis, care plan, course of treatment, gender, age, and date of birth. By triggering smart contracts, the data provided
by health vendors, healthcare providers, or insurance companies are stored on the blockchain in the form of transactions. Any healthcare provider who requires access to a patient's public data will only see it after the transaction ID matches.
• Healthcare institutions send a blockchain query. Through APIs, healthcare institutions and organizations can create and submit queries. At this point, smart contracts enable access to non-identifiable data like age, gender, etc.
• Patient shares the public/private key with healthcare units. Patients' public keys are directly connected to the blockchain data. Patients' information is made visible when they give the healthcare facilities their public or private key.
Blockchain as a transaction layer can store data in one of two ways when it comes to information storage:
• On-chain—In this case, the data are kept entirely on the blockchain. Examples include hash codes, transactional data, audit logs, and metadata pertaining to off-chain storage.
• Off-chain—In this case, the data are saved using blockchain links. These links serve as pointers to the data that are kept apart, much like traditional databases. Blockchain, for instance, cannot store bulky data such as MRI or X-ray images; this kind of data therefore requires links to separate storage locations.
Soni and Singh [18] observed that blockchain ensures security and readability for parties with permissioned access when medical data are directly stored. At the same time, storing large data files slows block processing speeds and reveals scaling issues. Shae and Tsai [19] gave a similar blockchain approach for the design of the platform.
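As an illustration of the de-identification and on-chain/off-chain split described above, the sketch below replaces the patient's identity with a salted hash key and keeps only non-identifiable fields plus an off-chain link in the transaction payload. All field names, the salt, and the storage URI are invented placeholders, not a prescribed schema.

```python
import hashlib
import json

def pseudonym(patient_id: str, salt: str) -> str:
    """Replace the patient's identity with a salted SHA-256 hash key."""
    return hashlib.sha256((salt + patient_id).encode()).hexdigest()

def build_transaction(patient_id: str, record: dict, salt: str) -> str:
    tx = {
        "patient_key": pseudonym(patient_id, salt),
        # only non-identifiable attributes are stored on-chain
        "age": record["age"],
        "gender": record["gender"],
        "diagnosis": record["diagnosis"],
        # bulky artifacts such as MRI images stay off-chain; only a pointer is stored
        "mri_link": record["mri_uri"],
    }
    return json.dumps(tx, sort_keys=True)

record = {"age": 54, "gender": "F", "diagnosis": "hypertension",
          "mri_uri": "ipfs://example-object-id"}        # placeholder off-chain URI
print(build_transaction("patient-1001", record, salt="hospital-secret"))
```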
6 Conclusion
In this study, we demonstrate the findings of a thorough literature analysis on the state of blockchain technology today in the healthcare industry. The survey demonstrates that medical blockchain applications are still a new subject. Having pertinent, fast, and accurate patient data available at the time of care would help to enhance healthcare quality and outcomes, which is a major motivation behind the introduction of HIE. Important structural, cultural, economic, and political differences exist among health systems around the world, and these differences are likely to have an impact on the future creation, application, and administration of HIE. As a result, while political will and the deployment of suitable incentives are essential for success, coordinating global HIE plans around a common framework, such as the UN Sustainability Goals, would be valuable. A healthcare information exchange solution based on blockchain gives better ideas, ensures accurate patient health assessments, and supports the shift to value-based treatment. As Huang et al. [20] realized, blockchain technology has the potential to significantly alter the quality, safety, and efficacy of drugs, vaccines,
diagnostics, and care methods. Increased information availability, security, and transparency make these improvements possible. Because blockchain technology is still in its infancy, there is an opportunity for future research to examine usability and the perceptions of users regarding the adoption of blockchain technology.
References 1. Bender D, Sartipi K (2013) An agile and RESTful approach to healthcare information exchange. In Proceedings of the 26th IEEE international symposium on computer-based medical systems. IEEE, pp 326–331 2. Wilcox A, Kuperman G, Dorr DA, Hripcsak G, Narus SP, Thornton SN, Evans RS (2006) Architectural strategies and issues with health information exchange. AMIA Annu Symp Proc 3. Jiang S, Cao J, Wu H, Yang Y, Ma M, He J (2018) Blochie: a blockchain-based platform for healthcare information exchange. In: IEEE international conference on smart computing (smartcomp), pp 49–56 4. Vest JR, Gamm LD (2010) Health information exchange: persistent challenges and new strategies. J Am Med Inform Assoc 17(3):288–294 5. Islam SR, Kwak D, Kabir MH, Hossain M, Kwak K-S (2015) The internet of things for health care: a comprehensive survey. IEEE Access 3:678–708 6. Zhou J, Cao Z, Dong X, Lin X (2015) Tr-mabe: white-box traceable and revocable multiauthority attribute-based encryption and its applications to multi-level privacy-preserving ehealthcare cloud computing systems. INFOCOM, pp 2398–2406 7. Xia Q, Sifah EB, Asamoah KO, Gao J, Du X, Guizani M (2017) Medshare: trust-less medical data sharing among cloud service providers via blockchain. IEEE Access 5:14757–14767 8. Murugan A, Chechare T, Muruganantham B, Kumar SG (2020) Healthcare information exchange using blockchain technology. Int J Electr Comput Eng 10(1):421 9. Sadoughi F, Nasiri S, Ahmadi H (2018) The impact of health information exchange on healthcare quality and cost-effectiveness: a systematic literature review. Comput Methods Progr Biomed 161:209–232 10. Nair R, Bhagat A (2020) Healthcare information exchange through blockchain-based approaches. In: Transforming businesses with bitcoin mining and blockchain applications. IGI Global, pp 234–246 11. Dixon BE, Cusack CM (2016) Measuring the value of health information exchange in Health Inf Exchange:231–248 12. Kotarkar A, Padamadan S, Warekar Z, More J (2022) Leveraging the power of blockchain in the health insurance industry. In: 2nd international conference on intelligent technologies (CONIT), pp 1–7 13. Heekin AM et al (2018) Choosing wisely clinical decision support adherence and associated inpatient outcomes. Am J Managed Care 24(8):361 14. Edwards J (2006) Case study: Denmark’s achievements with healthcare information exchange. Gartner Industr Res 15. Edwards A, Hollin I, Barry J, Kachnowski S (2010) Barriers to cross-institutional health information exchange: a literature review. J Healthcare Inf Manage JHIM 24(3):22–34 16. Payne TH, Lovis C, Gutteridge C, Pagliari C, Natarajan S, Yong C, Zhao L-P (2019) Status of health information exchange: a comparison of six countries. J Glob Health 9(2):0204279. https://doi.org/10.7189/jogh.09.020427 17. Kimura M, Nakayasu K, Ohshima Y, Fujita N, Nakashima N, Jozaki H, Numano T, Shimizu T, Shimomura M, Sasaki F, Fujiki T (2011) SS-MIX: a ministry project to promote standardized healthcare information exchange. Methods Inf Med 50(02):131–139 18. Soni M, Singh DK (2021) Blockchain-based security & privacy for biomedical and healthcare information exchange systems. Mater Today Proc
19. Shae Z, Tsai JJ (2017) On the design of a blockchain platform for clinical trial and precision medicine. ICDCS, pp 1972–1980 20. Huang CD, Behara RS, Goo J (2014) Optimal information security investment in a healthcare information exchange: an economic analysis. Decis Support Syst 61:1–11
Multilevel Credit Card Fraud Detection Using Face Recognition and Machine Learning Tushar Deshpande, Shourjadeep Datta, Rushabh Shah, Vatsal Doshi, and Deepika Dongre
Abstract Technology in today's world has wholly modified payment methods from cash to digital or card payments. Such payment systems are susceptible to fraud or false transactions; therefore, an authentication system is essential for guaranteeing the legitimacy of a transaction. As a result, numerous techniques to authenticate the user's credentials have been developed and practiced. We use machine learning to predict the fate of a transaction by training our model on past or existing transactional data. Although these solutions boost security, they are not fully reliable: such systems may predict that a transaction is legitimate when it is not. This paper illustrates an idea that helps increase the credibility of such a machine learning system by integrating it with a face recognition system. The final decision on whether the transaction is fraudulent is taken based on the outputs obtained from the machine learning module and the face recognition module. The face recognition module uses the LBPH algorithm. To decide the threshold in the face recognition module and to test it, we experimented with 30 users for fraudulent and non-fraudulent face data. For the machine learning module, to select the algorithm to use in our system, we performed a comparative analysis between different algorithms and chose the one that had the best performance on the input dataset. In this way, a multilevel authentication system is proposed to help increase a transaction's truthfulness. Keywords Face recognition · LBPH · Credit card fraud detection · Machine learning · Multilevel fraud detection · Confusion matrix
T. Deshpande (B) · S. Datta · R. Shah · V. Doshi · D. Dongre Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India e-mail: [email protected] D. Dongre e-mail: [email protected]
1 Introduction Credit cards have been used extensively for online shopping as a result of the growth and acceleration of e-commerce, which has resulted in a high number of credit card frauds. Identification of credit card fraud is therefore necessary in the digital age. Credit card transaction fraud is an unauthorized user using the account without the owner's knowledge. Credit card fraud is one of the hot topics in recent times, wherein either the credentials of the credit card holder are leaked or the One-Time-Password is intercepted during a transaction. Many solutions were proposed wherein a face detection mechanism was used to cross-check whether the person trying to make a transaction is genuine, but if the threshold is set poorly, even a morphed face of the person can be used to make a transaction, and such systems had a poor accuracy rate. Pozzolo et al. [1] analyze in detail the real-world working conditions of fraud detection systems and provide a formal description of the articulated classification problem involved. Some of the approaches currently used to detect fraudulent transactions are the Artificial neural network, Fuzzy logic, Genetic algorithm, Logistic regression, Decision tree, Support vector machine, Bayesian network, Hidden Markov model, and K-nearest neighbor method. Kho and Vea [2] and Maniraj et al. [3] used machine learning algorithms to analyze all authorized transactions and flag any that seem suspicious. Professionals look into these reports and get in touch with the cardholders to confirm whether the transaction was legitimate or fraudulent. In each of these methods, the parameters of the transaction, such as the geographic location, amount, and timing, are examined to determine whether a transaction is fraudulent. If the parameters of a person's credit card transactions are analyzed and the face of that person is also verified prior to the transaction, it would provide multilevel authentication and make it very difficult for an attacker to commit fraud. The credit card history and the face data of the user are contained in a dataset on which an ML algorithm is applied. The algorithm, or the model, takes a union of the past behavior of the user and the face data to decide whether the current user is fraudulent or not.
2 Prior Work A large amount of literature on anomaly or fraud detection in this domain has already been published and is freely available. In related work, Yu and Wang [4] used credit card transaction data from a single commercial bank to accurately estimate fraudulent transactions using outlier mining, outlier detection mining, and distance sum algorithms Trivedi et al. [5]. Outlier mining is a type of data mining commonly used in the financial industry. It deals with spotting detached elements from the main system, such as fraudulent transactions. Based on the values of such attributes, they calculated the separation between the
observed value of that attribute and its preset value. Unusual methods, such as the hybrid data mining/complex network classification algorithm, which is based on a network reconstruction algorithm and allows the creation of representations of the divergence of one instance from a reference group, have proven successful in medium-sized online transactions. Some of the techniques employed in this field include applications of data mining, automated fraud detection, and adversarial detection (Phua et al. [6]). Suman (Research Scholar, GJUS&T, Hisar) [7] suggested supervised and unsupervised learning as techniques for detecting fraudulent credit card transactions. These techniques and algorithms, despite their surprising success in some areas, were unable to offer a reliable and consistent solution to fraud detection. The effectiveness of classification models for the challenge of detecting credit card fraud was demonstrated by Shen et al. [8]. The authors proposed three classification models: decision tree, neural network, and logistic regression. Among the three models, the neural network and logistic regression outperform the decision tree. There have also been initiatives that approach the problem from a completely different angle. Sahu et al. [9] made an attempt to enhance the alert feedback interaction in the event of a fraudulent transaction: the authorized user is alerted, and a response is sent to deny the current transaction. One of the approaches that shed new light on this topic was the Artificial Genetic Algorithm (Vats et al. [10]), which tackled fraud from a different perspective. It was successful in detecting fraudulent transactions and reducing the number of false alarms, although it was accompanied by a categorization issue with varying misclassification costs.
3 Proposed Model The architecture diagram, i.e., Fig. 1, explains how the solution works at the system level. At the initial stage, bank verification/signup is completed by entering the credit card details and OTP. The system also asks for the user's face data during signup. This face data is kept as a benchmark for the user. When a credit card is used for a transaction, the face provided in real time is compared with the face data of the credit card owner stored at signup. The two phases after the OTP module are: • Machine learning module: In essence, this is a model that has been trained to assess whether a transaction is fraudulent or not by looking at the user's past transactions and other factors. If everything seems normal, it gives 0 as the output; otherwise, 1. • Face recognition module: In this module, the system first captures the face of the user and tries to match it with the face stored in the DB: the image matrix from the DB is fetched and compared with the captured face. The system continues the transaction if the difference between the two faces is less than
Fig. 1 System flow
or equal to the threshold value; otherwise, the transaction is aborted. Finally, the outputs of the two modules are combined: the transaction is completed successfully only if both the model output (1/0) and the face module output are 0; if either module outputs 1, the transaction is aborted.
4 Input Dataset and Preprocessing The dataset taken for the machine learning module is the 'Credit Card Transactions Fraud Detection Dataset' from Kaggle, made available by Shenoy [11]. This is a simulated dataset that contains legitimate and fraudulent transactions between January 1, 2019 and December 31, 2020. The number of customers in the dataset is 1000, and the number of merchants is 800. The total number of transactions in the dataset is 1,852,394. The dataset is preprocessed before the ML model training. The preprocessing involves creating 6 new columns. A new column, age, is derived using the columns 'trans-date-trans-time' and 'dob', which represent the date of the transaction and the date of birth of the person making the transaction. The remaining 5 columns include 'hist-trans-60d' (the number of transactions done by this customer in the last 60 days), 'hist-trans-24h' (the number of transactions done in the last 24 h), 'hist-fraud-trans-24h' (which
gives us the number of fraudulent transactions of this customer in the last 24 h), ‘hist-fraud-trans-2h’ (which gives us the number of fraudulent transactions of this customer in the last 2 h) and lastly, ‘hist-trans-avg-amt-60d’ (which points out the average money spent by the customer in the last 60 days). Finally, the null values if any are filled with the value 0, and the process of oversampling is done to equate the number of legitimate and fraudulent transactions in the dataset. After this, the dataset is given for the model development process.
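A minimal pandas sketch of this preprocessing is given below. The column names follow the Kaggle CSV (underscored variants of the names quoted above), while the windowing logic and the random-duplication oversampling are our own illustrative assumptions rather than the authors' exact script.

```python
import pandas as pd

# Load the simulated Kaggle dataset (column names assumed from the dataset description).
df = pd.read_csv("fraudTrain.csv", parse_dates=["trans_date_trans_time", "dob"])
df = df.sort_values("trans_date_trans_time")

# Derived column: age of the customer at the time of the transaction.
df["age"] = (df["trans_date_trans_time"] - df["dob"]).dt.days // 365

def add_history_features(group):
    """Per-customer history features computed over sliding time windows."""
    t = group["trans_date_trans_time"]
    fraud, amt = group["is_fraud"].values, group["amt"].values
    def window(ts, delta):
        return ((t >= ts - delta) & (t < ts)).values
    g60, g24, g2 = pd.Timedelta(days=60), pd.Timedelta(hours=24), pd.Timedelta(hours=2)
    group["hist_trans_60d"] = [window(ts, g60).sum() for ts in t]
    group["hist_trans_24h"] = [window(ts, g24).sum() for ts in t]
    group["hist_fraud_trans_24h"] = [fraud[window(ts, g24)].sum() for ts in t]
    group["hist_fraud_trans_2h"] = [fraud[window(ts, g2)].sum() for ts in t]
    group["hist_trans_avg_amt_60d"] = [amt[window(ts, g60)].mean() if window(ts, g60).any() else 0
                                       for ts in t]
    return group

df = df.groupby("cc_num", group_keys=False).apply(add_history_features)

# Fill any remaining nulls with 0, as described in the text.
df = df.fillna(0)

# Random oversampling of the minority (fraud) class to balance the dataset.
legit, fraud = df[df["is_fraud"] == 0], df[df["is_fraud"] == 1]
fraud_up = fraud.sample(n=len(legit), replace=True, random_state=42)
balanced = pd.concat([legit, fraud_up]).sample(frac=1, random_state=42)
```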
5 Machine Learning Module The face recognition part of our product is accompanied by an ML model which predicts the fate of the transaction on the basis of the transaction data. The output dataset from the preprocessing steps discussed above is further preprocessed by removing unnecessary columns and converting the string inputs into integer values using LabelEncoder(). This preprocessed dataset is then fitted to different models. The training size is kept at 70% and the testing size at 30%. The columns chosen for the prediction are based on their correlation with the target column. The ones we finalized include 'amt' (the transaction amount), 'lat' and 'long' (the location), 'city-pop' (the current population of the city), 'age' (the age of the customer), 'hist-trans-60d', 'hist-trans-24h', 'hist-fraud-trans-24h', 'hist-fraud-trans-2h', and 'hist-trans-avg-amt-60d' (discussed in Sect. 4), and lastly 'Category' (the category that the purchased item falls in).
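Continuing from the preprocessing sketch above, the model comparison described here might be wired together as follows; the estimators and their parameters are illustrative assumptions, and `balanced` is the oversampled frame from the previous sketch.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

features = ["amt", "lat", "long", "city_pop", "age",
            "hist_trans_60d", "hist_trans_24h", "hist_fraud_trans_24h",
            "hist_fraud_trans_2h", "hist_trans_avg_amt_60d", "category"]

data = balanced.copy()                                               # oversampled frame from above
data["category"] = LabelEncoder().fit_transform(data["category"])    # strings -> integer codes

X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["is_fraud"], train_size=0.7, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))
```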
5.1 Evaluation Metrics The evaluation metrics used to evaluate the ML models considered are as follows: Precision: Hossin and Sulaiman [12] describe precision as the ability of the model to correctly identify positive samples: of all the samples that the model labels as positive, it measures the fraction that are actually positive. It therefore considers every sample labeled as positive, whether that label is correct or not.

Precision = TruePositives / (TruePositives + FalsePositives)    (1)
Recall: Hossin and Sulaiman [12] describe recall as the fraction of the actual positive samples that are correctly classified as positive by the ML model. A model with high recall retrieves most of the positive samples; recall does not take into account whether negative samples are incorrectly classified as positive.
Recall = TruePositives / (TruePositives + FalseNegatives)    (2)
F1 score: Hossin and Sulaiman [12] state that the F1 score considers both false-positive and false-negative results. It is the harmonic mean of precision and recall and is often more informative than accuracy, especially when the costs of false positives and false negatives differ; accuracy works well mainly when those costs are similar.

F1-score = 2 ∗ (Recall ∗ Precision) / (Recall + Precision)    (3)
Accuracy score: Hossin and Sulaiman [12] point out that accuracy is the most intuitive measure of performance: the ratio of correctly predicted observations to the total number of observations. A model is considered good when it has high accuracy, but accuracy is a reliable measure only for a symmetric dataset with roughly equal numbers of false positives and false negatives. Consequently, the other metrics above must also be examined to evaluate the performance of the model.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (4)
Support: Support is simply the number of samples of each class in the dataset. Imbalanced support may point to a structural weakness in the classifier's scores and indicate the need for stratified sampling or rebalancing. Support does not change between models; instead, it helps diagnose the evaluation strategy [13].
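All of these metrics can be obtained directly with scikit-learn; the short sketch below assumes the fitted classifier and test split from the sketches in the previous section.

```python
from sklearn.metrics import classification_report, confusion_matrix

y_pred = clf.predict(X_test)                  # 'clf' is any fitted classifier from the sketch above
print(confusion_matrix(y_test, y_pred))       # [[TN, FP], [FN, TP]]
print(classification_report(y_test, y_pred))  # precision, recall, F1-score and support per class
```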
5.2 Model Output By making use of the above preprocessing, model training, and testing steps, we obtained the outputs shown below. Naive Bayes:
Here, we can see that the accuracy is 78% and the precision and recall are 0.87 and 0.73 for non-fraudulent and fraudulent data, respectively. So, we can see that there are false positives and false negatives detected by this model.
Logistic regression:
For Logistic Regression, we can see that the accuracy is 93% and the precision and recall are 0.93 and 0.93 for non-fraudulent and fraudulent data, respectively. So, there are relatively fewer false positives and false negatives detected by this model. XGBC Classifier:
For this model, we can see that the accuracy is 100% and the precision and recall for non-fraudulent and fraudulent data also reach optimal values on the data given to the model. So, the least number of false positives and false negatives are detected by this model. The best performing model, which in this case is the XGBC classifier, is therefore used for making the prediction based on the transactional data of the customer. New data is provided to predict the 'is_fraud' column: if the transaction is fraudulent, the output is 1, and if the transaction is not fraudulent, 'is_fraud' is predicted as 0.
6 Face Recognition Module Face recognition is an improvement to credit card security. Here we use the LBPH algorithm to calculate the difference between the face data stored in the database and the live frame images obtained during face recognition. Ahmed et al. [14] showed that the LBPH algorithm works well for face recognition on low-resolution images. The pixel is an image's fundamental building block. Every image has pixels with values ranging from 0 to 255. Red, green, and blue are the primary colors, and each pixel is made up of these three values; a single pixel therefore has three channels, one for each basic color, because the combination of these three colors produces all the colors seen in the image. Figure 2, obtained from Shaikh et al. [15], shows an abstract method to calculate the average face difference from a face matrix. Suppose the central element, i.e., 90, is taken as the threshold. Now, the values in the matrix which are above 90 are replaced by 1 and the rest
Fig. 2 Calculating average face difference from face matrix [15]
of them are replaced by 0. If we then read the 1's and 0's of this matrix as a binary pattern and convert it, we obtain the decimal value 141; this decimal value is the code assigned to the central pixel based on its surrounding pixels. This process continues for all other pixels. In this module, we calculate the difference between the live face and the face stored as a benchmark. Here, a question arises: how is the threshold on the face difference calculated, above which the system can conclude that the transaction is fraudulent? For this, we collected the face data of 30 users and stored it in a database to find out the range of face differences given by our algorithm. The following charts illustrate this: • First, we plotted a scatter plot of non-fraudulent face differences for all our users. This gives the face difference of a user with respect to an image of themselves stored in the database created. This helps us get an idea of the range of face differences given by our algorithm for a legitimate transaction. • This is done by taking continuous frames of face input of the legitimate user and comparing each frame with the face data stored in the database. • In Fig. 3, the X-axis shows the users, while the Y-axis shows their face difference percentage during face recognition. Each dot represents a face difference measurement, including highs and lows. As we can see, most of the dots fall below 0.3. • Next, we plotted a bar chart of fraudulent face differences for a range of users. • For this, we kept every user as a benchmark, gave the face input of all other users against it, and took the aggregate of all the obtained face difference values. • This gave us the values shown in Fig. 4, which represent fraudulent face difference data. The X-axis represents the user kept as the benchmark and the Y-axis gives the average face difference between all other (fraudulent) users and the benchmark user. As we can see, most of the bars are above 0.5. Therefore, analyzing these charts, we conclude that legitimate users' face differences go up to a maximum of 0.45 in some rare cases, but for the majority the difference stays below 0.40, whereas the fraudulent data chart shows that the minimum difference is greater than 0.50. Supported by these results, we propose a threshold value, with a margin of safety, of 0.40.
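A minimal NumPy sketch of this neighbourhood thresholding is shown below; it illustrates the basic LBP operator rather than OpenCV's full LBPH recognizer, and both the neighbour ordering and the example matrix are our own assumptions.

```python
import numpy as np

def lbp_code(patch):
    """Basic LBP code of the central pixel of a 3x3 grayscale patch."""
    center = patch[1, 1]
    # Neighbours read clockwise from the top-left corner (the ordering is an assumption).
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = ["1" if n > center else "0" for n in neighbours]
    return int("".join(bits), 2)          # binary pattern -> decimal value for the central pixel

# Made-up example patch with central value 90; with this ordering the code happens to be 141,
# the same decimal value quoted in the text (the actual matrix of Fig. 2 is not reproduced here).
patch = np.array([[99, 54, 12],
                  [93, 90, 85],
                  [21, 91, 95]])
print(lbp_code(patch))   # 141
```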
Fig. 3 Face difference of a user to the same user stored in a database
Fig. 4 Average face difference between fraud users to legitimate users stored in the database
7 Integration of the Modules The backend of the transaction executes the two separate parallel modules discussed above, the face recognition module and the machine learning module. The final output of our system depends on the individual outputs of these modules. There are four possible scenarios (a minimal sketch of the resulting decision rule follows this list): • The machine learning and face recognition modules both predict 0, which means there is no fraud and the transaction is approved. • The machine learning module gives 0 but the face recognition module outputs 1, which means the face provided does not match the face stored earlier. Therefore, a fraud is detected, and the transaction is aborted. • The face recognition module gives 0 but the transaction history of the customer is not regular or normal, and therefore the machine learning module gives the output 1. Here too, the transaction is aborted. • Both the machine learning module and the face recognition module predict 1, which indicates that neither the transaction history is normal nor the provided face is correct. Therefore, fraud is detected, and the transaction is canceled.
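A compact sketch of this decision rule is shown below; the function and variable names are ours.

```python
def authorize(ml_output: int, face_output: int) -> str:
    """Each module emits 0 (legitimate) or 1 (fraud); the transaction proceeds only if both emit 0."""
    if ml_output == 0 and face_output == 0:
        return "Transaction approved"
    return "Fraud detected - transaction aborted"

print(authorize(0, 0))   # both modules agree the transaction is legitimate
print(authorize(0, 1))   # face mismatch alone is enough to abort
```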
8 Result and Discussion To test our proposed system, we took an image of a user (Fig. 5) to give it as input to our system. The transaction parameters required by the ML model are also given to the machine learning module. As the transaction is legitimate with proper inputs to both modules, the message ‘Transaction approved’ was given and the result is displayed in Fig. 6. As the output from both the machine learning module and face recognition module is 0, the transaction is approved as the user is legitimate.
Fig. 5 Test user
Fig. 6 Legitimate transaction
Fig. 7 Fraudulent transaction
Now, we gave the same data to our ML module but gave a different face input to the face recognition module which gave us a ‘Fraud detected’ message as the face does not match. The output is shown in Fig. 7. Here we can see that the face difference output is 0.609 which is above the threshold discussed in Sect. 6. Therefore, the output from face recognition is 1, and fraud is detected.
9 Conclusion In this paper, we proposed a new system for detecting fraudulent credit card transactions. This system involves two modules: a face recognition module and a machine learning module. The face recognition module takes a face as input, which is compared with the one stored as a benchmark, whereas the machine learning module is given an input set of transactional data elements on which the prediction is based. For transactional data, we implemented ML models such as Logistic regression, the XGBC classifier, and the Naive Bayes classifier, whose results on their own can be vulnerable. Thus
we integrate the LBPH algorithm with the ML model to provide multilevel security. Other members of a family won’t be able to make a transaction because their face is not stored in the database. Future solutions can be found for this issue.
References 1. Dal Pozzolo A, Boracchi G, Caelen O, Alippi C, Bontempi G (2018) Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Trans Neural Netw Learning Syst 29(8):3784–3797. https://doi.org/10.1109/TNNLS.2017.2736643 2. Kho JRD, Vea LA (2017) Credit card fraud detection based on transaction behavior. TENCON 2017—2017 IEEE region 10 conference, Penang, Malaysia, pp 1880–1884. https://doi.org/10. 1109/TENCON.2017.8228165 3. Maniraj SP, Saini A, Ahmed S, Sarkar S (2019) Credit card fraud detection using machine learning and data science. Int J Eng Res 08. https://doi.org/10.17577/IJERTV8IS090031 4. Yu W-F, Wang N (2009) Research on credit card fraud detection model based on distance sum. In: 2009 international joint conference on artificial intelligence, Hainan, China, pp 353–356. https://doi.org/10.1109/JCAI.2009.146 5. Trivedi I, Monika MM (2016) Credit card fraud detection. Int J Adv Res Comput Commun Eng 5(1) 6. Phua C, Lee V, Smith K, Gayler R. A comprehensive survey of data mining-based fraud detection research. In: School of Business Systems, Faculty of Information Technology, Monash University, Wellington Road, Clayton, Victoria 3800, Australia 7. Suman (2014) Survey paper on credit card fraud detection. Int J Adv Res Comput Eng Technol (IJARCET) 3(3) 8. Shen A, Tong R, Deng Y (2007) Application of classification models on credit card fraud detection. In: International conference on service systems and service management: 1–4. https:// doi.org/10.1109/ICSSSM.2007.4280163 9. Sahu A, Firoz A, Kirandeep K (2022) Credit card fraud detection. https://doi.org/10.13140/ RG.2.2.32147.14888 10. Vats S, Dubey SK, Pandey NK (2013) Genetic algorithms for credit card fraud detection. In: International conference on education and educational technologies 11. Shenoy K (2020) Credit card transactions fraud detection dataset. Retrieved from https://www. kaggle.com/datasets/kartik2112/fraud-detection 12. Hossin M, Sulaiman MN (2015) A review on evaluation metrics for data classification evaluations. Int J Data Mining Knowl Manage Process (IJDKP) 5(2) 13. The scikit-yb developers. Classification report, Yellow Bricks, Jan 2017 14. Ahmed A, Guo J, Ali F, Deeba F, Ahmed A (2018) LBPH based improved face recognition at low resolution. Int Conf Artif Intell Big Data (ICAIBD) 2018:144–147. https://doi.org/10. 1109/ICAIBD.2018.8396183 15. Shaikh A, Shaikh S, Shaikh R, Khan M (2020) Image sorting using object detection and face recognition. Int J Innov Sci Res Technol 5(3)
Transfer Learning-Based End-to-End Indian English Recognition System Shambhu Sharan, Amita Dev, and Poonam Bansal
Abstract ASR, or Automatic Speech Recognition, systems have been a focus of recent study. ASR is a multidisciplinary technology based on computer science and linguistics that allows a computing system to deduce the transcribed form of text from a spoken speech waveform. DeepSpeech, a cutting-edge speech recognition system created by means of the end-to-end (E2E) deep learning approach, is one of the most modern ASR architectures. The approach is essentially a highly optimised Recurrent Neural Network (RNN) training algorithm that makes use of many Graphics Processing Units (GPUs). The majority of its training is conducted on foreign English speech resources, resulting in limited generalisation for Indian English. We developed an E2E-based ASR system for Indian English adopting a transfer learning strategy and the DeepSpeech model. We also employed fine-tuning as well as data augmentation techniques so as to optimise and refine DeepSpeech ASR in an Indian English scenario. Transfer learning and fine-tuning of the already trained DeepSpeech model are carried out using the IndicTTS database of Indian English. The untrained model, the model trained by us, and other available ASR systems are analysed and compared. Keywords Automated speech recognition · Deep learning · DeepSpeech · IndicTTS dataset · Speech-to-text · Transfer learning
S. Sharan (B) · A. Dev · P. Bansal Indira Gandhi Delhi Technical University for Women, New Delhi, India e-mail: [email protected] A. Dev e-mail: [email protected] P. Bansal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_10
1 Introduction The globe has seen tremendous technological progress in the previous several decades, and future success depends heavily on humans and machines working together collaboratively. One such collaboration is Automatic Speech Recognition (ASR), as speech is humans' primary and most adaptable form of communication [1]. ASR is a versatile technology that enables individuals to enter relevant data into computers merely by speaking to a digital terminal. It translates continuous auditory speech into corresponding text, which the computer then interprets to retrieve appropriate instructions, resulting in a type of seamless conversation between humans and machines. ASR is thus a key computing requirement for flexible human-machine interaction. An archetypal ASR system is composed of various modules, including acoustic, lexicon, and language models as well as a decoder. This design rests on several separate assumptions, and even a standard acoustic model is trained frame by frame using Markov assumptions. The end-to-end (E2E) technique was introduced in the domain and has lately gained popularity for eliminating these presumptions from a continuous ASR system and building a single predictive model optimised at the sequence level [2]. End-to-end techniques for voice recognition have been adopted effectively thanks to the rapid advancement of machine learning and deep learning algorithms and the use of powerful graphics processing units (GPUs). Several recurrent and convolutional layers are combined to create an interconnected system that functions as both a language model and an acoustic model, which seamlessly translates speech inputs to transcribed text [3]. Speech carries a plethora of information and qualities that can be extremely beneficial to any ASR system [4]. Multiple algorithms for ASR systems have been suggested by various researchers, academicians, and firms from across the globe. Today, cutting-edge ASR engines such as Siri from Apple, Alexa from Amazon, and Google Assistant from Google are widely available. However, there are various issues with commonly accessible ASR, such as difficulty distinguishing distinct accents, particularly in the Indian context. There have also been some significant advancements in the voice recognition field, and several computational techniques have been developed to turn voice into text with reasonable transcription accuracy. ASR systems have a diverse set of application areas: today, a range of sectors use speech technology to fulfil a variety of jobs. Recent advancements in the field of deep learning and the availability of significant datasets have benefited the area of ASR. Deep learning may replace decades of modelling research with new models that are more accurate and involve less manual work. The following sections address some of the most often used methodologies and algorithms for ASR modelling.
1.1 HMM and GMM-Based Modelling The acoustic model (AM) generally models the statistics of speech characteristics for every language-specific speech unit, such as a phone, syllable, or word [5]. The fundamental schematic diagram of a voice recognition system is depicted in Fig. 1. As shown, the AM is necessary to examine speech feature vectors for acoustic content. Hidden Markov Models (HMMs) are statistical models used for acoustic modelling which make use of a Markov process (a natural, random stochastic process) with unknown parameters, wherein the probability of the current state is determined exclusively by the current state itself and is unaffected by any prior states. An HMM is a two-fold dynamical system consisting of a fundamental stochastic process that is not observable and can only be inferred via another set of stochastic processes that generate a series of observed symbols. The HMM can be trained automatically and is feasible to employ as far as computation time is concerned, because it regards the sound input as quasi-static over short intervals (also called frames). The states of the HMM are not directly observable; only the factors influenced by the states can be observed. Each state of the HMM has a probability distribution over the possible output tokens. The HMM parameters are the transition probabilities of the states along with the means, variances, and mixture weights that describe the output distribution of the states. HMM-based modelling creates stochastic models from the known training datasets and thereafter compares the probabilities that the test/unknown dataset was generated by each model. HMMs have long been employed for acoustic modelling in human speech recognition systems. With the help of the HMM, a word in a vocabulary for
Fig. 1 Overall architecture of an Automatic Speech Recognition (ASR) [6]
ASR may easily be described, where one or more states of hidden portion symbolise a phonological unit, and observable portion signifies the statistical properties of the associated acoustic events in a specific feature space. HMM, on the other hand, has several restrictions. Because they are based on the maximum likelihood criteria, typical continuous density HMMs trained with expectation maximisation algorithms like Baum–Welch have weak discriminative capability among various models. Later, an HMM-GMM-blended ASR system was developed, with HMM serving as framework of structural sequence of speech signals and every state of HMM employing a GMM to simulate the acoustical properties of sound units [7]. The GMM may be thought of as an implementation of mixture of hybrid density models including the parametric and non-parametric models. It contains structure and parameters as in the case of a parametric model, that affect density behaviour in predictable ways, and on the other hand, it has numerous flexibility options, just as in non-parametric models, to allow for arbitrary density modelling.
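As a concrete illustration of such an HMM-GMM acoustic model, the sketch below trains one GMM-HMM per word on MFCC feature sequences and scores an unknown utterance against every model; it assumes the hmmlearn and librosa libraries, and the feature dimension, state count and mixture count are arbitrary choices, not values from this paper.

```python
import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    audio, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def train_word_model(wav_paths, n_states=5, n_mix=3):
    feats = [mfcc_features(p) for p in wav_paths]
    X, lengths = np.vstack(feats), [f.shape[0] for f in feats]
    model = GMMHMM(n_components=n_states, n_mix=n_mix, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)          # Baum-Welch (EM) training on the concatenated sequences
    return model

def recognize(wav_path, word_models):
    feats = mfcc_features(wav_path)
    # Pick the word whose model assigns the highest log-likelihood to the observation sequence.
    return max(word_models, key=lambda w: word_models[w].score(feats))
```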
1.2 ANN-Based Modelling Although HMM-GMM architectures have largely dominated continuous ASR algorithms, they too are not without drawbacks. They employ short-time spectral characteristics, and their system design assumptions are at odds with the goal of boosting recognition accuracy in ASR systems. Because HMM-GMM frameworks follow a generative modelling approach, while the ultimate objective of ASR is pattern categorisation, conditional or discriminative models appear to be more suitable for acoustic modelling. Furthermore, many machine learning methodologies such as the Artificial Neural Network (ANN), Support Vector Machine (SVM), and others have emerged as viable tools for acoustic modelling. Artificial Neural Networks (ANNs) are loosely modelled on the human brain's neural network. The ability of an ANN to imitate nonlinear complex dynamical systems, i.e. systems wherein a variation in output is not proportionate to the variation in input, is one of its primary benefits in voice recognition [8]. Speech is essentially a nonlinear analog signal produced by a nonlinear system. ANNs attempt to imitate the human brain's behaviour so as to carry out the recognition task for spoken utterances. ANN-based models take an acoustic input, perform numerous operations at various layers, and thereafter output the recognised text, as shown in Fig. 2 [9]. The final transcription or text output is determined by the weights learned from the training dataset. The major benefit of ANNs in the field of ASR involving noisy spoken utterances is the categorisation and identification of static patterns. That being said, purely ANN-based frameworks do not perform adequately in the case of continuous speech units. So, in order to overcome such issues, ANN architectures were incorporated with HMMs in a hybrid fashion to achieve optimal outcomes. Furthermore, with the availability of massive trainable datasets, effective learning methods, and strong computing resources, the promise of ANNs has lately been
Fig. 2 ANN-based ASR architecture [9]
realised. Some of the recent advances in ANN have resulted in deep learning, which has transformed various application fields, including ASR. Three aspects can directly be linked to deep learning’s short-term success above any classical machine learning. Deep learning, for example, provides end-to-end trainable designs that include extraction of the features, reduction in the dimensions, and final classification. In typical machine learning, these processes are otherwise regarded as isolated sub-systems that might result in inferior performance in pattern recognition. Secondly, without using any specified feature extraction techniques, the informative features specific to a target might be learnt from both input examples and classification targets. Thirdly, deep learning techniques are extremely versatile in terms of capturing complicated nonlinear interactions between inputs and output objectives at a level considerably above the capability of classic feature extraction approaches.
1.3 RNN and LSTM-Based Modelling The Recurrent Neural Network (RNN) is another type of NN that captures valuable temporal variations in sequential input such as speech to improve recognition accuracy. Hidden layers in an RNN architecture store the history of previous items in an input sequence. Notwithstanding their efficiency in sequential data modelling, RNNs have difficulties when trained on long data series with distant dependencies using the typical backpropagation approach. Long Short-Term Memory (LSTM) networks solve such issues by implementing the concept of "gates," i.e. special hidden units that efficiently manage how much information is retained or discarded during backpropagation [10]. To boost recognition performance, bidirectional RNNs use knowledge from past as well as forthcoming events when processing sequential input. Our nation is quite often recognised as a diverse country. As per the Eighth Schedule of the Constitution, 22 languages spoken in different parts of the country are considered official languages [11]. English is also used officially in more than 12 Indian states
and union territories, as well as a secondary language in a few more states. Indian English (IE) basically refers to the different variants of English spoken in different parts of the country and by the Indian diaspora [12]. As adopted in the Indian constitution, it is also used by the Government of India for communication alongside Hindi [13]. Thus, there is a need to develop a better Indian English ASR model to help the country's population use it properly. The 2017 DeepSpeech model is an ultra-modern ASR system created with an E2E modelling architecture. The system tends to perform very well in noisy conditions, even better than commonly utilised, cutting-edge commercial speech systems. Because the system is premised on deep learning and the datasets used for training are mostly made up of foreign English dialects, it performs poorly for Indian English. Let us first discuss the DeepSpeech system, followed by the way to adapt it for Indian English. DeepSpeech is an advanced and sophisticated ASR system developed using Baidu's E2E ASR architecture [14]. Baidu's DeepSpeech publication popularised the notion of E2E voice recognition systems. E2E indicates that the model receives the audio and produces the text output directly. Conventional ASR models, such as those developed using widely used libraries like Kaldi or Sphinx, determine phonemes and afterwards translate those phonemes to actual text. In 2017, Mozilla announced a publicly accessible implementation of Baidu's paper and released it as "Mozilla DeepSpeech" [15]. The system uses an RNN architecture and has been trained on a huge volume of audio recordings over multiple GPU instances. Because it learns directly from data, it does not necessitate specialised mechanisms such as noise filtering or speaker adaptation. Conventional speech systems depend primarily on highly engineered processing stages and complex pipelines made up of numerous different algorithms such as specialised features, acoustic models, and HMMs. Such a pipeline requires continuous reconfiguration to improve performance, demanding a significant amount of effort to tune the features as well as the models. Aside from that, a specially designed framework for robustness is considered necessary for adequate speech recognition in a noisy background. Huge amounts of data give deep learning systems an upper hand in learning and improving the final results: they can automatically learn robustness to noise or speaker variation. The system extracts Mel Frequency Cepstral Coefficients (MFCC) from the spoken utterances and thereafter yields the transcribed output directly without requiring any external knowledge. The DeepSpeech network consists of six layers, i.e. first three dense layers that are fully connected, then an RNN layer (unidirectional) followed by another dense layer and an output layer, as shown in Fig. 3 [15]. The speech features are input to the initial fully connected dense layer, and the output text is generated at the output layer. The dense layers are hidden and use the ReLU activation function, whereas the RNN layer uses LSTM cells. The network generates the probabilities of characters in matrix form for each time step. The Connectionist Temporal Classification (CTC) loss function is utilised to increase the likelihood of the proper transcribed output. The DeepSpeech library provides pre-trained ASR models with which one
can develop ASR applications, as well as tools for training one’s own DeepSpeech models. Another intriguing feature is the option to contribute to DeepSpeech’s public training dataset via the Common Voice project. Deep neural networks are very much proficient in performing huge variety of tasks, but they frequently demand massive quantities of training data and computer capabilities. However, earlier research has shown that deep learning models can also be transferred between languages. As a result, we employ a transfer learning-based technique to handle the problem with restricted speech resources in an E2E architecture. Transfer learning basically is a kind of augmentation of learning in a new assignment by the transfer of information from a previously learned related assignment. We used the well-known DeepSpeech transfer learning algorithm to create an E2E recognition framework for English spoken in India. The research work
Fig. 3 DeepSpeech model architecture [15]
presented here describes the entire process of fine-tuning the well-known DeepSpeech model for Indian English. For training, the IndicTTS database is used, which is one of the best freely available resources for many Indian languages, including Indian English. It contains over ten thousand spoken English phrases uttered by local males and females from various locations in India.
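A minimal Keras sketch of the six-layer topology described above is given below; the layer width, feature dimension and alphabet size are assumptions (2048 hidden units matches the published DeepSpeech configuration), and during training the per-character probabilities would be optimised with a CTC loss such as tf.nn.ctc_loss rather than ordinary cross-entropy.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

n_features = 26    # MFCC coefficients per frame (assumed)
n_hidden = 2048    # hidden width (assumed, matching the DeepSpeech default)
n_chars = 29       # a-z, space, apostrophe and the CTC blank (assumed)

frames = layers.Input(shape=(None, n_features), name="mfcc_frames")
x = layers.Dense(n_hidden, activation="relu")(frames)        # three fully connected layers
x = layers.Dense(n_hidden, activation="relu")(x)
x = layers.Dense(n_hidden, activation="relu")(x)
x = layers.LSTM(n_hidden, return_sequences=True)(x)          # unidirectional recurrent layer
x = layers.Dense(n_hidden, activation="relu")(x)             # fifth (dense) layer
char_probs = layers.Dense(n_chars, activation="softmax")(x)  # per-time-step character probabilities
model = Model(frames, char_probs)
model.summary()
```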
2 Literature Review Kunze et al. investigated model adaptation-based transfer learning as a method for developing ASR models with limited GPU memory, throughput, and training data [16]. The authors undertake a series of systematic experiments in which they adapt a Wav2Letter CNN trained for English to German. They demonstrate that the method permits for quicker training on consumer-grade resource base while using fewer data for training to attain almost similar precision, cutting the cost of training various languages ASR models. Contemplation of the model indicated that tiny changes to the network’s compositions seemed effective for high performance, particularly in the inner layers. Asadolahzade et al. develop a phone detection algorithm for Persian language and investigate the impact of domain adaptation [17]. The authors trained the network using English corpus first, thereafter transferred and fine-tuned it with the Persian data. Their tests on the FarsDat show that transfer learning can lower the error rates of phonemes by 7.88% when compared to a model built from scratch. Furthermore, they reported error rate enhancements of 2.08 and 1.52% when evaluated by comparing with the DNN-HMM and DNN-HSMM, correspondingly. The impacts of transfer learning on DNN-based voice recognition systems are presented by Asefisaray et al. [18]. Their reference model has been trained on a huge corpus of contact centre telephony records, while the target is acoustically mismatched out-of-domain data consisting of meeting recordings from Turkey’s Grand National Assembly. The authors evaluated how varied target training data sizes, transferred layer counts, and feature extractors affected transfer learning. Their investigations reveal that the acquired models beat the models that were simply trained on the target data for all target training sizes, and the model transferred utilising 20 h of target data had 7.8% greater recognition rates than the reference system. Hjortnaes et al. build DeepSpeech models using thirty-five hours of colloquial Komi recorded conversations and adjust the result with linguistic models built from a variety of sources [19]. The authors’ prior researches have shown that the concept of transfer learning with DeepSpeech may enhance the accuracy of a voice recognizer for Komi, despite the fact that the error rate continues to remain quite high. As a result, the authors conducted additional tests using language models constructed using KenLM from online text materials. These are built using two corpora: one for literary texts, one for social media material, and one that combines the two. They then performed simulations with each language model to investigate the influence of
the data source on the voice recognition model. Their findings reveal considerable refinements of more than 25% in error rate of characters and approximately 20% in words. This provides a crucial analytical perspective into how ASR outcomes might be improved when resources are limited. Transfer learning may compensate for a lack of training data in the target language, and internet texts are a valuable resource for creating language models in this scenario. Beibut et al. present a technique for obtaining a pre-trained prototype of the Russian language and using the prototype’s weight in their neural network [20]. They used the Russian language model since the Kazakh and Russian languages are fairly close in sound. The researchers at their institution created the database of Kazakh language with transcribed verbatim. In their research, 50 native speakers contributed around 400 sentences. For the automatic extension of the database, a specific technique has been developed. The information was gleaned from wellknown Kazakh classics such as “Abai zholy,” “Kara sozder,” and others.
3 Methodology This section covers the training of the DeepSpeech model and optimising process for Indian English. The most recent DeepSpeech model has been trained for Indian English.
3.1 Data Collection DeepSpeech's training is performed using the IndicTTS dataset, which comes from a multi-institutional initiative aimed at creating text-to-speech synthesiser technologies for Indian languages, enhancing synthesis performance, and integrating TTS into diverse applications [21]. It is a consortium-based initiative supported and funded by the Department of Electronics and Information Technology (DIT). The IndicTTS English collection comprises English-language audio recordings with annotation files, spoken by local males and females from various areas and states across India with native languages including Assamese, Bengali, Hindi, Kannada, Malayalam, and Tamil. Separate zip archives contain 10,000+ spoken sentences in wav format by both males and females, as well as their corresponding transcription files. The overall storage requirement of the IndicTTS dataset for Indian English is approximately fifty gigabytes.
3.2 Preprocessing DeepSpeech training and parameter tuning require the input data in a certain format so that it may be fed directly into the algorithm's designated input pipelines. For DeepSpeech training, we must analyse the metadata of the input in order to produce the train, test, and dev splits, respectively representing the training, testing, and development datasets. Initially, a Comma Separated Value (CSV) output file is created, which is then separated into train, test, and dev files using a bash script developed in the Linux terminal. The IndicTTS dataset, on the other hand, contains a variety of irregularities that are incompatible with DeepSpeech training, so it must be preprocessed. There are various aspects of the IndicTTS-derived CSV file that must be considered, such as: • Sampling rate: The sample rate of the input waveforms in the IndicTTS database is 48 kHz, whereas DeepSpeech training requires 16 kHz. Each audio file from the database has been analysed and downsampled to 16 kHz using the Sox tool in order to be compatible with DeepSpeech training. • Special characters: The transcription files contain various special symbols (@, #, $, etc.), punctuation marks, additional spaces, and capital letters. We cleaned the transcript data by removing all of these by writing a bash script using regular expressions; a minimal sketch of both preprocessing steps follows this list.
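The sketch below illustrates these two preprocessing steps in Python (the original work performed them with Sox and bash scripts); the directory layout and the exact cleaning rules are assumptions.

```python
import re
import subprocess
from pathlib import Path

def downsample_to_16k(src_wav: Path, dst_wav: Path):
    # Equivalent to the Sox invocation used in the text: resample 48 kHz audio to 16 kHz.
    subprocess.run(["sox", str(src_wav), "-r", "16000", str(dst_wav)], check=True)

def clean_transcript(text: str) -> str:
    # Lower-case, drop special symbols and punctuation, collapse repeated whitespace.
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

out_dir = Path("prepared_wav")
out_dir.mkdir(exist_ok=True)
for wav in Path("indictts_english/wav").glob("*.wav"):
    downsample_to_16k(wav, out_dir / wav.name)
```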
3.3 Data Augmentation and Parameter Adjustment DeepSpeech supports data augmentation and hyperparameter configuration for improved model generalisation and fine-tuning. These are very helpful techniques for improving the generalisation of deep learning models and are used in the preprocessing pipeline to change some of the training parameters. When the relevant flag is supplied for a certain augmentation, the dataset is processed and trained using the configured data augmentation approach. However, whether or not an augmentation is applied to a training sample is determined by chance based on the augmentation's probability value. Most of the hyperparameters were left at DeepSpeech's default pre-configuration; however, the batch size, dropout, and learning rate were adjusted so as to account for the quantity of training data. The most effective hyperparameters obtained are shown in Table 1. Aside from data augmentation and hyperparameter tuning, DeepSpeech also provides the option to create checkpoints during the training process, so that if the training is interrupted due to an error, it may be resumed from the very same moment using those checkpoints. DeepSpeech makes it possible to specify a checkpoint directory for recording the checkpoints during training and then to load that directory, which contains the training checkpoints.

Table 1 DeepSpeech hyperparameter values for the Indian English dataset

Hyperparameters   | Values
Batch size        | 22
Dropout           | 0.25
Learning rate     | 0.0001
3.4 Training The university's NVIDIA DGX A100 GPU server is used for the initial training and testing of the system on the Indian English corpus from the IndicTTS database. The NVIDIA server we used is one of the first artificial intelligence-focused servers built on the A100 Tensor Core GPU, which delivers high computational performance. The server, which incorporates eight A100 GPUs with 320 GB of GPU RAM, offers exceptional performance and is fully designed for NVIDIA CUDA as well as the end-to-end NVIDIA data centre solution. Training and validating DeepSpeech for Indian English over 50 epochs required around seventy minutes. After numerous experiments, a specialised, fine-tuned model was finalised and saved for future use, such as integration with different applications. Moreover, this trained model outperformed the baseline DeepSpeech model. As indicated in Table 1, we searched over a reasonable range of hyperparameters. We fixed the learning rate and train batch size in the first iteration to find the dropout giving the lowest WER. We then used the best dropout determined in the first iteration, i.e. 0.25, and kept the train batch size at 22 to find the optimal learning rate. Lastly, we used the optimal dropout value, i.e. 0.25, and learning rate, i.e. 0.0001, to test the influence of batch size, which revealed that our initial choice of 22 was appropriate, even if somewhat better results were attainable with smaller batches.
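For reference, a fine-tuning run with the hyperparameters of Table 1 can be launched along the lines of the sketch below; the flag names follow the Mozilla DeepSpeech 0.9 training script, and all file and checkpoint paths are placeholders, so details may differ between releases.

```python
import subprocess

subprocess.run([
    "python3", "DeepSpeech.py",
    "--train_files", "indic_train.csv",
    "--dev_files", "indic_dev.csv",
    "--test_files", "indic_test.csv",
    "--epochs", "50",
    "--train_batch_size", "22",
    "--dropout_rate", "0.25",
    "--learning_rate", "0.0001",
    "--load_checkpoint_dir", "deepspeech-0.9.3-checkpoint",   # pre-trained English checkpoint
    "--save_checkpoint_dir", "checkpoints/indic_english",
    "--export_dir", "models/indic_english",
], check=True)
```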
4 Results and Conclusion We first trained the system and thereafter tested the resulting Indian English model on the IndicTTS dataset. The Word Error Rate (WER) of the system is noted to be 0.1634, the Character Error Rate (CER) 0.0354, and the loss 15.6314 in the first iteration. The best result that the system reported is a WER of 0.0264 and a CER of 0.0167 with a loss of 4.6381. Generally
speaking, the inference can be drawn from the data that the modelled system outperforms the existing DeepSpeech framework. In the current research work, an end-to-end ASR system, trained on an Indian English audio corpus gathered from a subset of the IndicTTS dataset, is generated. This can be used in several ASR-based applications, specifically in Indian states. As a result, the findings clearly support the notion that the DeepSpeech model can be transferred to other language variants such as Indian English. The NVIDIA DGX A100 GPU server has successfully been utilised for training and running the developed model. We have reported the best hyperparameter values after performing numerous experiments. In the future, the model can be further re-trained and re-optimised with Indian English accents beyond the IndicTTS dataset, so as to cover different accents spoken across the country. The training might also be carried out on standard personal computers, as the amount of data is limited in the case of transfer learning, but the computational time would increase comparatively.
References 1. Agrawal SS, Devi A, Wason R, Bansal P (2018) Speech and language processing for humanmachine communications 2. Chang HJ, Lee HY, Lee LS (2021) Towards lifelong learning of end-to-end ASR. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH (2021). https://doi.org/10.21437/Interspeech.2021-563 3. Liu Q, Chen Z, Li H, Huang M, Lu Y, Yu K (2020) Modular end-to-end automatic speech recognition framework for acoustic-to-word model. IEEE/ACM Trans Audio Speech Lang Process. https://doi.org/10.1109/TASLP.2020.3009477 4. Dev A, Sharma A, Agarwal SS (2021) Artificial intelligence and speech. Technology. https:// doi.org/10.1201/9781003150664 5. Gales MJF (2010) Acoustic modelling for speech recognition: hidden markov models and beyond?https://doi.org/10.1109/asru.2009.5372953 6. Karpagavalli S, Chandra E (2016) A review on automatic speech recognition architecture and approaches. Int J Signal Process Image Process Pattern Recognit. https://doi.org/10.14257/ ijsip.2016.9.4.34 7. Swietojanski P, Ghoshal A, Renals S (2013) Revisiting hybrid and GMM-HMM system combination techniques. In: ICASSP, IEEE international conference on acoustics, speech and signal processing—proceedings. https://doi.org/10.1109/ICASSP.2013.6638967 8. Mendiratta S, Turk N, Bansal D (2019) ASR system for isolated words using ANN with back propagation and fuzzy based DWT. Int J Eng Adv Technol. https://doi.org/10.35940/ijeat. F9110.088619 9. Alam M, Samad MD, Vidyaratne L, Glandon A, Iftekharuddin KM (2020) Survey on deep neural networks in speech and vision systems. Neurocomputing. https://doi.org/10.1016/j.neu com.2020.07.053 10. Wang Q, Feng C, Xu Y, Zhong H, Sheng VS (2020) A novel privacy-preserving speech recognition framework using bidirectional LSTM. J Cloud Comput. https://doi.org/10.1186/s13677020-00186-7 11. Kumawat A. The national language of India, list of 22 languages in India, https://www.ssc adda.com/national-language-of-india/ 12. Wikipedia: Indian English. https://en.wikipedia.org/wiki/Indian_English 13. Language provisions in the constitution of the Indian Union
14. Deep speech-scaling up end-to-endspeech recognition.pdf. https://arxiv.org/abs/1412.5567. https://doi.org/10.48550/arXiv.1412.5567 15. Welcome to DeepSpeech’s documentation! https://deepspeech.readthedocs.io/en/v0.6.1/Dee pSpeech.html. Last accessed 15 June 15 16. Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget. In: Proceedings of the 2nd workshop on representation learning for NLP, Rep4NLP 2017 at the 55th annual meeting of the association for computational linguistics, ACL 2017. https://doi.org/10.18653/v1/w17-2620 17. Asadolahzade M (2019) Transfer learning for ASR to deal with low-resource data problem 18. Asefisaray B, Haznedaroglu A, Erden M, Arslan LM (2018) Transfer learning for automatic speech recognition systems. In: 26th IEEE signal processing and communications applications conference, SIU 2018. https://doi.org/10.1109/SIU.2018.8404628 19. Hjortnaes N, Arkhangelskiy T, Partanen N, Rießler M, Tyers F (2020) Improving the language model for low-resource ASR with online text corpora. In: Proceedings of 1st joint workshop Spok language technology under-resourced language collaborate computing under-resourced language 20. Beibut A (2020) Development of automatic speech recognition for Kazakh language using transfer learning. Int J Adv Trends Comput Sci Eng. https://doi.org/10.30534/ijatcse/2020/249 942020 21. Indic TTS. https://www.iitm.ac.in/donlab/tts/
Impact of COVID-19 on the Sectors of the Indian Economy and the World Rahul Gite, H. Vathsala, and Shashidhar S. Koolagudi
Abstract It is known that SARS-CoV-2 (more popularly known as the coronavirus) has affected the way countries function. It has influenced the general health and economy of various countries. Earlier studies have discussed the economic repercussions of various epidemics qualitatively. This paper discusses employing correlation analysis in combination with machine learning techniques to determine the impact of the virus on a country's economic health. The results are justified by the trends seen in the pre-COVID, COVID and post-COVID phases, thereby providing a base for predicting the economic conditions of the world in case of any such pandemic in the future. The study includes a country-wise analysis, for which the economic data of fifteen countries is analyzed, and a sector-wise impact analysis with the specific case of India has been attempted. Keywords Economy · AI · COVID-19 · Forecasting · Analysis
1 Introduction The disease caused by Coronavirus known as COVID-19 has majorly affected the social life along with economic health of all the countries. The virus continues to spread and there is no permanent solution to this worldwide pandemic. The first case of the coronavirus came out in the city of Wuhan in China and it started spreading quickly across the continents. The coronavirus infection slowly turned into an epidemic causing a lot of deaths. Coronavirus is a family of human and animal viruses [1]. Most of the affected people experience low to moderate respiratory illness. Some people are recovering R. Gite (B) · S. S. Koolagudi Department of CSE, NITK, Mangalore, Karnataka 575025, India e-mail: [email protected] S. S. Koolagudi e-mail: [email protected] H. Vathsala CDAC, Bangalore, Karnataka 560038, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_11
without any medical help; however, some people who have a history of ailments like heart disease, diabetes, etc. require medical attention to recover [2]. The virus is bound to stay for some more years due to its ability to mutate into more contagious strains [3]. The mechanism involved in its spread causes mass contraction of the virus within no time, and this is a major concern. Though mass vaccination drives are being undertaken, their effect on the new strains is yet to be seen. Most countries prioritized the safety of their citizens, and as a result, lockdowns were imposed from time to time. Due to the imposition of lockdowns, the economy of a country suffers, which in turn increases inflation rates [4], and this cannot be overlooked. This makes it essential to have an overall descriptive and inferential outlook of the economy, to analyze and improve the economic health of countries and the world in general. To take corrective measures for facing such situations in the future, it is necessary to study the implications of this pandemic on the various sectors (like health, financial, automobile, etc.) affecting the economy of a country. The current study has two parts: (1) correlation coefficient analysis of various sectors affecting the economy during the pre-COVID-19 phase, lockdown phase and unlock phase, and (2) forecast of sector-wise trends in the unlock phase using Neural Networks (NN). The various sectors considered in the study and the methodology are discussed in later sections. The work includes deriving implied relationships and quantitative analysis, showing the general effects brought in by COVID-19 in India. The results obtained in this work are justified by the observed trends and hence, the method developed in the current work may be used to measure and predict the economic impact of future pandemics, if any. The data used for the current work is collected from stock indexes present in stock exchanges. A stock exchange is an organized market where traders can buy and sell the shares of different companies. Investors and traders connect to the exchanges via their brokers and place buy or sell orders on these exchanges. There are currently two stock exchanges present in India, the National Stock Exchange (NSE) and the Bombay Stock Exchange (BSE). A stock index is an indicator of how a particular company, or the sector to which the index belongs, is performing. The stock market reflects the conditions of an economy [5]. To analyze the relationship between different sectors, a case study on the stock indices of various sectors in India is undertaken. A total of six sectors are considered, and their relationship with each other as well as with the BSE Sensex is observed with the help of correlation analysis. The data for the sectors is taken from BSE sector indices, as BSE has 5243 companies listed with a total market capitalization of Rs.26,309,948.171 Cr (Crores). This market capitalization is a significant amount and makes it suitable to be considered for data collection. The details regarding the total number of companies and market capitalization can be found on the BSE website.2
1 Data as on 28 July 2022.
2 https://www.bseindia.com/markets/equity/EQReports/allindiamktcap.aspx.
The rest of the paper is organized as follows: Sect. 2 gives the literature review, Sect. 3 provides details on the methodology, Sect. 4 presents the results and related discussion, and the conclusion is covered in Sect. 5.
2 Literature Review

Earlier research studies assessing the economic impact of epidemics were based on simulation models. A study by Karlsson et al. [6] assessing the impact of the 1918 Spanish flu epidemic on the Swedish economy is based on the neoclassical growth model, an extension of the standard difference-in-differences (DID) estimator, which was employed to exploit the differing flu mortality rates across Swedish regions. The policy brief issued by the Asian Development Bank to assess the economic impact of the Avian Flu pandemic on Asian economies was based on macroeconomic simulations using the Oxford Economic Forecasting (OEF) global model, which incorporates both the demand and supply sides and adjusts to a new equilibrium after a shock [7]. The empirical estimates of the economic effects of the severe acute respiratory syndrome (SARS) epidemic are based on a global model called the G-Cubed (Asia-Pacific) model, proposed by Lee and McKibbin [8]. Economic effects of epidemics are measured through economic costs derived from disease-associated medical costs or forgone incomes resulting from disease-related morbidity and mortality. In a global economy, the economic consequences of an epidemic in one country are transferred to other countries because of integrated supply chains and capital markets. Work undertaken to study the impact of COVID-19 on the Indian economy includes an empirical study conducted by Das and Patnaik [9] on secondary data collected from different sources such as the internet, books, articles, and public investigations; this study forecasts a GDP growth rate of 1.9% for the year 2020–2021. Nicola et al. [10] give a review of the socio-economic implications of the COVID-19 pandemic. Walmsley et al. [11] study the economic impact of COVID-19 in the United States of America by using a static computable general equilibrium model to simulate different phases of COVID-19. Pan et al. [12] study the effect of COVID-19 on multiple factors based on survey data obtained from different users. A correlation study was done by Bashir et al. [13] between climatic indicators and the COVID-19 pandemic. So far, there have been only a few studies offering limited insights into the impact of the COVID-19 pandemic on the economy of the country, leaving this issue open for research into significant findings and forecasting using modern technologies, which can further help combat such issues in the future.
3 Methodology

The current work aims at finding the impact of COVID-19 on different sectors of the economy and at forecasting their trends for the studied period and beyond. It also tries to find the relation between different global economic factors and how COVID-19 has impacted these factors, thus giving an idea about the global economy. The Correlation Coefficient (CC) is used to find the association between the different sectors and between global economic factors. Forecasts of the different sectors are worked out using Neural Networks (NN), particularly Long Short-Term Memory (LSTM) networks. For the CC calculations, six different sector-wise indices from the Bombay Stock Exchange (BSE) are considered. Correlation techniques are applied to the data to study the relationship between the indices. Forecasting using the LSTM network is carried out, and the model is trained on five time steps taken together, where the first four time steps are used to predict the fifth; each time step consists of the value of the particular stock index at a specific moment in time. For the sector-wise analysis, data has been taken from 2019 and 2020 and further divided into three phases based on average global timelines. The different phases and their periods are described in Sect. 4. For the second part of the work, ten geographically distant countries were chosen across the categories developed, developing, and underdeveloped, based on the World Bank's classification of the world's economies. The World Bank's classification is based on estimates of Gross National Income (GNI) per capita; previous World Bank publications might have referred to this as Gross National Product (GNP). The GNI is gross national income converted to international dollars using Purchasing Power Parity (PPP) rates. The most current World Bank income classifications3 by GNI per capita (updated July 1 of every year) are as follows:
• Low income: $1045 or less
• Lower middle income: $1046–4095
• Upper middle income: $4096–12,695
• High income: $12,696 or more
Low-income and lower-middle-income economies are referred to as underdeveloped and developing economies, while the upper-middle-income and high-income economies are referred to as developed countries. The remainder of this section explains the data sources, the collection and processing of data, and the correlation and LSTM network concepts as applicable to the current study.
3 https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level2021-2022.
3.1 Correlation

Correlation measures the degree of association between two quantitative variables [14]. It is a statistical tool used to analyze the behavior of one variable with respect to another. Here it is intended to find associations between different sectors and factors of the economy of the country and the world; hence, CC analysis is employed to detect the sectors most likely to affect the economy. The Spearman rank-order correlation coefficient (Spearman's correlation [15], for short) is a non-parametric measure of the strength and direction of the association that exists between two variables. It is denoted by the symbol r_s (or the Greek letter ρ). To understand the level of correlation, the following scale [14] has been used to categorize the relationship between the indices, where the variable R denotes the value of the correlation coefficient between the two variables:
• For 0 < R < 0.4 and −0.4 < R < 0: Weakly Correlated
• For 0.4 < R < 0.7 and −0.7 < R < −0.4: Moderately Correlated
• For 0.7 < R < 1 and −1 < R < −0.7: Strongly Correlated
• Assumptions for Spearman's rank correlation: The data must be ordinal, interval, or ratio [16]. In addition, because Spearman's rank correlation measures the strength of a monotonic relationship, the data has to be monotonically related. This means that if one variable increases (or decreases), the other variable also increases (or decreases). In the current study, data is taken for a relatively short interval, so a monotonic relationship between the different sectors of the economy and economic growth can be assumed. This assumption is also supported by the steady growth of the economy and the share market [17].
• The formula for Spearman's rank correlation coefficient when there are no tied ranks is given in Eq. 1:

ρ = 1 − (6 Σ d_i²) / (n(n² − 1))    (1)

where ρ is Spearman's rank correlation coefficient, d_i is the difference between the two ranks of each observation, and n is the number of observations.
3.2 Forecasting

An estimate of the post-COVID economic status was desired during the lockdown, so time series forecasting was used to project the trends. Forecasting involves fitting models to historical data and using them to predict future observations.
• Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber [18], is a kind of Recurrent Neural Network (RNN) that solves some of the problems associated with traditional RNNs. RNNs are networks whose current output depends
Fig. 1 Basic structure of RNN. Image source [19]
Fig. 2 LSTM cell. Image source [19]
not only on the current input but also on the previous inputs. A typical RNN cell is shown in Fig. 1. As seen in the image, an RNN sees the current input x_t and outputs a value h_t. Sometimes it is required to learn long-term sequences to perfectly capture the trend of the sequence, and in such cases the network must learn to use past information. In theory, RNNs are capable of handling such "long-term dependencies", but in practice they struggle to learn them, which motivates the use of LSTMs. Each LSTM cell (Fig. 2) consists of a cell state, which constitutes the current information held in that cell. The LSTM consists of "gates" which control the information flowing through it, i.e., the cell state can be changed by the LSTM using the gates. The following steps are taken as each input enters an LSTM cell. First, the LSTM decides which information to throw away from the cell state. This is decided using
the “forget gate layer”. The output of the forget gate is given by 2: f t = σ (W f .[h t−1 , xt ] + b f )
(2)
Where σ is the sigmoid function given by 3: σ (x) = 1/(1 + −x )
(3)
W f are the weights, b f is the bias, h t−1 is the previous output and xt is the current input. Next, the cell decides on the new information to be fed to the cell state. This is done by the input gate layer. Next, a tanh layer creates a vector of new candidate values that could be added to the state. The equations for the input gate are given by 4 and 5: (4) i t = σ (Wi .[h t−1 , xt ] + bi ) C0 = tanh(WC .[h t−1 , xt ] + bC )
(5)
The cell state is updated by Eq. 6. Ct = f t ∗ Ct−1 + i t ∗ C0
(6)
The output is calculated as given in Eqs. 7 and 8. ot = σ (Wo .[h t−1 , xt ] + bo )
(7)
h t = ot ∗ tanh(Ct )
(8)
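To make Eqs. 2–8 concrete, the following minimal NumPy sketch performs one forward step of a single LSTM cell. The weight matrices are random and the sizes are illustrative assumptions (one input feature, four hidden units, matching the cell count reported later in Table 1), not the trained parameters of the forecasting model.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Eq. 3

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # forget gate, Eq. 2
    i_t = sigmoid(W_i @ z + b_i)               # input gate, Eq. 4
    c_tilde = np.tanh(W_c @ z + b_c)           # candidate state, Eq. 5
    c_t = f_t * c_prev + i_t * c_tilde         # cell state update, Eq. 6
    o_t = sigmoid(W_o @ z + b_o)               # output gate, Eq. 7
    h_t = o_t * np.tanh(c_t)                   # hidden state, Eq. 8
    return h_t, c_t

# Illustrative sizes: 1 input feature, 4 hidden units.
rng = np.random.default_rng(0)
hidden, inp = 4, 1
W = lambda: rng.standard_normal((hidden, hidden + inp)) * 0.1
b = lambda: np.zeros(hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(np.array([0.5]), h, c, W(), b(), W(), b(), W(), b(), W(), b())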
LSTM has been widely used in speech recognition, text analysis and emotion analysis and can produce accurate forecasts [20, 21]. It has also been used for stock market analysis [22–24]. The data must be prepared before it can be used to train an LSTM network. Two approaches for processing the time series data are discussed next. Differencing is an important preprocessing operation that makes the data stationary and transforms the value series into a difference series. By differencing, small trends in the time series data are removed so that the supervised learning model performs well. If the data is X, where x_t represents the tth data point, each data point is transformed as given in Eq. 9:

x′_t = x_t − x_{t−1}    (9)

The scale of the data must be reduced to values between 0 and 1 in order to remove the bias caused by different features having different scales of values, so that the model performs better. The Min-Max scaling approach has been used to scale the data. Scaled values are given by Eq. 10:
X_sc = (X − X_min) / (X_max − X_min)    (10)
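A minimal sketch of this preprocessing, assuming the closing values are held in a one-dimensional NumPy array; the variable names and sample values are illustrative.

import numpy as np

def difference(series):
    # Eq. 9: transform the value series into a difference series.
    return series[1:] - series[:-1]

def min_max_scale(series):
    # Eq. 10: rescale values into the range [0, 1].
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

closing_values = np.array([100.0, 102.0, 101.5, 104.0, 107.5])  # illustrative index values
scaled = min_max_scale(difference(closing_values))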
The process of data collection and processing has been discussed in the next section.
3.3 Data Collection

The BSE provides data on the values of the sector indices for all days on which the stock market is open. This data is accessible through the BSE website.4 For each day, the BSE has the opening, closing, high and low values of each index. Opening values correspond to the value of the index when the stock market opens for the day, and closing values correspond to the value of the index just before the market closes. High and low indicate the highest and the lowest values of the index observed during the day. The closing values of the sector indices were taken for each day. The data for the countries was obtained from the World Bank's online data.5 The collected data was converted into a series of time steps, with each row containing five entries corresponding to five days of data. For example, suppose d_t contains the data related to time step t; then each row contains the values d_t, d_{t+1}, d_{t+2}, d_{t+3}, d_{t+4} in that order. This converts the forecasting problem into a supervised machine learning problem, in which the first four values are used to predict the fifth. Differencing was applied to the data values, and then the data was scaled using a Min-Max scaler before being passed to the forecasting model for training.
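The conversion into supervised samples can be sketched as follows; the window length of five follows the description above, and the array names are illustrative placeholders.

import numpy as np

def make_supervised(series, window=5):
    # Each row holds d_t, d_{t+1}, ..., d_{t+window-1}; the first window-1
    # values form the input and the last value is the prediction target.
    rows = np.array([series[i:i + window] for i in range(len(series) - window + 1)])
    X, y = rows[:, :-1], rows[:, -1]
    return X, y

scaled_series = np.linspace(0.0, 1.0, 20)       # stand-in for the scaled index values
X_train, y_train = make_supervised(scaled_series)
print(X_train.shape, y_train.shape)             # (16, 4) (16,)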
3.4 Sector Indices and Economic Factors

The sector indices used for the study are BSE indices. These indices have been used to find the impact of one sector on the other sectors and on the overall economy. Health sector data has been included to find the impact of COVID-19 on the general health of the citizens of a nation. Due to the lockdown, the unavailability of the labor force increased, thus affecting the manufacturing sector; this relationship between the labor force and manufacturing units has been captured by including the auto sector in the data collection. To capture the effect on the consumer, the consumer durables sector has been included, and the banking and finance sectors have been taken into account to represent financial activity. Finally, the BSE Sensex is taken to represent the overall economy of India.
• BSE Healthcare—This index represents the healthcare sector. The BSE Healthcare index is designed to provide a benchmark reflecting companies that are classified as members of the healthcare sector.

4 https://www.bseindia.com/sensex/IndexHighlight.html.
5 https://data.worldbank.org/.
• BSE Auto—The automotive industry comprises a wide range of companies and organizations involved in the design, development, manufacturing, marketing and selling of motor vehicles. The BSE Auto index comprises constituents of the BSE 500 that are classified as members of the transportation equipment sector as defined by the BSE industry classification system.
• BSE Consumer Durables—Consumer durables are a category of consumer products that do not have to be purchased frequently because they last for an extended period. The BSE Consumer Durables index comprises constituents of the BSE 500 that are classified as members of the consumer durables sector as defined by the BSE industry classification system.
• BSE Bank—This index represents banks and other financial institutions that provide lending and investment services. The BSE Bank index comprises constituents of the BSE 500 that are classified as members of the banking sector as defined by the BSE industry classification system.
• BSE Finance—The financial sector is a section of the economy made up of firms and institutions that provide financial services to commercial and retail customers. The BSE Finance index is designed to provide investors with a benchmark reflecting companies included in the BSE AllCap that are classified as members of the finance sector.
• BSE Sensex—The BSE Sensex is an index that is designed to reflect the condition of the entire economy of India. The Sensex comprises thirty of the largest and most actively traded stocks on the BSE and provides a gauge of India's economy.
To understand the impact of COVID-19 on the economy of the world in general, the following variables were chosen; these indicators are defined by the World Bank6:
• GDP Growth Rate—Gross domestic product (GDP) is the total monetary or market value of all the finished goods and services produced within a country's borders in a specific period [25]. This factor was chosen to capture the overall economic output of a country.
• Death Rate—A count of the number of deaths (in general, or due to a specific cause) in a particular population, per unit of time.
• Economic Relief—Relief, in finance, is public or private aid to persons in economic need because of natural disasters, wars, economic upheaval, chronic unemployment or other unprecedented factors. It is a form of incentive provided by the government to uplift the economy.
• Number of Doctors per 1000—This is an important measure of the healthcare system: the ratio of the total number of doctors in the country to every 1000 people.
• COVID Cases—The total number of COVID-19 cases recorded as of March 3, 2021.
• Number of Lockdowns, when lockdowns were announced, and when lockdowns ended—This factor was chosen to capture the varied responses of countries in handling the pandemic. The number of lockdowns and their durations were noted.

6 https://data.worldbank.org/indicator.
Three developed countries (USA, France, Sweden), four developing countries (China, India, Russia, Brazil), and three underdeveloped countries (Sudan, Afghanistan, Syria) are taken for analysis. The results and their inferences are discussed in the next section.
4 Results and Discussion The current section shows the results of the experiments conducted. The results obtained for sector-wise analysis are discussed first followed by the results obtained through country-wise analysis. Significant results obtained through both analyses are discussed at the end of this section.
4.1 Sector-Wise Analysis

A total of six sector indices are taken from the BSE to measure the impact of COVID-19 on general health and the economy. These sector indices are representative of the actual economic sectors, as they take into account the top fifty companies in each sector in the order of their market share. Market share represents the percentage of an industry's, or a market's, total sales that is earned by a particular company over a specified period. Market share is calculated by taking the company's sales over the period and dividing it by the total sales of the industry over the same period. Since their market share is high, the performance of these companies is a reflection of the sector's performance in general. A brief description of the sectors is given in Sect. 3. This subsection describes the results obtained through correlation analysis and forecasting done for the different sectors of the Indian economy.
4.1.1 Correlation Analysis
For the correlation analysis, three timelines were chosen. They are labeled as:
• Pre-COVID: This is the period from November 4, 2019 to March 13, 2020. During this time, COVID-19 had not yet impacted India to a significant extent, and this period is named the pre-COVID time.
• Lockdown: This is the period from March 13, 2020 to June 3, 2020, i.e., from the lockdown announcement to the time Unlock-1 was announced in India.
• Unlock: This is the period from June 3, 2020 to December 30, 2020. The purpose of this time frame is to study the impact of COVID-19 on the economy when some lockdown restrictions were lifted.
Fig. 3 Correlation matrix for pre-covid phase as a heatmap for various sectors of Indian economy
Fig. 4 Correlation matrix for lockdown phase as a heatmap for various sectors of Indian economy
Figure 3 shows the correlation between the selected sectors during the pre-COVID phase, Fig. 4 shows the correlation between the selected sectors during the lockdown phase, and Fig. 5 shows the correlation between the selected sectors in the unlock phase. In Figs. 3, 4 and 5, the correlation coefficient of each attribute is taken with respect to all the other attributes, thus forming a matrix. Correlation values range from −1 to +1, indicating the degree of dependency between the attributes. High positive values indicate a strongly direct (proportional) relationship between the attributes, whereas high negative values denote a strongly inverse relationship between the attributes. Values near zero show that the sectors are not strongly correlated with each other.
Fig. 5 Correlation matrix for unlock phase as a heatmap for various sectors of Indian economy

Table 1 Forecasting model details
Epochs: 1500
Loss function: Mean squared error (MSE)
Optimizer: Adam
No. of LSTM cells in each layer: 4

4.1.2 Forecasting Model
To predict the direction of growth for the selected sectors, the following forecasting models have been devised. Figure 6 depicts the neural network architecture using Long Short-Term Memory (LSTM) with different layers stacked on top of each other. The same architecture has been trained separately on each sector's data. In the training phase, the training data is divided into batches and fed into the input layer. It is then sent into the LSTM layer, where the LSTM cells read the data and output the result. The outputs from the LSTM cells are sent to the output layer, where they are converted into the final prediction; the prediction is compared to the original value, based on which the model updates its weights. This process is repeated several times, with each loop defined as an epoch. The trained model is then used for producing the forecast on new data. The data for each sector index has been taken from January 1, 2019 to December 30, 2020, with the closing price on each day taken as the price for that day. The details regarding the model are tabulated in Table 1. The model has been trained on a GPU in a Python 3 environment, and the code has been developed using the Keras library and the TensorFlow framework. The model was constructed using different layers stacked on top of each other, with each layer consisting of multiple LSTM cells connected to each other. Each layer takes in an input and, after processing, passes the output to the next layer to generate the prediction. Figures 7, 8, 9, 10, 11 and 12 below show the predictions for the next
Fig. 6 LSTM model used for forecasting
120 days along with the actual values of the sector indices for each of the sectors. This gives the performance of the sectors with respect to the predicted performance of each sector. Each figure shows the index value on the Y-axis and the number of days on the X-axis. The predictions are taken as the number of days moves from zero to sixty-nine, with the starting date taken as January 1, 2021. As can be seen from the graphs, the overall error rate in the prediction of these sectors ranges from 2 to 3%, which can be considered insignificant as the index values range from about six thousand for the Finance sector to nearly fifty thousand for the Sensex, showing that the model has performed well in forecasting the index values.
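A minimal Keras sketch of a stacked-LSTM forecaster of the kind described above, using the settings reported in Table 1 (Adam, MSE, four LSTM cells per layer, four input time steps); the number of stacked layers, the random training arrays and the reduced epoch count are illustrative assumptions, not the exact published configuration.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# X_train: (samples, 4, 1) windows of four scaled, differenced values; y_train: fifth-step targets.
X_train = np.random.rand(100, 4, 1)
y_train = np.random.rand(100)

model = keras.Sequential([
    layers.Input(shape=(4, 1)),
    layers.LSTM(4, return_sequences=True),    # first stacked LSTM layer
    layers.LSTM(4),                           # second LSTM layer
    layers.Dense(1),                          # output layer producing the fifth-step value
])
model.compile(optimizer="adam", loss="mse")   # optimizer and loss from Table 1
model.fit(X_train, y_train, epochs=5, batch_size=16, verbose=0)  # the paper trains for 1500 epochs

next_step = model.predict(X_train[:1])        # forecast for one input window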
4.2 Country Wise Analysis

The countries are divided into three categories based on their overall development:
• Developed countries, e.g., USA, France
• Developing countries, e.g., India, Russia
• Underdeveloped countries, e.g., Sudan, Afghanistan, Syria
Fig. 7 Actual v/s forecasted values for auto sector
Fig. 8 Actual v/s forecasted values for bank sector
To find the impact on the various countries, the factors given in Sect. 3 have been chosen. Figure 13 shows the correlation matrix of these factors for developed countries, Fig. 14 shows the correlation matrix of these factors for developing countries, and Fig. 15 shows the correlation matrix of these factors for underdeveloped countries.
Fig. 9 Actual v/s forecasted values for finance sector
Fig. 10 Actual v/s forecasted values for consumer durables sector
4.3 Discussion In this Section, some of the important results obtained are discussed. The definitions of high, moderate and low correlation are given in Sect. 3.
4.3.1 Sector-Wise Analysis
First, the results from the sector-wise analysis are discussed.
Fig. 11 Actual v/s forecasted values for health sector
Fig. 12 Actual v/s forecasted values for sensex
Figure 4 shows the correlation between the selected sectors during the lockdown phase. A high correlation coefficient between the healthcare sector and the Sensex during the lockdown period in India is observed. The healthcare sector is at the epicenter of recovering from the pandemic; with the ever-rising cases of COVID-19 leading to lockdown, the country and the economy are dependent on the health sector infrastructure. This can be seen in the fact that many businesses were stopped during the lockdown and, due to the rising danger of and fear of COVID-19, people were dependent on the health sector for a cure or a vaccine. In Fig. 5, the correlation of the different sectors is shown during the unlock phase. It can be seen that the dependence between the health sector and the economy (Sensex) did not diminish even in the unlock phase (Fig. 5), which suggests a dire need for a cure so that pre-COVID times
Fig. 13 Correlation matrix between different factors of the economy as a heatmap for developed countries
Fig. 14 Correlation matrix between different factors of the economy as a heatmap for developing countries
Fig. 15 Correlation matrix between different factors of the economy as a heatmap for underdeveloped countries
can be achieved. The impact of COVID-19 on the health sector of India has been discussed in [26]. Figure 7 shows the actual versus predicted values of the Indian auto sector for the period following the one considered for correlation analysis. A general decline in the auto sector is observed, which is credited to the unavailability of labor during the lockdown phase. Rakshit and Paul [26] and Estupinan et al. [27] discuss the impact caused by COVID-19 on various sectors of the country and on the labor market. It has been reasoned that industries like the automobile industry are affected due to lockdown, the shutting down of factories, and disruption in the supply chain. Unavailability of labor due to lockdown restrictions has also led to unemployment, which adds to this problem, and many industries are downsizing to cope with losses. Anghel et al. [28] discuss the impact of unemployment on the overall economy. Figures 8 and 9 show the actual versus predicted values of the Indian banking and finance sectors, respectively. The bank and finance sectors spike up, followed by a decline. As the bank and finance sectors have the highest market capitalization of all the sectors, the BSE Sensex follows a similar trend, as seen in Fig. 12. Figures 10 and 11 show the actual versus predicted values of the Indian consumer durables sector and the Indian health sector. The forecasting model predicts a growing trend in the consumer durables and health sectors, and the sectors' stock values are on a general increasing trend, indicating recovery to pre-COVID levels. Nomura [29] has shown that economic activity in India had almost reached its pre-COVID levels by September 2020, and Maitra et al. [30] had predicted a W-shaped recovery for India.
The recovery and rising trends are credited to the economic relief package announced by the government and to the availability of labor during the later stages of the pandemic, owing to the relaxation of the lockdown. The fall of the unemployment rate to pre-lockdown levels is discussed in [31].
4.3.2 Country Wise Analysis
Here, some important results from the country-wise analysis are discussed.
• Figure 13 shows the correlation between the different global economic factors for developed countries. In developed countries, the GDP growth rate has a negative correlation with the number of lockdowns and the number of COVID-19 cases. As the lockdowns were announced, many businesses were severely hit and many of them closed; this led to reduced incomes and increased unemployment, which affects the GDP of the country [28]. As the number of COVID-19 cases increased rapidly, governments imposed more lockdowns, overall impacting the GDP growth rate.
• Figures 14 and 15 show the correlation between the different global economic factors for the developing and underdeveloped countries, respectively. As can be seen, the number of doctors per 1000 has a negative correlation with the number of COVID-19 cases in developing and underdeveloped countries. When the number of doctors per 1000 population is higher, more patients can be attended to and properly taken care of; hence, the number of patients drops as time passes and more and more people recover.
• Economic relief has a positive correlation with the GDP growth rate in underdeveloped countries. As a large percentage of economic activities were halted due to COVID-19, economic packages were released by the governments to boost the economy; these packages help people grow their incomes back to better levels and hence positively impact the GDP.
5 Conclusion

COVID-19 has impacted all of our lives and forced us to live in a restricted manner. It has spread to almost all countries, majorly impacting their economies and health. Governments have taken several steps to reduce the negative impact as much as possible. This work aims to find the impact of COVID-19 on various countries and sectors by examining how the relationships between different factors change across the phases. It also tries to show the impact using the relationships between different factors (as shown in the discussion). The intuitive results obtained are seen to be justified by the trends observed during the pre-COVID, COVID and post-COVID phases, which shows that correlation analysis can be used in the future to analyze the impact if any such pandemic occurs, and that forecasting models can be used to predict the direction of growth.
References
1. Decaro N, Lorusso A (2020) Novel human coronavirus (SARS-CoV-2): a lesson from animal coronaviruses. Veterin Microbiol 244:108693. https://www.sciencedirect.com/science/article/pii/S0378113520302935
2. Balachandar V, Mahalaxmi I, Subramaniam M, Kaavya J, Kumar N, Laldinmawii G, Narayanasamy A, Reddy P, Sivaprakash P, Kanchana S et al (2020) Follow-up studies in COVID-19 recovered patients-is it mandatory? Sci Total Environ 729:139021
3. Tang X, Wu C, Li X, Song Y, Yao X, Wu X, Duan Y, Zhang H, Wang Y, Qian Z, Cui J, Lu J (2020) On the origin and continuing evolution of SARS-CoV-2. Nat Sci Rev 7:1012–1023. https://doi.org/10.1093/nsr/nwaa036
4. PHASE I (2020) The impact of COVID-19 on inflation: potential drivers and dynamics
5. Jiménez-Rodríguez R (2019) What happens to the relationship between EU allowances prices and stock market indices in Europe? Energy Econ 81:13–24
6. Karlsson M, Nilsson T, Pichler S (2014) The impact of the 1918 Spanish flu epidemic on economic performance in Sweden: an investigation into the consequences of an extraordinary mortality shock. J Health Econ 36:1–19
7. Bloom E, De Wit V, Carangal-San Jose M (2005) Potential economic impact of an avian flu pandemic on Asia. Asian Development Bank
8. Lee J, McKibbin W et al (2004) Estimating the global economic costs of SARS. In: Learning from SARS: preparing for the next disease outbreak: workshop summary, pp 92–109
9. Das KK, Patnaik S (2020) The impact of covid 19 in Indian economy—an empirical study. Int J Electr Eng Technol (IJEET) 11:194–202
10. Nicola M, Alsafi Z, Sohrabi C, Kerwan A, Al-Jabir A, Iosifidis C, Agha M, Agha R (2020) The socio-economic implications of the coronavirus pandemic (COVID-19): a review. Int J Surg 78:185–193
11. Walmsley T, Rose A, Wei D (2021) The impacts of the coronavirus on the economy of the United States. Econ Disasters Clim Change 5:1–52
12. Pan K, Yue X (2021) Multidimensional effect of Covid-19 on the economy: evidence from survey data. In: Economic research-Ekonomska Istraživanja, pp 1–28
13. Bashir M, Ma B, Komal B, Bashir M, Tan D, Bashir M et al (2020) Correlation between climate indicators and COVID-19 pandemic in New York, USA. Sci Total Environ 728:138835
14. Akoglu H (2018) User's guide to correlation coefficients. Turkish J Emerg Med 18:91–93
15. Spearman C (1961) The proof and measurement of association between two things. Appleton-Century-Crofts
16. Hauke J, Kossowski T (2011) Comparison of values of Pearson's and Spearman's correlation coefficient on the same sets of data. Wydział Nauk Geograficznych i Geologicznych Uniwersytetu im. Adama Mickiewicza
17. Duca G (2007) The relationship between the stock market and the economy: experience from international financial markets. Bank Valletta Rev 36:1–12
18. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780
19. Olah C. Colah's blog. https://colah.github.io/posts/2015-08-Understanding-LSTMs
20. Zarrad O, Hajjaji M, Mansouri M (2019) Hardware implementation of hybrid wind-solar energy system for pumping water based on artificial neural network controller. Stud Inform Control 28:35–44
21. Saric T, Simunovic G, Vukelic J, Simunovic K, Lujic R (2018) Estimation of CNC grinding process parameters using different neural networks. Tehnički Vjesnik 25:1770–1775
22. Yadav A, Jha C, Sharan A (2020) Optimizing LSTM for time series prediction in Indian stock market. Procedia Comput Sci 167:2091–2100
23. Kim H, Won C (2018) Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models. Expert Syst Appl 103:25–37
24. Petersen N, Rodrigues F, Pereira F (2019) Multi-output bus travel time prediction with convolutional LSTM neural network. Expert Syst Appl 120:426–435
25. Callen T (2012) Gross domestic product: an economy's all. International Monetary Fund, Washington, DC, USA
26. Rakshit D, Paul A (2020) Impact of COVID-19 on sectors of Indian economy and business survival strategies. Available at SSRN 3620727
27. Estupinan X, Gupta S, Sharma M, Birla B (2020) Impact of COVID-19 pandemic on labour supply, wages and gross value added in India. Ind Econ J 68:572–592
28. Anghel M, Anghelache C, Manole A et al (2017) The effect of unemployment on economic growth. Romanian Stat Rev Suppl 65:174–186
29. Nomura: India's economic activity almost at pre-lockdown levels but Covid looms. The Economic Times (2020)
30. Maitra B, Kuruvilla T, Rajeswaran A, Singh A (2020) INDIA: surmounting the economic challenges of COVID-19. Arthur D Little
31. Sharma YS (2020) Unemployment rate falls to pre-lockdown level: CMIE. The Economic Times
Wind Farm Layout Optimization Problem Using Teaching–Learning-Based Optimization Algorithm
Mukesh Kumar and Ajay Sharma
Abstract The wind farm layout optimization problem (WFLOP) is an emerging field in the area of non-conventional energy generation. The aim of WFLOP is to find the arrangement of a feasible number of turbines in a given circular wind farm that maximizes the energy output. The reviewed literature suggests that swarm intelligence (SI)-based techniques perform very well in solving various real-world optimization problems. This article proposes solving WFLOP using a recent SI-based technique, namely teaching–learning-based optimization (TLBO). The optimum locations of the wind turbines are obtained for maximum power output in a wind farm of radius 500 m. The obtained outcomes are compared with renowned strategies available in the literature and validate the suitability of the TLBO algorithm for solving WFLOP.

Keywords Wind energy · Wind farm · Swarm intelligence · Wind turbines · TLBO
1 Introduction

In the field of renewable energy today, wind energy is considered a significant energy source. Wind energy is collected by the wind turbines of a wind farm, where multiple wind turbines are integrated into a single site. This arrangement of wind turbines in a wind farm provides economic operation in terms of installation as well as generation of wind energy. Since the wind leaving a turbine has lower energy than the wind upstream of the turbine, energy is extracted from the wind by the turbine. As a result, the wind downstream of the wind turbine slows down and becomes turbulent. This downstream wind is known as the wake of the turbine. The effects of the wake on the flow of wind in a wind farm are (a) a decrease in wind speed, which reduces the wind farm energy output, and (b) an increase in fluctuation, increasing the dynamic mechanical loading on downwind turbines. The
aim of WFLOP is to discover a better arrangement of wind turbines so as to maximize the generated energy [1]. Table 1 summarizes the contributions of various researchers in this area. The above discussion motivated the authors of this paper to strive for a more suitable algorithm to produce constructive results. Consequently, this study applies a well-organized SI-based method to solve WFLOP, namely the teaching–learning-based optimization algorithm (TLBO) proposed by [15]. TLBO is based on the teacher's behavior and the impact of the teacher on the learners' performance in the classroom, and it approximately depicts the classroom behavior of teachers and learners. The teacher teaches many subjects according to his or her knowledge, and the learners try to obtain that knowledge; therefore, the entire process contains two phases: the teacher phase and the learner phase. In this teaching–learning process, the population comprises a set of learners, the subjects provided to the learners correspond to the design variables, and the result of a learner corresponds to the fitness value. The teacher is, first and foremost, the best solution from the entire population. The approach yields the optimal locations of the wind turbines for generating maximum energy, as well as the maximum number of turbines that can be installed in a wind farm with a radius of 500 m. The space behind a wind turbine that is marked by decreased wind power capacity, because the turbine itself has used part of the energy in turning the blades, gives rise to what is called wake loss. The collected outcomes are compared with other well-known procedures available in the literature, and the numerical results obtained indicate that the applied approach is sufficiently valid. The paper is organized as follows: Sect. 2 presents the WFLOP formulation, Sect. 3 describes the TLBO mechanism, Sect. 4 describes the mapping of the TLBO algorithm to solve WFLOP, Sect. 5 discusses the implementation and experimental results, and, in the end, Sect. 6 concludes the work and presents some ideas for future work.
2 Wind Farm Layout Optimization Problem (WFLOP)

The WFLOP is designed to determine the optimal locations of turbines in a wind farm so as to maximize the expected energy generation. The following principles define the WFLOP [1]:
1. The number of wind turbines T is known and fixed before the farm is built, since the electricity capacity is determined by the initial investment cost and the number of turbines to be installed. For example, to attain 300 MW capacity, 200 turbines of 1.5 MW each are required.
2. The location of every turbine in the farm is represented by the two-dimensional coordinates (x, y), and the position is represented by a vector of length √(x² + y²). This premise means that the surface roughness of the terrain is negligible. A solution is represented by the T positions (x_i, y_i), where i = 1, 2, ..., T for the T turbines.
Table 1 Significant review of WFLOP using soft computing techniques
S. No. | Authors | Year | Proposed work | Contribution
1 | Mosetti et al. [2] | 1994 | Genetic algorithm (GA) | Wind turbine optimization arrangement in a big wind farm using GA
2 | Ozturk and Norman [3] | 2004 | Continuous location model | Heuristic method for wind energy conversion system positioning
3 | Grady et al. [4] | 2005 | GA to lay out wind turbines | A 2 km by 2 km wind farm is subjected to the MPGA program while accounting for three different wind scenarios
4 | Lackner et al. [5] | 2007 | Analytical framework for offshore | A strategy for employing the levelized cost of energy as the objective function in offshore wind farm layout optimization issues is presented
5 | Mora et al. [6] | 2007 | Evolutive algorithm | The authors describe an evolutionary method for the best design of wind farms
6 | Andrew et al. [7] | 2010 | Designed a multi-objective evolutionary algorithm | Designed a multi-objective evolutionary algorithm
7 | Mittal et al. [8] | 2010 | Decreased the grid cell to 1 m × 1 m | Find out more available positions, making a close to unrestricted layout approach
8 | Kusiak et al. [7] | 2010 | Continuously developing various metaheuristic approaches for solving WFLOP | Ant colony optimization (ACO) was used to optimize wind farm layout for maximizing the expected energy output
9 | Eroglu et al. [9] | 2012 and 2013 | Demonstrated ant colony optimization (ACO) | Design of wind farm layout using ant colony algorithm
10 | Eroglu and Seckiner [10] | 2012 | Particle filtering (PF) approach | Wind farm layout using filtering approach
11 | Chen et al. [11] | 2013 | Wind farm layout optimization using genetic algorithm | Introducing wind farm layout optimization with different hub height wind turbines
12 | Samorani [12] | 2013 | Wind farm design is solving the wind farm layout optimization problem (WFLOP) | Includes carefully arranging the turbines within the wind farm so that wake effects are minimized and thus the predicted power production is maximized
13 | Shafiqur et al. [13] | 2015 | Designed modified particle swarm optimization | Designed modified particle swarm optimization
14 | Rehman and Ali [13] | 2015 | Modified particle swarm optimization (MPSO) | Modified particle swarm optimization (MPSO)
15 | Bansal and Farswan [1] | 2017 | Applied a biogeography-based optimization (BBO) algorithm | Applied a biogeography-based optimization (BBO) algorithm
16 | Sharma et al. [14] | 2022 | Developed self-sacrificing ABC to solve WFLOP | Considered 3 case studies of wind farms 500, 750, and 1000 m radius
Table 2 Notation and abbreviation
Veldef_ij | Velocity deficit at turbine i in the wake of turbine j
V_{a,i,v} | Wind speed at the ith turbine
c_t | Turbine thrust coefficient
k | Wake spreading constant
V_up | Free-stream (upstream) wind speed
V_down | Lowered (downstream) wind speed
d | Distance between two turbines
ith and jth | Turbine indices
E(p_i) | Energy produced by the ith turbine
P_{a,i,v} | Power output of the ith turbine
p_t | Projected turbine power
P_i | Maximum energy of the ith turbine
R | Radius of the turbines
λ | Slope parameter
η | Intercept parameter
VD | Velocity deficit
V | Wind speed
a | Angle (wind direction interval index)
w_a | Probability of wind blowing at wind direction a
P_rated | Rated power, for wind speed between the rated and cutout wind speed
V_cutin | Cut-in wind speed
θ | Wind direction
T | Wind turbines (number of turbines)
(x, y) | Turbine placement
MW | Megawatt
3. On the farm, all wind turbines are equal in terms of internal and external characteristics such as brand, model, power curve, hub height, and so on. As a result, the farm is homogeneous.
4. At a given place, height and direction, the wind speed v follows a Weibull distribution:

p_v(v, k, c) = (k/c) (v/c)^{k−1} e^{−(v/c)^k}

This is a very common hypothesis for many windy sites [1].
5. The wind speed v is expressed through the parameters of the Weibull distribution, which are continuous functions of the wind direction θ, i.e., k = k(θ) and c = c(θ), where 0° ≤ θ ≤ 360°; wind speeds in the same direction have the same Weibull parameters, while different directions may have different parameters. Figure 1 shows the different wind directions of the model.
6. Sufficient space is required between two turbines to limit wind turbulence. Two turbines located at positions (x_i, y_i) and (x_j, y_j), respectively, must meet the following inequality criterion:

(x_i − x_j)² + (y_i − y_j)² ≥ 64R²

where R represents the radius of the rotor.
Fig. 1 Wind farm with different wind directions [1]
7. Determining the boundary of the layout is an important task because WFLOP is a layout optimization problem. Our work considers the circular boundary of the wind farm, as shown in Fig. 1.
8. All turbines can only be placed inside the farm. This can be geometrically expressed as x_i² + y_i² ≤ r².
The wake effect model reduces the power generation of downstream turbines. Jensen's wake model with a continuous search space is used for the WFLOP in this work, and the power output model of Kusiak et al. [10] is considered. Our goal is to maximize the power output while reducing the wake effect, using the constraints derived from the above assumptions. The two restrictions above indicate that any two turbines must be separated by at least four rotor diameters and that all turbines can only be installed inside the wind farm. Table 2 presents the notation and abbreviations used in Sect. 2 and the rest of the paper.
Fig. 2 Wind turbine’s wake model [16]
2.1 Wake Effect Model

A wind farm is designed to produce more electricity from a single windy location. If downstream turbines are located behind one or more turbines, their power generation may suffer as a result; this is referred to as wake loss. The wake effect's primary impact is to disturb the flow of wind within the wind farm, which is a crucial consideration when designing wind farm layouts. As a result of the wake, the wind speed decreases from its original speed v_up to a lower speed v_down. Figure 2 depicts the wake created by a wind turbine, showing v_up, v_down, K, and d. The wake effect reduces the power output of wind farm turbines; according to the reviewed literature, the power loss can range from 50% to 100% depending on the direction and speed of the wind as well as the positions of the wind turbines. Jensen's wake model [17] is used in this article. The Jensen wake model calculates the velocity deficit (VD) using the following equation:

Veldef_ij = 1 − v_down / v_up    (1)
(2)
In the case of a big wind farm, the wake of a wind turbine can affect more than one turbine. In this case, the cumulative wake effect for the ith turbine in the wake of the jth turbine can be calculated using the following Eq. 3: T Veldefi j = Veldefi2j
(3)
j=1, j=i
It is evident from the preceding Eqs. 2 and 3 that VD is a function (θ ) and (x, y). Because wind speed is a parameter of the Weibull distribution function, the scale parameter c will be impacted by the wake effect and is given by Eq. 4:
c_i(θ) = c(θ) × (1 − Veldef_i), i = 1, 2, ..., T    (4)
The distance d in Eq. 2 is evaluated according to the wind direction θ using Eq. 5:

d_{i,j} = |(x_i − x_j) × cos(θ) + (y_i − y_j) × sin(θ)|    (5)
The distance d is calculated as an absolute value, and the wind direction enters the VD equation as sin(θ) for the y-coordinate distance and cos(θ) for the x-coordinate distance.
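A minimal Python sketch of Eqs. 2–5, assuming fixed values for the thrust coefficient and spreading constant (illustrative placeholders, not the parameters used in the experiments; only the rotor radius of 38.5 m comes from the problem setup). For simplicity the sketch sums the deficit from every other turbine, whereas the full model would only consider turbines actually upstream of turbine i.

import numpy as np

def velocity_deficit(xi, yi, xj, yj, theta, Ct=0.8, K=0.075, R=38.5):
    # Downwind distance between turbines i and j for wind direction theta (Eq. 5).
    d = abs((xi - xj) * np.cos(theta) + (yi - yj) * np.sin(theta))
    # Jensen single-wake velocity deficit (Eq. 2).
    return (1.0 - np.sqrt(1.0 - Ct)) / (1.0 + K * d / R) ** 2

def cumulative_deficit(i, layout, theta):
    # Sum-of-squares superposition over the other turbines (Eq. 3).
    xi, yi = layout[i]
    deficits = [velocity_deficit(xi, yi, xj, yj, theta)
                for j, (xj, yj) in enumerate(layout) if j != i]
    return np.sqrt(np.sum(np.square(deficits)))

layout = [(0.0, 0.0), (400.0, 0.0), (0.0, 400.0)]   # illustrative (x, y) positions in metres
print(cumulative_deficit(0, layout, theta=np.deg2rad(90.0)))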
2.2 Power Model A wind turbine’s power production is typically expressed using a linear model in the form of wind speed. The predicted power for a given wind speed v is represented as follows: ⎧ ⎪ v < vcutin ⎨0, (6) f (v) = λv + η, vcutin ≤ v ≤ vrated ⎪ ⎩ Prated , vcutout > v > vrated The predicted power is represented in three parts in the preceding Eq. 6. In the first section, when the wind speed falls below the cutin wind speed vcutin , the turbine begins to generate power and the projected power is 0. The second portion is a linear equation for wind speed between cutin and rated wind speeds. The third and final section provides the rated power Prated for wind speeds between the rated and cutout wind speeds. This work uses the Weibull distribution function, which is affected by wind speed. The power model under consideration is based on Kusiak and Song’s energy output model. The expected power of a single turbine located at (x, y) and wind direction θ is expressed as follows:
E(P, θ) = ∫_0^∞ f(v) p_v(v, k(θ), c(θ)) dv
        = ∫_0^∞ f(v) (k(θ)/c(θ)) (v/c(θ))^{k(θ)−1} exp(−(v/c(θ))^{k(θ)}) dv    (7)
The goal is to maximize the power output of the wind farm while taking assumptions 6 and 8 into account. As a result, the WFLOP optimization model can be represented as follows:

max Σ_{i=1}^{T} E(P)_i    (8)
subject to

(x_i − x_j)² + (y_i − y_j)² ≥ 64R², i, j = 1, 2, ..., T, i ≠ j
x_i² + y_i² ≤ r²

In simpler terms, Eq. 7 can be expressed as
P_{a,i,v} = 0, if v_{a,i,v} < v_cutin
P_{a,i,v} = w_a (λ v_{a,i,v} + η), if v_cutin ≤ v_{a,i,v} ≤ v_rated    (9)
P_{a,i,v} = w_a P_rated, if v_rated < v_{a,i,v} < v_cutout

The maximum energy of the ith turbine, P_i, is calculated using Eq. 10, and the total energy production of the farm is stated using Eq. 11:

P_i = Σ_{a=1}^{N_θ} Σ_{v=1}^{N_v} P_{a,i,v}    (10)

P_f = Σ_{i=1}^{T} P_i    (11)
The number of wind direction intervals is N_θ, while the number of wind speed intervals is N_v. The wind direction ranges from 0° to 360°, and the wind speed ranges from 3.5 to 14 m/s. After a few adjustments, the foregoing Eqs. 9 and 10 can be stated in full as Eq. 12:

P_i = λ Σ_{j=1}^{N_v+1} Σ_{l=1}^{N_θ+1} ((v_{j−1} + v_j)/2) (θ_l − θ_{l−1}) ω_{l−1} [ e^{−( v_{j−1} / c_i((θ_l+θ_{l+1})/2) )^{k((θ_l+θ_{l+1})/2)}} − e^{−( v_j / c_i((θ_l+θ_{l+1})/2) )^{k((θ_l+θ_{l+1})/2)}} ]
    + P_rated Σ_{l=1}^{N_θ+1} (θ_l − θ_{l−1}) ω_{l−1} e^{−( v_rated / c_i((θ_l+θ_{l+1})/2) )^{k((θ_l+θ_{l+1})/2)}}
    + η Σ_{l=1}^{N_θ+1} (θ_l − θ_{l−1}) ω_{l−1} [ e^{−( v_cutin / c_i((θ_l+θ_{l+1})/2) )^{k((θ_l+θ_{l+1})/2)}} − e^{−( v_rated / c_i((θ_l+θ_{l+1})/2) )^{k((θ_l+θ_{l+1})/2)}} ]    (12)
Finally, the mathematical model of the WFLOP can be represented as

max Σ_{i=1}^{T} P_i    (13)

subject to

(x_i − x_j)² + (y_i − y_j)² ≥ 64R², i, j = 1, 2, ..., T, i ≠ j
x_i² + y_i² ≤ r²
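The expected-power calculation of Eqs. 6 and 7 can be sketched numerically as below. The cut-out speed, and the slope λ and intercept η derived from a simple linear ramp between cut-in and rated speed, are illustrative assumptions; in practice the wake-corrected scale parameter of Eq. 4 and the Weibull parameters of Tables 3 and 4 would be used per direction interval.

import numpy as np

V_CUTIN, V_RATED, V_CUTOUT, P_RATED = 3.5, 14.0, 25.0, 1500.0  # cut-out speed assumed
LAM = P_RATED / (V_RATED - V_CUTIN)           # illustrative slope λ
ETA = -LAM * V_CUTIN                          # illustrative intercept η

def power_curve(v):
    # Piecewise linear power curve, Eq. 6.
    if v < V_CUTIN:
        return 0.0
    if v <= V_RATED:
        return LAM * v + ETA
    if v < V_CUTOUT:
        return P_RATED
    return 0.0

def weibull_pdf(v, k, c):
    return (k / c) * (v / c) ** (k - 1) * np.exp(-(v / c) ** k)

def expected_power(k, c, n_speeds=200):
    # Simple numerical approximation of Eq. 7 for a single wind direction.
    speeds = np.linspace(0.0, 30.0, n_speeds)
    dv = speeds[1] - speeds[0]
    values = [power_curve(v) * weibull_pdf(v, k, c) for v in speeds]
    return float(np.sum(values) * dv)

# Expected power for one direction with the data-set-I parameters k = 2, c = 13.
print(expected_power(k=2.0, c=13.0))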
3 Teaching–Learning-Based Optimization (TLBO)

The TLBO algorithm maps the teaching behavior of a teacher toward students. TLBO is inspired by human behavior, namely the influence of a teacher on the learners in a classroom. The teacher teaches different subjects based on his or her knowledge, and the learners attempt to acquire that knowledge. The entire process is divided into two phases, the teacher phase and the learner phase. The population in this teaching–learning process is a set of learners, the different subjects taught to the learners represent the design variables, and a learner's result corresponds to the fitness value. First of all, the best solution in the population, i.e., the one with the maximum fitness value, is determined and treated as the teacher. The specific operation of the two phases is described below.
Teacher Phase: The teacher phase mimics how students receive knowledge from the teacher in a variety of subjects; the teacher is accountable for bringing all of the students close to itself based on its abilities, skills, and knowledge, and wants to share this knowledge in order to improve the general mean of the class. The design factors are the number of learners N (population size, k = 1, ..., N) and the number of subjects M (subjects, j = 1, ..., M). Analyzing the difference between the teacher's scores and the current mean of the learners helps students increase their understanding. Let X_i be the ith learner and let the best learner in the class act as the teacher; the remaining students use the following equation to move their mean close to that of the teacher [18]:

Difference_Mean_i = r_i × (M_new − T_F × M_i)    (14)
Here, T_F is a teaching factor that determines the adjusted mean value and r_i is a random number in [0, 1]. The value of T_F can be either 1 or 2, which is another pragmatic step decided at random with equal probability as per Eq. 15:

T_F = round[1 + rand(0, 1)(2 − 1)]    (15)
In the teacher phase, the position update equation of the old solution is given by Eq. 16:

X_new,i = X_old,i + Difference_Mean_i    (16)

in which X_old is the learner's old position and X_new is the improved value of X_old; the new value is accepted only if the outcome of X_new is superior to the earlier value, otherwise it is rejected. Accepted values from the teacher phase are subsequently provided to the learner phase as input.
Learner Phase: This is the second step of the optimization algorithm, in which learners communicate and exchange their information and ideas. This step selects two learners at random, compares them, and keeps the better of the two values. The learning approach for this phase is as follows: choose two learners X_p and X_q at random from the population of size N, such that p ≠ q. Equation 17 describes the modified parameter X_new. Equations 14, 15 and 16 are taken from [19].

X_new = X_i + rand_i × (X_p − X_q), if f(X_p) < f(X_q)
X_new = X_i + rand_i × (X_q − X_p), otherwise    (17)

The TLBO algorithm is represented in Fig. 3, where the flow begins with population initialization and progresses to the teacher phase; the output of this phase is delivered as input to the second phase, which yields the best value of the objective function.
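A compact Python sketch of one TLBO generation over a generic minimization objective, following Eqs. 14–17; the toy objective, bounds and population size are illustrative placeholders and not the WFLOP fitness used later (for WFLOP, a maximization problem, the comparisons would simply be reversed).

import numpy as np

rng = np.random.default_rng(1)

def tlbo_generation(pop, fitness, objective):
    n, dim = pop.shape
    teacher = pop[np.argmin(fitness)]          # best solution acts as the teacher
    mean = pop.mean(axis=0)
    for i in range(n):
        # Teacher phase (Eqs. 14-16).
        tf = round(1 + rng.random())           # teaching factor, 1 or 2 (Eq. 15)
        new = pop[i] + rng.random(dim) * (teacher - tf * mean)
        if objective(new) < fitness[i]:        # greedy acceptance
            pop[i], fitness[i] = new, objective(new)
        # Learner phase (Eq. 17).
        p, q = rng.choice(n, size=2, replace=False)
        if fitness[p] < fitness[q]:
            new = pop[i] + rng.random(dim) * (pop[p] - pop[q])
        else:
            new = pop[i] + rng.random(dim) * (pop[q] - pop[p])
        if objective(new) < fitness[i]:
            pop[i], fitness[i] = new, objective(new)
    return pop, fitness

sphere = lambda x: float(np.sum(x ** 2))       # toy objective for illustration
pop = rng.uniform(-5, 5, size=(20, 4))
fit = np.array([sphere(x) for x in pop])
for _ in range(50):
    pop, fit = tlbo_generation(pop, fit, sphere)
print(fit.min())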
4 TLBO for WFLOP

This section shows how TLBO is used to solve the WFLOP. If T turbines are to be arranged in a given wind farm, then a placement of these T turbines in the two-dimensional wind farm represents a possible solution in the TLBO algorithm. So, for T turbines, there will be 2T decision variables, and each of the N solutions in the population is of the form x_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, ..., x_{iT}, y_{iT}), where the position of a turbine in the wind farm is represented by its x and y coordinates. The complete mapping can be represented using the following steps.
Step 1: This is the first step in the process. The algorithm works on the following parameters:
1. Population size (N)
2. Number of turbines (T)
3. Dimension (2T)
4. Maximum number of generations (Maxgen)
Set the N solutions (turbine positions) in the search space (wind farm). For each solution, compute the wake loss and power output.
Fig. 3 TLBO algorithm flow chart
Step 2: Here, the position of each solution is updated as per the teacher phase. First, the difference mean is calculated using Eq. 14, and then the locations of the turbines are updated as per Eq. 16. The wake loss for the new locations of the turbines is calculated; if the newly generated position has lower wake loss, it is accepted, otherwise it is rejected. The power output is then computed again, and for each updated solution Σ_{i=1}^{T} E(P_i) is recalculated.
Step 3: Here, the locations of the turbines are updated as per the learner phase of the TLBO algorithm. Two solutions (positions of turbines) are randomly selected and, based upon the fitness comparison, the modification is done as per Eqs. 16 and 17. The power output is again calculated for all solutions, and for each updated solution Σ_{i=1}^{T} E(P_i) is recalculated.
Step 4: This is the termination step; until the termination criterion is met, Steps 2 and 3 are repeated.
Based upon the above discussion, the pseudo code of the applied TLBO algorithm for WFLOP is depicted in Algorithm 1. This paper examined two wind data sets found in the literature [1]; Tables 3 and 4 show data sets I and II, respectively. Each wind data set is divided into 24 parts numbered from 0 to 23, i.e., the wind direction is divided into 24 intervals of 15° each. The wind farm under consideration has a radius of 500 m.

Algorithm 1 TLBO algorithm for WFLOP
Initialize the parameters: population size N; number of turbines T; dimension D = 2T; maximum number of generations Maxgen; generation index CurrentIndex (learner);
Initialize the N solutions in the search space as per Eq. 16;
Compute Veldef_i, c_i, the wake loss and Σ_{i=1}^{T} E(P_i) for each solution;
CurrentIndex = 1;
while CurrentIndex ≤ Maxgen do
    Produce a new solution for the teacher phase as per Eq. 16 and evaluate the wake loss for the newly generated solution;
    Apply the greedy selection process for the teacher phase to select the newly generated solution (positions of turbines);
    Update Veldef_i, c_i, the wake loss and Σ_{i=1}^{T} E(P_i) for each solution;
    Produce a new solution for the learner phase as per Eqs. 16 and 17 and evaluate the wake loss for the newly generated solution;
    Update Veldef_ij, c_i, the wake loss and Σ_{i=1}^{T} E(P_i) for each solution;
    Memorize the best solution found so far;
    CurrentIndex = CurrentIndex + 1;
end while
Output the best solution found so far.
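As a sketch of how a candidate TLBO solution (a flat vector of turbine coordinates) could be checked against the constraints of Eq. 13 before its fitness is computed, the following snippet uses the farm radius and rotor radius from the problem statement; the function names and the sample layout are illustrative.

import numpy as np

def is_feasible(solution, farm_radius=500.0, rotor_radius=38.5):
    # solution is (x_1, y_1, ..., x_T, y_T); reshape into T turbine positions.
    pts = np.asarray(solution).reshape(-1, 2)
    if np.any(pts[:, 0] ** 2 + pts[:, 1] ** 2 > farm_radius ** 2):
        return False                      # a turbine lies outside the circular farm
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if np.sum((pts[i] - pts[j]) ** 2) < 64.0 * rotor_radius ** 2:
                return False              # minimum spacing of 8R violated
    return True

candidate = [0.0, 0.0, 350.0, 0.0, 0.0, 350.0]   # illustrative layout of three turbines
print(is_feasible(candidate))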
Table 3 Wind data set (I) [1]
l−1 | θ_{l−1} | θ_l | k | c | ω_{l−1}
0 | 0 | 15.0 | 2.0 | 13.0 | 0
1 | 15.0 | 30.0 | 2.0 | 13.0 | 0.01
2 | 30.0 | 45.0 | 2.0 | 13.0 | 0.01
3 | 45.0 | 60.0 | 2.0 | 13.0 | 0.01
4 | 60.0 | 75.0 | 2.0 | 13.0 | 0.01
5 | 75.0 | 90.0 | 2.0 | 13.0 | 0.2
6 | 90.0 | 105.0 | 2.0 | 13.0 | 0.6
7 | 105.0 | 120.0 | 2.0 | 13.0 | 0.01
8 | 120.0 | 135.0 | 2.0 | 13.0 | 0.01
9 | 135.0 | 150.0 | 2.0 | 13.0 | 0.01
10 | 150.0 | 165.0 | 2.0 | 13.0 | 0.01
11 | 165.0 | 180.0 | 2.0 | 13.0 | 0.01
12 | 180.0 | 195.0 | 2.0 | 13.0 | 0.01
13 | 195.0 | 210.0 | 2.0 | 13.0 | 0.01
14 | 210.0 | 225.0 | 2.0 | 13.0 | 0.01
15 | 225.0 | 240.0 | 2.0 | 13.0 | 0.01
16 | 240.0 | 255.0 | 2.0 | 13.0 | 0.01
17 | 255.0 | 270.0 | 2.0 | 13.0 | 0.01
18 | 270.0 | 285.0 | 2.0 | 13.0 | 0.01
19 | 285.0 | 300.0 | 2.0 | 13.0 | 0.01
20 | 300.0 | 315.0 | 2.0 | 13.0 | 0.01
21 | 315.0 | 330.0 | 2.0 | 13.0 | 0.01
22 | 330.0 | 345.0 | 2.0 | 13.0 | 0.01
23 | 345.0 | 360.0 | 2.0 | 13.0 | 0
5 Implementation and Experimental Results

The following search environment is created to solve WFLOP using TLBO:
1. Population size (N) = 50, 100
2. Maximum number of generations = 50
3. Total number of runs (simulations) = 20
4. Rotor radius R = 38.5 m
5. Wind cut-in speed vcutin = 3.5 m/s
6. Wind rated speed vrated = 14 m/s
7. Rated power Prated = 1500 kW
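For illustration, the turbine parameters above determine how a wind speed maps to electrical power. The sketch below is only an assumption-laden reference: it uses a linear ramp between the cut-in and rated speeds, which may differ from the power curve actually used to compute E(P_i) in the paper.

def turbine_power(v, v_cutin=3.5, v_rated=14.0, p_rated=1500.0):
    """Piecewise power curve in kW (linear ramp between cut-in and rated speed is an assumption)."""
    if v < v_cutin:
        return 0.0
    if v < v_rated:
        return p_rated * (v - v_cutin) / (v_rated - v_cutin)
    return p_rated  # at and above rated speed (no cut-out modelled here)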
For wind data set I, the Weibull parameters (k = 2 and c = 13) are the same in each interval of wind direction, and only the wind blowing probability varies with wind direction. For wind data set II, the Weibull shape parameter (k = 2) is the same in each interval of wind direction, while the scale parameter (c) varies from interval to interval, so the pattern of wind blowing changes with wind direction. The
Table 4 Wind data set (II) [1]

l−1   θl−1    θl      k     c     ωl−1
0     0       15.0    2.0   7     0.0002
1     15.0    30.0    2.0   5     0.008
2     30.0    45.0    2.0   5     0.0227
3     45.0    60.0    2.0   5     0.0242
4     60.0    75.0    2.0   5     0.0225
5     75.0    90.0    2.0   4     0.0339
6     90.0    105.0   2.0   5     0.0423
7     105.0   120.0   2.0   6     0.029
8     120.0   135.0   2.0   7     0.0617
9     135.0   150.0   2.0   7     0.0813
10    150.0   165.0   2.0   8     0.0994
11    165.0   180.0   2.0   9.5   0.1394
12    180.0   195.0   2.0   10    0.1839
13    195.0   210.0   2.0   8.5   0.1115
14    210.0   225.0   2.0   8.5   0.0765
15    225.0   240.0   2.0   6.5   0.008
16    240.0   255.0   2.0   4.6   0.0051
17    255.0   270.0   2.0   2.6   0.0019
18    270.0   285.0   2.0   8     0.0012
19    285.0   300.0   2.0   5     0.001
20    300.0   315.0   2.0   6.4   0.0017
21    315.0   330.0   2.0   5.2   0.0031
22    330.0   345.0   2.0   4.5   0.0097
23    345.0   360.0   2.0   3.9   0.0317
number of wind turbines is directly proportional to the amount of power generated. As a result, the applied technique determines the maximum number of potential wind turbines and their configuration in the wind farm with the least wake loss and maximum power output.
5.1 Wind Farm Radius 500 m

For TLBO, the wake loss and expected power for wind data sets I and II are computed. To the best of our knowledge, TLBO has not previously been used to solve WFLOP. The results obtained for a wind farm of radius 500 m using the applied strategy are compared with the following significant approaches from the literature:
1. Biogeography-based optimization (BBO) [1]
2. Artificial bee colony algorithm (ABC) [20]
3. Self-sacrificing artificial bee colony algorithm (SeSABC) [20]
Table 5 Anticipated power and wake loss in (kW) in WF of radius 500 m for wind data set I
[Rotated table: for NT = 2–10 turbines it lists the ideal power together with the best and average anticipated power and the corresponding wake loss obtained by TLBO, BBO, ABC and SeSABC at Np = 50 and 100; infeasible layouts are marked "Infeasible". The individual cell values are not reproduced here.]
Table 6 Anticipated power and wake loss in (kW) in WF of radius 500 m for wind data set II
[Rotated table: for NT = 2–10 turbines it lists the ideal power together with the best and average anticipated power and the corresponding wake loss obtained by TLBO, BBO, ABC and SeSABC at Np = 50 and 100; infeasible layouts are marked "Infeasible". The individual cell values are not reproduced here.]
Fig. 4 Locations of turbines (from 2 to 9) in wind farm (radius 500 m) for wind data set I a 2, 5, and 8 turbines, respectively, b 3, 6, and 9 turbines, respectively, c 4 and 7 turbines, respectively
Tables 5 and 6 show the wake loss and projected power for wind data sets I and II, respectively. Columns 1 and 2 of Tables 5 and 6 give the number of turbines (NT) and the corresponding ideal projected power. The tables report the outcomes obtained by the TLBO algorithm for data sets I and II for the total number of turbines and the maximum number of feasible turbines. The obtained outcomes are compared with BBO [1], ABC [20] and self-sacrificing ABC (SeSABC) [20]; the population sizes for TLBO are 50 and 100. The results indicate that TLBO attains a predicted power equal to the ideal anticipated power for up to 9 turbines for both data sets I and II. This means that, without any wake loss, 9 turbines can be placed in a wind farm of radius 500 m for each of data sets I and II. The results of Tables 5 and 6 show that TLBO produces competitive results when compared with the other methodologies studied: TLBO performs better than BBO [1] and ABC [20], although it could not surpass the outcomes of the SeSABC algorithm. The results also reveal that no more than 9 turbines can be placed in the wind farm of radius 500 m. Figures 4 and 5 show the best locations of the wind turbines for data sets I and II, respectively. Tables 7 and 8 compare the total and maximum feasible turbines for data sets I and II.
Fig. 5 Locations of turbines (from 2 to 9) in wind farm (radius 500 m) for wind data set II a 2, 5, and 8 turbines, respectively, b 3, 6, and 9 turbines, respectively, c 4 and 7 turbines, respectively

Table 7 Comparison of the total and maximum feasible turbines for data set I

Algorithm          Total turbines (no wake loss)   Total feasible turbines
EA                 1                               6
ACO                3                               8
PF                 3                               8
BBO (N = 50)       6                               8
BBO (N = 100)      7                               8
ABC (N = 50)       7                               9
ABC (N = 100)      7                               9
SeSABC (N = 50)    9                               9
SeSABC (N = 100)   9                               9
TLBO (N = 50)      9                               9
TLBO (N = 100)     9                               9
Table 8 Comparison of the total and maximum feasible turbines for data set II

Algorithm          Total turbines (no wake loss)   Total feasible turbines
EA                 1                               6
ACO                3                               8
PF                 3                               8
BBO (N = 50)       7                               8
BBO (N = 100)      7                               8
ABC (N = 50)       7                               9
ABC (N = 100)      7                               9
SeSABC (N = 50)    9                               9
SeSABC (N = 100)   9                               9
TLBO (N = 50)      9                               9
TLBO (N = 100)     9                               9
6 Conclusion and Future Works

The teaching–learning-based optimization (TLBO) algorithm is inspired by human social behavior, mimicking the influence of a teacher on learners in a classroom. This research developed a new way to handle the problem of optimal wind turbine placement in a wind farm using the population-based TLBO method. The TLBO algorithm appears to be promising for the wind farm layout optimization problem (WFLOP): it provides wind turbine locations in a wind farm that minimize the energy losses and maximize the generated power. The research also establishes the highest number of wind turbines that can be arranged in a circular wind farm of 500 m radius: TLBO produces feasible locations for 9 turbines for both considered data sets. TLBO performs better than the ABC and BBO algorithms, but could not surpass the outcomes of the recent ABC modification SeSABC. This presents TLBO as a competitive candidate for WFLOP. In the future, TLBO and ABC may be further modified to solve WFLOP.
References 1. Bansal JC, Farswan P (2017) Wind farm layout using biogeography based optimization. Renew Energy 107:386–402 2. Mosetti GPCDB, Poloni C, Diviacco B (1994) Optimization of wind turbine positioning in large windfarms by means of a genetic algorithm. J Wind Eng Industr Aerodyn 51(1):105–116 3. Aytun Ozturk U, Norman BA (2004) Heuristic methods for wind energy conversion system positioning. Electr Power Syst Res 70(3):179–185 4. Grady SA, Hussaini MY, Abdullah MM (2005) Placement of wind turbines using genetic algorithms. Renew Energy 30(2):259–270
5. Lackner MA, Elkinton CN (2007) An analytical framework for offshore wind farm layout optimization. Wind Eng 31(1):17–31 6. Mora JC, Barón JMC, Santos JMR, Payán MB (2007) An evolutive algorithm for wind farm optimal design. Neurocomputing 70(16):2651–2658 7. Kusiak A, Song Z (2010) Design of wind farm layout for maximum wind energy capture. Renew Energy 35(3):685–694 8. Mittal A (2010) Optimization of the layout of large wind farms using a genetic algorithm. Ph.D. thesis, Case Western Reserve University 9. Ero˘glu Y, Seçkiner SU (2012) Design of wind farm layout using ant colony algorithm. Renew Energy 44:53–62 10. Ero˘glu Y, Seçkiner SU (2013) Wind farm layout optimization using particle filtering approach. Renew Energy 58:95–107 11. Chen Y, Li H, Jin K, Song Q (2013) Wind farm layout optimization using genetic algorithm with different hub height wind turbines. Energy Convers Manage 70:56–65 12. Samorani M (2013) The wind farm layout optimization problem. In: Handbook of wind power systems. Springer, pp 21–38 13. Rehman S, Ali SSA (2015) Wind farm layout design using modified particle swarm optimization algorithm. In: 2015 6th international on renewable energy congress (IREC). IEEE, pp 1–6 14. Sharma A, Sharma N, Sharma H (2022) Hermit crab shell exchange algorithm: a new metaheuristic. In: Evolutionary intelligence, pp 1–27 15. Venkata Rao R, Savsani VJ, Vakharia DP (2011) Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput-Aided Des 43(3):303–315 16. Abu-Mouti FS, El-Hawary ME (2012) Overview of artificial bee colony (ABC) algorithm and its applications. In: 2012 IEEE international on systems conference (SysCon). IEEE, pp 1–6 17. Spera DA (1985) Method for evaluating wind turbine wake effects on wind farm performance. J Solar Energy Eng 107:241 18. Rao R (2016) Review of applications of TLBO algorithm and a tutorial for beginners to solve the unconstrained and constrained optimization problems. Decis Sci Lett 5(1):1–30 19. Venkata Rao R, Savsani VJ, Vakharia DP (2012) Teaching–learning-based optimization: an optimization method for continuous non-linear large scale problems. Inf Sci 183(1):1–15 20. Sharma A, Bansal JC, Sharma N, Sharma H (2022) A new effective solution for wind farm layout optimisation problem using self-sacrificing artificial bee colony algorithm. Int J Renew Energy Technol 5(4):320–352
An Ensemble Multimodal Fusion Using Naive Bayes Approach for Haptic Identification of Objects R. Aravind Sekhar and K. G. Sreeni
Abstract Multimodal classification is gaining popularity recently in many areas including medical fields, human-computer interactions, etc. In multimodal object identification, ensemble-based techniques have been found to increase the accuracy to a great extent. This paper proposes an ensemble-based multimodal fusion using the Naive Bayes approach to improve the performance of multimodal classifiers in object identification. Here, image, acceleration and friction are used as the multimodal inputs and separate classifiers are developed for each input. Then based on a classifier’s prediction, a posterior probability measure of the expected prediction was calculated using the Naive Bayes theorem. Also, a diversity measure is determined using the class precision score of corresponding classifiers. Then, a weighted posterior probability measure was calculated such that the classifier with the maximum weighted posterior probability of prediction is chosen for the final prediction. The results showed a considerable improvement in accuracy when compared with existing techniques. Keywords Ensemble classifier · Posterior probability · Haptics
R. Aravind Sekhar (B) · K. G. Sreeni
Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_13

1 Introduction

Multimodal feature-based surface classification techniques have been found to improve the accuracy to a great extent. In most cases, the multimodal inputs are applied to a single network which is further processed to get the final prediction. Feeding multiple inputs to a single network may result in increased network complexity. Ensemble methods have been found very much useful while dealing with high dimensional datasets [1]. Ensemble methods combine several algorithms to provide the best possible results. Hence, these techniques find wide applications in the field
of multimodal classifiers, where inputs from different modalities are being used for the final prediction [2, 3]. In [4, 5], the authors combined several algorithms to produce promising results with multimodal medical data. Another similar area where multimodal classifiers are used widely is in the field of haptics. Apart from the visual inputs, inputs from other modalities such as acceleration, friction and audio can provide a better understanding of the objects under consideration [6, 7]. Ensemble-based multimodal classifiers are found to reduce the network complexity without compromising accuracy. Such concepts will be very much useful if applied in the area of haptic object identification [7]. In multimodal haptics, using an ensemble model helps to combine the individual feature-based models effectively to make the final prediction. Another way of combining multiple classifiers is by using the confusion matrix [8, 9]. The confusion matrix describes the performance of a classifier on a set of test data for which the true values are known. The concept of the Naive Bayes theorem is also used to develop the ensemble classifiers as in [10, 11]. The paper proposes an ensemble-based approach using the Naive Bayes theorem to improve the performance of multimodal classifiers for object identification. The remaining of the paper is organized as follows. Section 2 presents some of the major contributions in the area. The methodology is explained in Sect. 3, and the results and discussions are presented in Sect. 4. Section 5 describes the conclusions and future work.
2 Related Works Jan et al. [12] have taken accuracy and diversity as the main factors for generating ensemble classifiers. They suggested that focussing on accuracy alone will make the ensemble classifier to suffer from diminishing returns. On the other hand, focussing on diversity alone will make it suffer in accuracy. The results showed their proposed method outperforms the other state of art ensemble classifiers when evaluated on 55 benchmark datasets from UCI and KEEL data repositories. Hasan et al. in [13] proposed a novel ensemble classifier to deal with the fitting and biaseness issues of conventional classifiers. The method was applied to multivariate medical data with missing values and was able to achieve better accuracy. Ensemble approaches have been widely used in classifying medical data [4, 5]. Wang et al. [8] proposed a probabilistic confusion matrix-based entropy for evaluating the classifiers. The results showed their method efficiently measures the classification accuracy and the class discrimination power of classifiers. They suggested that their method does not fall over the classifiers on different datasets. Peng et al. [14] suggested a probabilistic ensemble fusion model to obtain the complementary relation between two input modalities such as image and text. The results show that their method outperformed methods that use a single modality for classification. Huddar et al. [15] applied the ensemble approach in multimodal sentiment analysis and obtained an improved accuracy in sentiment prediction by using textual features and facial expressions. Nakamura et al. [16] proposed a novel model
named ensemble-of-concepts model (EoCMS) in which they introduce weights to indicate the strength between modalities and various concepts. The concept corresponding to a particular modality is obtained by changing the weights that represent the modalities. Multimodal classification techniques have been used in the field of haptics for object identification [6, 16]. Li et al. in [17] used an ensembled generative adversarial network (GAN) to convert the texture attribute of the image to the corresponding tactile output and showed that the use of ensembled generative adversarial network made the training simple and could produce stable results. Abderrahmane et al. [18] used a deep learning framework for the recognition of daily life objects by a robot equipped with tactile sensors and was able to successfully recognize known as well as novel objects. Zheng et al. [6] developed a technique using a novel deep learning method based on fully convolutional network for surface material classification. They performed the experiments on TUM surface material database and obtained accurate and stable results. Hong et al. [19] proposed a maximum entropy-based ensemble classifier for author age prediction. The principle of maximum entropy is also used in the analysis of clustering, decision and spectrum [1].
3 Proposed Method This section details a Naive Bayes-based multimodal ensemble (NBME) classifier. Figure 1 shows the proposed model of the classifier. For the experiment, here image, acceleration and friction inputs are used. First individual classifiers are developed for image, acceleration and friction inputs. The classifiers are then added to the ensemble pool using a precision-based diversity measure such that the pool contains only diverse classifiers. Now using the proposed Naive Bayes approach a weighted posterior probability measure was calculated such that the prediction of the classifier with the maximum weighted posterior probability is taken as the final prediction.
3.1 Diversity Measure An important factor that needs to be taken into account when generating ensemble classifiers is the diversity between the classifiers. If the classifiers in an ensemble pool are similar, then there will be no improvement in the accuracy by combining those classifiers. On the other hand, selecting classifiers with increased accuracy and diversity will yield more accurate results than individual classifiers. In the proposed approach, a novel diversity measure has been generated using the class precision values which are obtained from the confusion matrices of the corresponding classifiers. Two classifiers are said to be diverse if their predictions are entirely different for a given test input. For each trained classifier, there corresponds to a confusion matrix from which several performance metrics can be evaluated. One such metric
Fig. 1 Model of the proposed classifier
is the individual class precision score which represents the ability of a classifier to predict the classes accurately. Class precision is computed as the ratio of true prediction to the sum of true predictions and false predictions corresponding to a class. The maximum value of precision is 1, and the minimum value is 0. So as the class precision increases it means that the ability of the classifier to predict the corresponding class also increases. Here, in the proposed method, the class precision is used as a diversity measure to select the diverse classifiers from the ensemble pool. The class precision-based diversity measure is explained in Algorithm 1. Algorithm 1 Selecting the diverse classifiers for a given test input. Input: Predicted class labels of the base classifiers for a test input. Output: Selecting only diverse classifiers from the ensemble pool. 1. Test inputs are given to all the classifiers, and their corresponding predicted class labels are noted. 2. Form n groups, denoted by X0 , X1 , ....Xi , ....Xn−1 corresponding to n classes where Xi denote the group containing the classifiers whose predicted class label is i. 3. For the set of classifiers in the group Xi select only those classifier/classifiers with maximum class precision score (which was obtained during the training phase). Repeat the process to select classifiers from every other group. 4. Selected classifiers are added to the ensemble pool such that the pool contains only diverse classifiers.
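A compact reading of Algorithm 1 is sketched below. It assumes each base classifier exposes a predict function and the per-class precision scores recorded from its training confusion matrix; the dictionary layout is an illustrative assumption, not part of the original implementation.

from collections import defaultdict

def select_diverse(classifiers, x):
    """classifiers: list of dicts {'predict': callable, 'precision': {label: score}}.
    Per predicted label, keep only the classifier(s) with the highest class precision."""
    groups = defaultdict(list)
    for clf in classifiers:                      # steps 1-2: predict and group by label
        label = clf['predict'](x)
        groups[label].append((clf, clf['precision'][label]))
    pool = []
    for label, members in groups.items():        # step 3: keep max-precision member(s)
        best = max(score for _, score in members)
        pool.extend(clf for clf, score in members if score == best)
    return pool                                  # step 4: diverse ensemble pool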
3.2 Determination of Posterior Probability

Every classifier has a corresponding confusion matrix that represents its performance on known test data, and performance metrics are determined from this confusion matrix. Widely used metrics include precision, recall and F1-score. Precision quantifies the number of correct predictions made, while recall represents the number of correct predictions out of the total number of expected predictions; the F1-score is the combined measure of precision and recall. Here, the precision and recall metrics are used to determine the posterior probability of a classifier's prediction using the Naive Bayes theorem. As per the Naive Bayes theorem, if A and B are two events, then the posterior probability of event A given B, denoted by P(A/B), can be determined as per Eq. 1:

$P(A/B) = \dfrac{P(B/A)\,P(A)}{P(B)}$    (1)

where P(B/A) represents the probability of event B given A, and P(A) and P(B) denote the corresponding event probabilities of A and B, respectively. The proposed method aims to find the posterior probability of the expected prediction E of a classifier given its actual prediction A. Let the image, acceleration and friction classifiers be denoted by C_I, C_A and C_F, respectively. The classifiers are trained, and their corresponding prediction probabilities are noted. Also, for each trained classifier, the corresponding performance metrics such as precision and recall scores are noted. The posterior probability is then determined as per Eq. 2:

$P(E/A) = \dfrac{P(A/E)\,P(E)}{P(A)}$    (2)

Since the precision score of a class is defined as the probability of the actual prediction out of the total number of expected classes, P(A/E) is given by the precision score of the predicted class; specifically, given the predicted class label of a classifier, its corresponding precision score is read from the confusion matrix obtained during training. Similarly, since recall is the number of correctly predicted classes out of the total number of classes, P(E) is given by the recall score of the corresponding class. Finally, P(A) denotes the probability value of the class label predicted by the classifier for the given test input. With all these probabilities known, the expected probability of the predicted class, P(E/A), is determined from Eq. 2. In the same way, the posterior probability of the expected prediction is determined for every classifier.
3.3 Weighted Posterior Probability-Based Ensemble Generation

Let p denote the set of posterior probabilities of all the classifiers for the corresponding test input, determined as per Eq. 2, such that

$p = \{P(E/A)_I, \; P(E/A)_A, \; P(E/A)_F\}$    (3)

where $P(E/A)_I$ represents the expected posterior probability of the image classifier, whereas $P(E/A)_A$ and $P(E/A)_F$ represent the posterior probabilities of the acceleration and friction classifiers, respectively. Let w represent the set of weights assigned to all the classifiers such that

$w = \{w_I, \; w_A, \; w_F\}$    (4)

where $w_I$, $w_A$ and $w_F$ represent the weights assigned to the image, acceleration and friction classifiers, respectively. The weights are given the values of the precision score of the corresponding classifiers; specifically, for each classifier the weight is assigned the precision score of its predicted class label, as determined from the confusion matrix obtained during training. Let $p_w$ denote the set of weighted posterior probabilities of all classifiers such that

$p_w = \{P(E/A)_I\, w_I, \; P(E/A)_A\, w_A, \; P(E/A)_F\, w_F\}$    (5)

Then, the prediction of the classifier with the maximum weighted posterior probability is taken as the final prediction. To be more specific, if $P_f$ denotes the final prediction probability, then $P_f = \max(p_w)$.
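Putting Eqs. (2)–(5) together, a minimal sketch of the fusion rule could look as follows. The dictionary fields predict_proba, precision and recall are illustrative assumptions standing in for each base classifier's softmax output and training confusion-matrix statistics.

def nbme_predict(classifiers, x):
    """classifiers: list of dicts with 'predict_proba' (x -> {label: prob}),
    'precision' and 'recall' ({label: score} from the training confusion matrix).
    Returns the label of the classifier with maximum weighted posterior (Eqs. 2-5)."""
    best_label, best_score = None, -1.0
    for clf in classifiers:
        probs = clf['predict_proba'](x)
        label = max(probs, key=probs.get)                 # actual prediction A
        p_a = probs[label]                                # P(A): predicted class probability
        p_a_given_e = clf['precision'][label]             # P(A/E): class precision
        p_e = clf['recall'][label]                        # P(E):   class recall
        posterior = p_a_given_e * p_e / p_a               # Eq. (2)
        weighted = posterior * clf['precision'][label]    # Eq. (5), weight = precision
        if weighted > best_score:
            best_label, best_score = label, weighted
    return best_label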
3.4 Dataset Used

For the experiments with multimodal data, the LMT-108 multimodal surface material dataset (publicly available for download from https://zeus.lmt.ei.tum.de/downloads/texture) has been used. The dataset was collected by Strese et al. and consists of 108 different materials. In this experiment, multiple features are used to haptically identify the objects; from the dataset, the image, acceleration and friction features are used for classification. The acceleration database consists of two separate recordings, one for the movement phase and the other for the tapping phase, with movements recorded along the X, Y and Z directions. For each material, the movement database consists of separate .txt files for the X, Y and Z coordinates, each with 48,000 values sampled at 10 kHz. The friction database for each material consists of a .txt file with 48,000 values sampled at 10 kHz.
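As an illustration of how such recordings might be read, the snippet below loads one material's movement-phase acceleration and friction traces; the file names are hypothetical placeholders, since the exact directory layout of the LMT-108 release is not described here.

import numpy as np

FS = 10_000          # sampling rate (Hz); 48,000 samples ~ 4.8 s per recording

def load_material(folder):
    """Load one material's movement-phase acceleration (X, Y, Z) and friction traces.
    File names below are hypothetical placeholders for the LMT-108 layout."""
    accel = np.stack([np.loadtxt(f"{folder}/movement_{axis}.txt") for axis in "XYZ"])
    friction = np.loadtxt(f"{folder}/friction.txt")
    assert accel.shape == (3, 48_000) and friction.shape == (48_000,)
    return accel, friction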
Fig. 2 Train and validation accuracy (image data)
Fig. 3 Train and validation accuracy (acceleration data)
4 Results and Discussions The proposed method has been implemented using Python3.7/Jupyter Notebook. Separate deep learning models are developed for image, acceleration and friction inputs. Figures 2, 3 and 4 depict the train and validation accuracy using the image, acceleration and friction data separately. In all the cases, the training accuracy closely matches the validation accuracy indicating that the models perform well without much overfitting. Table 1 shows the performance accuracy of the individual feature-based classifiers and the proposed ensemble multimodal classifier. By using a fusion of the individual feature-based classifiers, the performance accuracy has improved a lot as is evident from Table 1. The proposed NBME technique has been compared against different multimodal classifiers, such as Naive Bayes classifier (NBC), classification tree (CT), Euclidean distance classifier using sequential forward selection (SFS) [20] and an ensemble-
Fig. 4 Train and validation accuracy (friction data)
Table 1 Comparison of the accuracy values obtained using the individual feature-based classifiers and the proposed NBME classifier on the LMT haptic dataset

Dataset              Classifier                     Accuracy
LMT haptic dataset   Image based                    91.30
                     Friction based                 82.10
                     Acceleration based             89.52
                     Proposed ensemble classifier   98.60

Bold represents the best results

Table 2 Comparison of the accuracy values obtained using the NBME classifier and the existing multimodal classifiers on the LMT haptic dataset

Classifier                      Technique       Accuracy
Naive Bayes                     Using SFS       75.5
                                Using NBME      78.3
Classification tree             Using SFS       69.2
                                Using NBME      73.4
SVM                             Using LS+MCFS   94.9
                                Using NBME      94.3
Euclidean distance classifier   Using SFS       71.3
                                Using NBME      74.6
based multimodal SVM classifier which uses a combination of Laplacian score (LS) and multicluster feature selection (MCFS) [7]. From Table 2, it can be seen that the proposed NBME classifier has better classification accuracy in the majority of cases. The proposed technique has the advantage that it can be used as a generalized technique to develop multimodal classifiers by combining individual feature-based classifiers.
5 Conclusion and Future Work The paper proposes a multimodal ensemble classifier based on the Naive Bayes approach. The method has been tested on LMT Haptic Dataset. Here, a posterior probability of precision given the predicted class probability was calculated for each classifier and a diversity measure was applied to select only the diverse classifiers from the base classifier pool. The proposed classifier was found to be effective for generating multimodal classifiers by combining individual feature-based classifiers. In future work, the method aims to be extended to much larger multimodal datasets.
References 1. Moon H, Ahn H, Kodell RL, Baek S, Lin C-J (2007) Ensemble methods for classification of patients for personalized medicine with high dimensional data. Artif Intell Med 41(3):197–207 2. Zambelli M, Demirisy Y (2017) Online multimodal ensemble learning using self-learned sensorimotor representations. IEEE Trans Cogn Dev Syst 9 3. Hui S, Suganthan PN (2016) Ensemble and arithmetic recombination-based speciation differential evolution for multimodal optimization. IEEE Trans Cybern 46 4. Sridhar S, Sanagavarapu S (2020) Detection and prognosis evaluation of diabetic retinopathy using ensemble deep convolutional neural networks. In: International electronics symposium (IES) 5. Priya MMA, Jawhar SJ (2020) Advanced lung cancer classification approach adopting modified graph clustering and whale optimisation-based feature selection technique accompanied by a hybrid ensemble classifier. IET Image Process 14 6. Lu HZ, Ji FM, Matti S, Yigitcan O, Steinbach E (2016) Deep learning for surface material classification using haptic and visual information 18(12) 7. Liu X, Wu H, Fang S, Yi Z, Wu X (2020) Multimodal surface material classification based on ensemble learning with optimized features. In: IEEE international conference on e-health networking, application and services 8. Wang X-N, Wei J-M, Jin H, Yu G, Zhang H-W (2013) Probabilistic confusion entropy for evaluating classifiers. Open Access MDPI Entropy 9. Whenty A, Tassadaq H, Jia-Ching W, Chi-Tei W, Shih-Hau F, Tsao Y (2021) Ensemble and multimodal learning for pathological voice classification. IEEE Sens Lett 10. Guo W, Wang J, Wang Shiping (2019) Deep multimodal representation learning: a survey. IEEE Access 7:63373–63394 11. Wang D, Ohnishi K, Xu W (2019) Multimodal haptic display for virtual reality: a survey. IEEE Trans Industr Electron 67:610–623 12. Jan MZ, Verma B (2019) A novel diversity measure and classifier selection approach for generating ensemble classifiers. IEEE Access 7 13. Hasan MR, Gholamhosseini H, Sarkar NI (2017) A new ensemble classifier for multivariate medical data. In: International telecommunication networks and applications conference 14. Peng Y, Wang DZ, Patwa I, Gong D, Fang CV (2015) Probabilistic ensemble fusion for multimodal word sense disambiguation. In: IEEE international symposium on multimedia (ISM) 15. Huddar MG, Sannakki SS, Rajpurohit VS (2018) An ensemble approach to utterance level multimodal sentiment analysis. In: International conference on computational techniques, electronics and mechanical systems (CTEMS) 16. Nakamura T, Nagai (2018) Ensemble-of-concept models for unsupervised formation of multiple categories. IEEE Trans Cogn Dev Syst 10
17. Li X, Liu H, Zhou J, Sun F (2012) Learning cross-modal visual-tactile representation using ensembled generative adversarial networks. Signal Process Mag IEEE 29(6):82–97 18. Abderrahmane Z, Ganesh G, Crosnier A, Cherubini A (2020) A deep learning framework for tactile recognition of known as well as novel objects. IEEE Trans Industr Inform 19. Hong J, Mattmann CA, Ramirez P (2017) Ensemble maximum entropy classification and linear regression for author age prediction. In: IEEE international conference on information reuse and integration (IRI) 20. Strese M, Schuwerk C, Iepure A, Steinbach E (2017) Multimodal feature-based surface material classification. IEEE Trans Haptics 10
Chaotic Maps and DCT-based Image Steganography-cum-encryption Hybrid Approach Butta Singh , Manjit Singh, and Himali Sarangal
Abstract Internet technologies make multimedia data communication easy and affordable; meanwhile, securing secret and private information over the open network has become a major threat and challenge. Steganography and encryption are two approaches for securing secret information: encryption converts the secret data into a scrambled form, whereas steganography is a data embedding approach that conceals the secret information. In this study, a 2D discrete cosine transform (DCT)-based image steganography-then-encryption hybrid method using chaotic maps for hiding secret bits chaotically in the n least significant bits (n-LSBs) is proposed. The proposed image steganography-then-encryption approach outperforms the existing techniques in terms of security, mean square error (MSE) and peak signal-to-noise ratio (PSNR). The key space of more than $10^{168}$ and the correlation coefficient (>0.9) of the cover with the stego and encrypted-stego images at different levels for different images make the proposed algorithm resistant to brute force attack.

Keywords Image steganography · Encryption · Hybrid approach · DCT · Chaotic maps
B. Singh (B) · M. Singh · H. Sarangal
Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus Jalandhar, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_14

1 Introduction

In order to secure confidential data during transmission over an open network, several information encryption and information hiding approaches have been proposed recently [1, 2]. The encryption approaches scramble the confidential data through confusion and diffusion methodologies to make it an unintelligible and undetectable message to eavesdroppers. Encryption, also known as cryptography, has been employed for reliable and secure transmission [3, 4]. Steganography is another approach that embeds confidential data imperceptibly into a cover media and restricts access by any unauthorized eavesdroppers [2, 5]. Although both steganography and
encryption approaches are employed to secure confidential and private data over unsecured transmission channels, they rely on different concepts. Steganography aims to conceal the data in the communication channel (cover media) to form a stego-signal and thereby protect the confidential information. On the other hand, encryption protects the integrity of confidential information with or without concealing the existence of communication from an eavesdropper. The performance of data security methods can be further improved with hybrid approaches that combine the features of steganography and encryption in a single technique. The main objective of this work is to propose a joint hybrid approach for image steganography and encryption. In this study, a 2D discrete cosine transform (DCT)-based image steganography-then-encryption method using chaotic maps for hiding secret bits chaotically in the n least significant bits (n-LSBs) is outlined. DCT is an efficient technique used for image steganography, but many enhancements are possible to improve the data security [6–8]. Chaotic maps have been explored to select the random locations for data embedding. The proposed approach consists of four stages: in the initial stage, secret bits are embedded chaotically in the n-LSBs of the selected cover image transform. In the second stage, the embedded transform is encrypted using chaotic maps and image decomposition. These two stages comprise the activities at the transmitter. In the third stage, the received encrypted image is decrypted using chaotic maps and image decomposition. In the last stage, the secret data is retrieved from the decrypted image.
2 Preliminaries

2.1 Discrete Cosine Transform

DCT is an orthogonal transform and is expressed as a sequence of finite data points in terms of a sum of cosine functions oscillating at different frequencies [5, 6]. DCT possesses the high energy compaction property, which implies it packs most of the information into the fewest coefficients. The computation of the 2D DCT of an N by M image is expressed by the following equation:

$C(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} c(x, y) \cos\!\left[\frac{\pi(2x+1)u}{2N}\right] \cos\!\left[\frac{\pi(2y+1)v}{2M}\right]$    (1)

where

$\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & 1 \le u \le N-1 \end{cases}$
Here, c(x, y) is the intensity of the pixel in row x and column y; C(u, v) is the DCT coefficient in row u and column v of the DCT matrix.
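A direct implementation of Eq. (1) for a single block is given below as a reference sketch; in practice a fast library DCT would normally be used, but this form follows the definition above literally.

import numpy as np

def dct2(block):
    """2D DCT-II of an N x M block, computed directly from Eq. (1)."""
    N, M = block.shape
    x = np.arange(N).reshape(N, 1)          # spatial row index
    y = np.arange(M).reshape(1, M)          # spatial column index
    out = np.zeros((N, M))
    for u in range(N):
        a_u = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        for v in range(M):
            a_v = np.sqrt(1.0 / M) if v == 0 else np.sqrt(2.0 / M)
            basis = (np.cos(np.pi * (2 * x + 1) * u / (2 * N)) *
                     np.cos(np.pi * (2 * y + 1) * v / (2 * M)))
            out[u, v] = a_u * a_v * np.sum(block * basis)
    return out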
2.2 Chaotic Maps

A chaotic system is sensitive to initial conditions and generates apparently random yet completely deterministic behavior [9–11]. Sensitive dependence results in unpredictability of orbits, even though the dynamics is deterministic; the same sequence of values can be regenerated from the same mapping function and initial value.

Logistic map: The logistic map has a quadratic nonlinearity and is represented by the following equation [5]:

LMF: $\chi_{n+1} = \mu \chi_n (1 - \chi_n), \quad n = 0, 1, 2, \ldots$    (2)
Here, $0 < \chi_n \le 1$ is the state of the system and $0 < \mu \le 4$ is a parameter of the system which measures the strength of the nonlinearity. The logistic map shows chaotic behavior for $\mu$ in the interval (3.57, 4), beyond which the property does not hold.

Sine map: The sine map has a close similarity to the logistic map, and it can be represented by the following equation [5]:

SMF: $\chi_{n+1} = \dfrac{\lambda \sin(\pi \chi_n)}{\beta}$    (3)

Here, $\lambda$ is the control parameter and $\chi_n$ lies in the range (0, 1). The division by another control parameter $\beta$ is required to make the control range of the sine map the same as that of the logistic map; a typical value of $\beta = 4$ was used in this approach.
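Both maps can be iterated in a few lines; the following sketch generates a chaotic sequence of a given length from an initial value and a control parameter, as assumed in the embedding and encryption stages.

import numpy as np

def logistic_sequence(x0, mu, length):
    """Iterate the logistic map of Eq. (2); chaotic for mu in (3.57, 4)."""
    seq, x = np.empty(length), x0
    for n in range(length):
        x = mu * x * (1.0 - x)
        seq[n] = x
    return seq

def sine_sequence(x0, lam, length, beta=4.0):
    """Iterate the sine map of Eq. (3) with the scaling parameter beta."""
    seq, x = np.empty(length), x0
    for n in range(length):
        x = lam * np.sin(np.pi * x) / beta
        seq[n] = x
    return seq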
2.3 Exclusion of Low Frequency Coefficients

High-frequency components are effective places to hide the secret data, so low frequency column coefficients are excluded block-wise from the obtained DCT at different levels. In the proposed approach, for level 1, level 2 and level 3 exclusions, 12.5, 25 and 37.5% of the DCT coefficients are excluded, respectively.
3 Proposed Method

To improve the robustness of image steganography systems in terms of both quality and security, we present a novel scheme that makes use of DCT and chaotic maps to embed the secret message in the n-LSBs of randomly selected DCT coefficients. The chaotic map-based image steganography and encryption system comprises four stages: (i) secret message embedding, (ii) image encryption, (iii) image decryption and (iv) message retrieval at the receiver end. The overall methodology at the transmitter end is depicted in Fig. 1 with the primary stages of block DCT, random location identification, embedding and encryption.
3.1 Secret Message Embedding Process

The secret message embedding process, implemented at the transmitter end, involves the following steps:

Step 1: Load the image (I) and the secret message (SM). SM is converted to the secret bits (SB) to be embedded.

Step 2: Block DCT: Apply the 2D DCT on each 8 × 8 block, working from top to bottom and left to right, to exploit the Markovian condition that adjacent pixels are highly correlated; the DCT de-correlates the image data. In level N, the first N columns of each block are excluded from the obtained DCT to exploit the energy compaction property of the DCT. Thus, T includes only the middle and high-frequency coefficients of D.

Step 3: Identification of random locations for data embedding:
(i) Logistic map matrix: Input {μR, χR, μC, χC} ⇒ KS as the keys for embedding. Using χR0, χC0 as initial parameters and μR, μC as control parameters, two logistic map matrices are generated in each level using Eq. (2): LMR of length 512 in all levels, and LMC of length 448, 384 and 320 in level 1, level 2 and level 3, respectively.
(ii) Sorting-based quantization: The pseudo random logistic map matrix generated by the chaotic maps lies in the range (0, 1). Therefore, this range has to be scaled to integer levels for confidential data hiding. In the proposed approach, a sorting-based quantization process is applied to quantize and map the chaotic values into the integer range based on the element locations. The chaotic values of LMR and LMC are sorted in ascending order through position permutation, and the corresponding original locations in the sequences LMR and LMC are obtained, respectively.
Quantize: SLMR = Sort ascending (LMR)
Fig. 1 Proposed steganography-cum-encryption methodology
R = obtain the previous positional index of SLMR
Quantize: SLMC = Sort ascending (LMC)
C = obtain the previous positional index of SLMC
(iii) Random location identification: Using R and C obtained in the previous stage, the random locations are identified in T as T(R, C). These are the locations where SB obtained from SM will be embedded.

Step 4: LSB embedding: The value at each random location T(R, C) is converted to its binary equivalent. The n-LSBs of the binary equivalent of T(R, C) are replaced sequentially with n bits from SB until all bits are embedded. The low frequency columns are then added back block-wise to the embedded high-frequency DCT to obtain F.

Step 5: Block IDCT: F is split into non-overlapping 8 × 8 blocks, and the 2D IDCT is applied to each block, working from top to bottom and left to right. The stego image is the 2D block IDCT of the embedded coefficients F.
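The sorting-based quantization of Step 3 and the LSB replacement of Step 4 can be summarized by the sketch below. It is a simplified illustration: it works on a flat vector of integer-rounded DCT coefficients and assumes the secret bit stream fits into the selected locations, so it mirrors the idea rather than the exact block-wise procedure above.

import numpy as np

def chaotic_locations(x0, mu, length):
    """Sorting-based quantization: argsort of a logistic-map sequence maps the
    chaotic values in (0, 1) to a permutation of integer positions."""
    seq, x = np.empty(length), x0
    for n in range(length):
        x = mu * x * (1.0 - x)
        seq[n] = x
    return np.argsort(seq)

def embed_bits(coeffs, secret_bits, locations, n_lsb=4):
    """Replace the n LSBs of integer-rounded coefficients at the chaotic locations
    with the secret bits (a list of 0/1 values)."""
    stego = np.rint(coeffs).astype(np.int64)
    bits = list(secret_bits)
    for loc in locations:
        if not bits:
            break
        chunk, bits = bits[:n_lsb], bits[n_lsb:]
        value = int("".join(str(b) for b in chunk), 2)
        mask = (1 << len(chunk)) - 1
        stego[loc] = (int(stego[loc]) & ~mask) | value
    return stego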
3.2 Encryption Process

Step 1: Identification of random locations R1, C1 for encryption.
(i) Logistic map matrix: Input {μR1, χR1, μC1, χC1} ⇒ Ke1 as the keys for encryption. Using the initial and control parameters, two logistic map matrices, LMR of length 512 and LMC of length 64, are generated using Eq. (2).
(ii) Sorting-based quantization: The pseudo random logistic map matrices generated by the chaotic maps lie in the range (0, 1). Therefore, the chaotic sequences LMR and LMC are sorted in a similar way as described earlier to obtain the random integers R1 and C1. These locations are constrained so that they only specify locations belonging to the first 64 columns of F.
(iii) Random location identification: Using R1 and C1 obtained in the previous stage, random locations are identified in F as F(R1, C1). These values belong to the first 64 columns of F.

Step 2: Identification of random locations R2, C2 for encryption: Step 1 is repeated for another set of encryption keys {μR2, χR2, μC2, χC2} ⇒ Ke2 to generate R2 and C2. These locations are constrained (i.e., C2 = C2 + 448) so that they only specify locations belonging to the last 64 columns of F.

Step 3: Scramble column blocks: The DCT coefficients at the random locations obtained in Steps 1 and 2 are swapped mutually. The whole process is iterated three more times, each time adding 64 to the C1 matrix (f = f + 64), subtracting 64 from the C2 matrix (b = b − 64) and swapping again.
Step 4: Block IDCT: F is split into non-overlapping 8 × 8 blocks, and the 2D IDCT is applied to each block, working from top to bottom and left to right. Here, E is the 2D block IDCT of the scrambled coefficients of F.

Step 5: The encrypted image (E) with the secret message is saved and can be transmitted through a public channel to the receiver.
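The column-band scrambling of Steps 1–3 can be illustrated schematically as follows. The sketch reuses the chaotic_locations helper from the embedding sketch in Sect. 3.1 and pairs row/column indices in a simplified way, so it mirrors the idea of the procedure rather than its exact details.

import numpy as np

def scramble_bands(F, key_front, key_back, band=64, iters=4):
    """Schematic column-band scrambling: swap coefficients between a front band and
    the mirrored back band at chaotically chosen (row, column) positions.
    key_front / key_back = (x0_row, mu_row, x0_col, mu_col)."""
    E = F.copy()
    rows, cols = E.shape
    for it in range(iters):
        r1 = chaotic_locations(key_front[0], key_front[1], rows)
        c1 = chaotic_locations(key_front[2], key_front[3], band) + it * band
        r2 = chaotic_locations(key_back[0], key_back[1], rows)
        c2 = chaotic_locations(key_back[2], key_back[3], band) + cols - (it + 1) * band
        for i in range(rows):
            a = (r1[i], c1[i % band])   # position in the front band
            b = (r2[i], c2[i % band])   # position in the back band
            E[a], E[b] = E[b], E[a]     # mutual swap
    return E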
3.3 Decryption Process

The reverse process is performed at the receiver end to obtain the decrypted image and the embedded data.
4 Results and Discussion

The proposed algorithm is implemented at three levels for data embedding in the cover image. As an example, the grayscale test image Lena.bmp is used. The entire experimental work is performed in MATLAB (2018b), specifically using the MATLAB Image Processing Toolbox. For the visual performance evaluation on the cover image, steganography and encryption are performed using the stego key (KS), the encryption key (Ke) and the secret message given below.

KS = {μR, χR, μC, χC} = {3.678912345, 0.1221, 3.789123456, 0.9889}

Secret Message: {Two Driven Jocks Help Fax My Big Quiz. Crazy Fredrick Bought Many Very Exquisite}

Ke = {μR1, χR1, μC1, χC1, μR2, χR2, μC2, χC2} = {3.891234567, 0.1331, 3.912345678, 0.9779, 3.654321987, 0.1441, 3.765432198, 0.9669}

Figure 2 shows the cover, stego, encrypted and decrypted images with 1-bit LSB embedding. DCT coefficients at random locations are scrambled, and the process is iterated three more times, each time adding 64 to the C1 matrix and subtracting 64 from the C2 matrix and scrambling again. After scrambling the DCT coefficients using Ke, the encrypted image is obtained to verify the performance of the proposed method. In order to examine the performance of a steganographic technique, evaluation parameters for steganographic systems are needed. Mean square error (MSE) is a parameter used to quantify the distinction between the cover and the stego/encrypted-stego image. MSE is the measure of the degree of difference, or it
Fig. 2 Image and histogram of a cover image b stego image c encrypted image d decrypted image
is the extent of error or distortion between two signals [1, 5]. Peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of pixel value and the power of noise [1, 5]. A high-quality image entails a large PSNR value and therefore both cover image and stego image are very similar and quite undistinguishable. Table 1 shows MSE and PSNR analysis for cover image with stego image when secret bits are embedded at 4-LSB into the DCT of cover image at different levels of frequency coefficients’ exclusion for different images.
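For 8-bit grayscale images the two metrics reduce to a few lines; the sketch below assumes a peak pixel value of 255.

import numpy as np

def mse(cover, stego):
    return float(np.mean((cover.astype(float) - stego.astype(float)) ** 2))

def psnr(cover, stego, peak=255.0):
    e = mse(cover, stego)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)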
4.1 Correlation Analysis

The effect of embedding is related to the correlation of adjacent pixels; a good stego image will always have a high correlation with the cover. A correlation value greater than 0.9 usually indicates a strong correlation, whereas a value less than 0.5 is regarded as weak. Table 2 shows the correlation coefficients of the cover with the stego image and with the encrypted-stego image.
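The correlation values reported in Table 2 can be computed directly from the flattened pixel arrays, for example:

import numpy as np

def correlation(img_a, img_b):
    """Pearson correlation coefficient between two equally sized images."""
    return float(np.corrcoef(img_a.ravel().astype(float),
                             img_b.ravel().astype(float))[0, 1])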
4.2 Key Analysis

For a robust system, besides a sufficiently large key space to withstand brute force attacks, the system must be very sensitive to minute changes in its security keys. In the proposed steganography-cum-encryption process, the control parameters as well as the initial parameters of the logistic map function form the key for secret message embedding and encryption. The security key (K) is the collection of KS and Ke. These are set to a precision of 28 decimal places, so the total key space resulting from this precision is more than $10^{168}$, making the proposed method resistant to brute force attack.
4.3 Performance Comparison with Existing Approaches

A comparison of the proposed method with other state-of-the-art approaches is presented in Table 3. The proposed method outperforms them in terms of PSNR and key sensitivity. To ensure coherence between the compared data, we compare the results obtained on the 512 × 512 Lena.bmp grayscale cover image with 8 × 8 DCT blocks (Level 1, 4-LSBs) for the methods of [9, 11] and [12].
Table 1 MSE and PSNR analysis for cover image with stego image, 4-LSBs (each cell gives MSE / PSNR for the stated number of embedded bits)

Level 1 exclusion
Image         320 bits           640 bits           1280 bits          1760 bits
Boats         0.0277 / 63.7092   0.0384 / 62.2842   0.0990 / 58.1748   0.1636 / 55.9940
Bridge        0.0709 / 59.6229   0.1209 / 57.3063   0.2592 / 53.9944   0.4314 / 51.7824
Cameraman     0.0305 / 63.2842   0.0541 / 60.7994   0.1341 / 56.8556   0.1656 / 55.9403
Couple        0.0389 / 62.2328   0.0678 / 59.8174   0.1506 / 56.3520   0.2800 / 53.6591
Lena          0.0254 / 64.0789   0.0696 / 59.7050   0.1216 / 57.2798   0.1383 / 56.7235
Light house   0.0408 / 62.0281   0.0820 / 58.9938   0.2107 / 54.8941   0.3163 / 53.1304

Level 2 exclusion
Image         320 bits           640 bits           1280 bits
Boats         0.0085 / 68.8234   0.0249 / 64.1757   0.0531 / 60.8779
Bridge        0.0335 / 62.8773   0.0819 / 58.9963   0.2626 / 53.9375
Cameraman     0.0081 / 69.0304   0.0524 / 60.9411   0.0863 / 58.771
Couple        0.0070 / 69.6869   0.0838 / 58.9009   0.2191 / 54.7242
Lena          0.0173 / 65.7552   0.0320 / 63.0755   0.0682 / 59.7954
Light house   0.0421 / 61.8842   0.1355 / 56.8120   0.2128 / 54.8512

Level 3 exclusion
Image         320 bits           640 bits           1280 bits
Boats         0.0078 / 69.2327   0.0192 / 65.2902   0.0356 / 62.6132
Bridge        0.0258 / 64.0110   0.0544 / 60.7765   0.1700 / 55.8255
Cameraman     0.0075 / 69.4025   0.0165 / 65.9604   0.0347 / 62.7253
Couple        0.0209 / 64.9363   0.0299 / 63.3764   0.0880 / 58.6876
Lena          0.0313 / 63.1749   0.0458 / 61.5266   0.0706 / 59.6450
Light house   0.0214 / 64.8266   0.0641 / 60.0647   0.1233 / 57.2226
Table 2 Correlation analysis for cover image with stego image (embedding 320 bits) and with encrypted-stego image

              Correlation: stego and cover image                          Correlation: encrypted and cover
Image         Level 1 exclusion   Level 2 exclusion   Level 3 exclusion
Boats         0.9999              1.0000              1.0000              − 0.0085
Bridge        0.9999              0.9999              0.9999              − 0.0073
Cameraman     0.9999              1.0000              1.0000              − 0.0596
Couple        0.9999              0.9998              1.0000              − 0.0141
Lena          0.9998              1.0000              1.0000              − 0.0119
Light house   0.9996              0.9999              0.9999              0.0041
Table 3 Performance comparison of the proposed approach

Parameters           Hussain [11]   Ghebleh [9]   Shehab [12]   Proposed approach
PSNR                 53.24          56.453        46.44         56.7235
MSE                  0.3084         0.1472        1.4759        0.1383
Embedding capacity   0.0071         0.4425        0.75          0.00684
Key space            2^2193         2^384         10^128        10^168 ≈ 2^558
5 Conclusion

An approach for image steganography-then-encryption is proposed with random location identification for embedding and scrambling using logistic maps. The initial and control parameters at the input of every random location identifier act as a sturdy password for decrypting and retrieving the secret message. The proposed algorithm for image steganography-then-encryption outperforms the other image-steganography-only or image-encryption-only techniques. This is a highly secure and robust technique, and the approach has been applied to various images at different levels of exclusion.
References 1. Kadhim IJ, Premaratne P, Vial PJ, Halloran B (2019) Comprehensive survey of image steganography: techniques, evaluations, and trends in future research. Neurocomputing 335:299–326 2. Hussain M et al (2018) Image steganography in spatial domain: a survey. Sig Process Image Commun 65:46–66 3. Sharma VK, Mathur P, Srivastava DK (2019) Highly secure DWT steganography scheme for encrypted data hiding. In: Information and communication technology for intelligent systems. Smart innovation, systems and technologies, vol 106. Springer, Singapore, pp 665–673 4. Lyle M, Sarosh P, Parah SA (2022) Adaptive image encryption based on twin chaotic maps. Multimedia Tools Appl 81:8179–8198 5. Kaur R, Singh B (2021) A hybrid algorithm for robust image steganography. Multidimension Syst Signal Process 32:1–23 6. Rabie T, Kamel I (2017) High-capacity steganography: a global-adaptive-region discrete cosine transform approach. Multimedia Tools Appl 76:6473–6493 7. Xiang T et al (2015) Outsourcing chaotic selective image encryption to the cloud with steganography. Digit Signal Process 43:28–37 8. Yuen C-H, Wong K-W (2011) A chaos-based joint image compression and encryption scheme using DCT and SHA-1. Appl Soft Comput 11(8):5092–5098 9. Ghebleh M, Kanso A (2014) A robust chaotic algorithm for digital image steganography. Commun Nonlinear Sci Numer Simul 19(6):1898–1907
10. Khade PN, Narnaware M (2012) 3D Chaotic functions for image encryption. IJCSI Int J Comput Sci 9(3):323–328 11. Hussain MA, Bora P (2018) A highly secure digital image steganography technique using chaotic logistic map and support image. In: 2018 IEEE International conference on information communication and signal processing (ICICSP). IEEE, pp 69–73 12. Shehab JN, Abdulkadhim HA (2018) Image steganography based on least significant bit (LSB) and 4-dimensional Lu and Liu chaotic system. In: 2018 International conference on advanced science and engineering (ICOASE). IEEE, pp 274–279
An Empirical Analysis and Challenging Era of Blockchain in Green Society G. A. Senthil , R. Prabha , B. Divya , and A. Sathya
Abstract Blockchain is one of the new inventions that can help develop sustainable and sustainable solutions as it is able to provide accountability, transparency, online tracking and robustness, and improve the efficiency of global collaboration. A massive increase in electricity demand has created a new dilemma. Power businesses, startups, commercial banks, policymakers, and researchers are all interested in blockchain technology, which has become the most generally acknowledged IoT Technology. Therefore, blockchain technology efforts from both industry and education have been included in the study. Recently, the blockchain has been employed as a new approach for developing and enhancing green societies. A full examination of all the constraints and potential of the blockchain through technological ideas is required. In this article, tremendous attempts are underway to thoroughly investigate the complete foundation of the green community and how the blockchain contributes to the green community. Finally, this article also highlights the applications and challenges of the green chain-driven community. Keywords Blockchain · IoT · Bitcoin · Green supply chain management · Green chain-driven community · Green society · BitHop G. A. Senthil (B) Department of Information Technology, Agni College of Technology, Chennai, India e-mail: [email protected] R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India e-mail: [email protected] B. Divya Department of Computer Science and Engineering, Sri Sai Ram Institute of Technology, Chennai, India e-mail: [email protected] A. Sathya Department of Artificial Intelligence and Data Science, Sri Sai Ram Institute of Technology, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_15
1 Introduction The use of cryptocurrencies on the blockchain, a digital technical platform, is well-known. Hundreds of cryptocurrencies (such as Ethereum, Ripple, NEO, and Bitcoin), as well as other novel applications, have emerged across many domains. Understanding blockchain technology is essential for exploring its potentially wide-ranging and disruptive social, economic, and global implications, Sharma et al. [1]. Blockchain is a digital data platform that combines a growing log of activities and a ledger into one platform. Actions are organized into blocks, each of which has a timestamp and is connected to the blocks before it, creating a chained sequence of records, Ferrag et al. [2]. Blockchains are used in digital networks. In those systems, data transmission amounts to copying data between sites, such as when transferring digital currency from one wallet to another user in the field of cryptocurrencies. A popular solution is to utilize a centralized firm, such as a major bank, that serves as a trustworthy intermediary between trading parties and is responsible for maintaining, protecting, and restoring the ledger's lawful status, even when only a few parties are permitted to write to it. Similarly, the central authority can reconcile competing updates and amend the ledger accordingly, Rane and Thakker [3]. Central administration is not always possible or desirable: it adds cost and requires network users to entrust the system's operation to a third party. A single point of failure also has a significant impact on centralized networks, making both technological failures and malevolent attacks easier. Blockchain, the most famous and perhaps most mysterious technology of the Fourth Industrial Revolution, is now being used to reshape the production, delivery, and sale of energy. The use of blockchain to digitize and democratize renewable power has the potential to improve outcomes in areas ranging from grid management to fundraising, Liu et al. [2019]. Figure 1 presents a statistical view of the green society with blockchain. Pilot initiatives have been launched by utilities, global marketplaces are expanding across countries, and the initial situation is rapidly improving, providing fertile ground for firms to shift their business models. In Southeast Asia, strong economic growth and development drive the growing demand for electricity, raising concerns about a carbon-intensive future for the region, Patil et al. [4]; Aravindhan et al. [5, 6]. Following this introduction to blockchain and its benefits to society, Sect. 2 provides the background, Sect. 3 evaluates startups for the green society, and Sect. 4 concludes with research directions and a conclusion [1–18].
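The block structure described above can be made concrete with a small sketch. The following Python example is a minimal illustration of hash-linked, timestamped blocks; the field names and the SHA-256 hashing scheme are simplifying assumptions for exposition and do not reflect the exact format of any particular cryptocurrency.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Deterministically hash a block's contents with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def new_block(actions, prev_hash: str) -> dict:
    """Group a batch of actions into a timestamped block linked to its predecessor."""
    return {
        "timestamp": time.time(),
        "actions": list(actions),   # e.g. transfers between wallets
        "prev_hash": prev_hash,     # the link that makes tampering detectable
    }

# Build a tiny chain: each block stores the hash of the block before it.
genesis = new_block([{"from": "A", "to": "B", "amount": 5}], prev_hash="0" * 64)
second = new_block([{"from": "B", "to": "C", "amount": 2}], prev_hash=block_hash(genesis))

# Verification: recompute the predecessor's hash and compare it with the stored link.
assert second["prev_hash"] == block_hash(genesis)
print("chain intact:", second["prev_hash"][:16], "...")
```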
Fig. 1 Green society with blockchain
2 Intensive Background: Blockchain for Green Society 2.1 Blockchain Services to Society Major changes in production and consumption are required to address the environmental impact of well-being, growth, and industry. Society, too, must adapt to its surroundings: it must be transformed from a self-centered system into one that balances human needs with those of the planet. From quantum computing and artificial intelligence to DNA sequencing and delicate robotics, new inventions will revolutionize our approach to development. Among all the advancements of the Fourth Industrial Revolution, the blockchain is probably the most interesting. Blockchain is an open-source, digital, and error-free information database. That means you can scan the tuna packaging with your smartphone to find out when the fish were caught, as well as what vessel and fishing method were used. Consumers may be confident that the tuna they are purchasing was caught legally and without slave labor, Ranjani and Sangeetha [15]; Jiang et al. [11]. As Fig. 2 illustrates, aside from the shifts already described, blockchain has the potential to break down many hurdles to sustainability and transformation in a variety of industries. Coin-based recycling schemes and Plastic Bank are two examples of blockchain-based programs that award cryptographic tokens while also keeping track of quantities, expenses, and effects. Such measures will encourage the development of a circular economy, as well as its expansion and replication. People can invest directly in renewable energy through emerging platforms like ElectricChain.
Fig. 2 Blockchain services to society
Blockchain can help to promote a clean, long-lasting, low-cost energy system that blends medium-scale and renewable power and generates energy at the maximum possible efficiency. By monitoring carbon footprints and assisting in the determination of acceptable carbon tax rates, blockchain can stimulate the manufacture and use of low-carbon assets. It may also make it possible for governments and enterprises to meet their obligations and monitor compliance with international environmental treaties without the need for third-party audits, and, wherever feasible, to curb evasion of regulation and corruption in the disbursement of funds through advanced financial technology.
2.2 Green Investments According to the Sustainable Digital Finance Alliance, funding for sustainable development is a critical aspect of today's global concerns. The United Nations Environment Program and Ant Financial Services formed the alliance to investigate FinTech's possibilities for sustainable development. Three studies were conducted to demonstrate the potential of emerging technologies in green investment, including the use of blockchain technology. Digitalization of the Internet of Things (IoT) and blockchain, along with an automated green currency authorization process, were established in 2010 by the Shenzhen Green Finance Committee. The goal is to discover more effective techniques to assure green investment. Inadequate knowledge and ineffective procedures for issuing certifications are issues that can be addressed with these new techniques, Liu et al. [14].
2.3 A Global Green Deal Climate change is a worldwide issue that impacts us all, regardless of where we come from or what we do. There is a sense of urgency, and many people are participating in the discussion. Blockchain-based companies recognize the opportunity and are working to address it. Industry, on the other hand, is a vital player: it is expected to develop alternative production methods that can lessen climate-change-related pollution and degradation. The majority of today's environmental issues may be traced back to industrial development, particularly after the "rapid acceleration" of global economic expansion in the 1950s, which began to increase commercial production of all sorts of goods [1, 3, 4, 15–19]. We are beginning to recognize the importance of economic growth as the Fourth Industrial Revolution (4IR) proceeds with emerging technologies that are faster, more productive, and more broadly available. Redesign is a crucial and fast-paced phase of today's industrial development, with no signs of technical advancement slowing down. Blockchain is a technology that creates a secure and consistent cryptographic record of any outcome, including money, items, properties, jobs, and votes [20]. It has emerged as a foundation for the Fourth Industrial Revolution in recent years, much as earlier technologies did for the preceding (third) industrial revolution. The capacity of blockchain to record demand and supply in the supply chain can also be applied to give customers long-term tracking, reporting, and verification. Due to blockchain, it is possible to track the whereabouts of products and services, and for a full description of risk and reward profiles, information can be filtered by management level, Taskinsoy [17].
2.4 Environmental Conservation Many of the issues arise as a result of inconsistencies in the laws governing commerce and ownership, as well as in the way we manage natural resources and their conservation:
• Do governments and other consumers respect the rights attached to natural resources?
• Is it possible to evaluate and trust a company's claims about its reduced impact on the environment?
• Can sustainable actions be successfully developed in terms of environmental factors?
The capacity of blockchain to give verified and transparent records will make it easier to respond to these questions. By decentralizing governance and digitally integrating trustworthy measurements, blockchain has the ability to engage broader stakeholder groups and speed up sluggish, costly processes in our present environmental-regulation system. However, certain conditions must be met in order to do so. It is also worth noting that, in this context, not all blockchains are public. Unlike public, permissionless blockchains, "permissioned" blockchains
are designed to limit access to only authorized parties. This has major ramifications, since blockchains have the capacity to challenge established power structures and destabilize strong central organizations like governments and businesses, Chohan [10].
3 Entrepreneurs for the Green Society 3.1 Provenance (https://www.provenance.org/) Blockchain continuously and transparently records transactions to the ledger, allowing for consistent and trustworthy tracking: from the point of origin, one can see how goods are sold or moved. Because it keeps track of all transactions, delays, additional expenses, and human errors are largely avoided. Provenance was one of the first blockchain startups to create a materials and product tracking system. First, it informs customers about suppliers by identifying the historical background of items. It also serves the luxury goods industry, where tracing the origin of products helps win the fight against counterfeit goods. The Soil Association and Provenance have partnered to produce an Android fraud-detection app, with which customers can examine the automated certification mark to see whether goods are genuine.
3.2 BitGive Foundation (https://www.bitgivefoundation.org/) The Foundation established GiveTrack, a multi-faceted donation website, to aid in the transfer and tracing of donations to needy people around the world. Unfortunately, fraud still finds a viable market in the charity sector, and this attempt may jeopardize it. By enhancing donors' trust, GiveTrack will enable people to give more and engage more deeply with humanitarian causes. This relationship can be crucial for donors.
3.3 Poseidon Foundation (https://poseidon.eco/) Poseidon's goal is to contribute to the "green" economic system by using a 360-degree approach to close the carbon-emissions gap: addressing the causes of climate change and strategies to reduce greenhouse-gas emissions, while also providing critical services to the world's most disadvantaged people. Poseidon helps projects by purchasing their carbon certificates; a "carbon debt" entry is created for every tonne of carbon emissions offset.
3.4 Electron (http://www.electron.org.uk/) Electron is a UK company that uses blockchain technology to lower energy consumption for the benefit of UK families. In light of significant developments in the energy sector, the purpose of this blockchain initiative is to "monetize through decarbonization, distribution, digitization, and democratization".
3.5 SolarCoin (https://solarcoin.org/) SolarCoin is powered by Bitcoin-style blockchain technology, which uses a peer-to-peer computer network to record and verify transactions. The project provides an automated token system that allows customers to pay for their electricity. People who equip the rooftops of their homes with solar panels will profit, as will solar energy generators: any solar owner who generates electricity with their PV system earns SolarCoins. SolarCoin can be traded for hard currency by transferring it to a Bitcoin wallet.
3.6 ElectricChain (http://www.electricchain.org/) The platform enables governments and the solar sector to supply future generations with affordable and clean solar energy. This project is the SolarCoin ecosystem's primary commercial production unit. ElectricChain unites other connected organizations operating in blockchain and digital assets to launch the energy transformation, using SolarCoin as a digital asset and blockchain technology. ElectricChain creates solar-energy blockchain firms in collaboration with corporate partners such as Grid Singularity and Ethereum, Chain of Things and IOTA, BitSeed, MIT, NASA, and IBM.
3.7 Oxyn (https://www.oxyn.io/) The OxFoundation's objective is to enable charitable and for-profit organizations to effectively address plastic waste by empowering an open, fast, and scalable blockchain ecosystem and cryptocurrency. Through the Oxyn scheme, 5% of all purchases is donated to a range of pro-environmental activities, and the Green Wallet acts as an intermediary in transactions.
3.8 Solcrypto (https://www.solcrypto.com/en) Solcrypto is an essential gateway for SolarCoin awards from the SolarCoin base. Solcrypto has a large team of coders, developers, scientists, and security specialists who collaborate on a daily basis with the SolarCoin Foundation and other associated services to enhance the SolarCoin ecosystem. They also employ their technology and skills to construct a SolarCoin verification algorithm that matches solar power estimates. They seek to build a strong foundation for the exchange of critical trading data so that customers may make informed decisions before buying and selling. Finally, Solcrypto holds no crypto properties, coins, or tokens on behalf of the customer; all details and data are managed by the SolarCoin Foundation.
3.9 Social Plastic (https://plasticbank.com/social-plastic-33/) Plastic Bank intends to transform garbage into income in order to prevent plastic from entering the ocean. In developing countries, it sets up recycling centers where collected plastic can be exchanged for phone service and items like cooking oil and soap. Plastic Bank gathers the plastic and sells it on at a premium as Social Plastic. Plastic Bank has partnered with IBM to integrate the recycling system, ensuring that collection quantities are recorded and transmitted safely. The goal is to assist underprivileged people in achieving a better future. This can be accomplished by allowing plastic to be exchanged for cash, products or blockchain digital tokens.
3.10 BitHope (https://bithope.org/) BitHope can assist you in checking the balance and history of the address to which you would like to give; the BitHope Foundation, on the other hand, has exclusive access to the funds. The BitHope Foundation founded the platform and is the first Bulgarian non-governmental organization to use cryptocurrencies solely for the benefit of the community. BitHope transforms bitcoins into tangible assets for individuals, animals, and the environment. The most important aspect is that it will provide funding to charities all across the world to help them carry out their missions. They accept Bitcoin donations and promote cryptocurrency awareness, while remaining self-supporting. Our social and environmental systems have been revolutionized as a result of the blockchain's connection to the rest of the globe, and the number of people learning about blockchain and cryptocurrency is steadily increasing.
4 Case Study of Green Supply Chain Management According to the broad definition (Fig. 3), any activity in the supply chain, whether within the business or with external partners, is deemed a green supply chain management (GSCM) practice if it helps to reduce the environmental effect of the area in some way. GSCM actions are practical means for the company to implement an environmental strategy. GSCM functions also range from operational processes, like environmental management systems, to constructive processes, like asset disposal and environmental technology. Organizations are increasingly recognizing the importance of incorporating environmental considerations into their operations, Sezen and Çankaya [16]. Figure 4 shows the impact of integrating green thinking into supply chain management, which includes product design, inventory and manufacturing processes, final product distribution, and product life-cycle management. As a result, the major supply chain management gaps are heavily focused on climate change and its spread, Prabha et al. [19]. Figure 5 presents the conceptual framework of green supply chain management. The traditional approach to supply chain management focuses on a linear chain, in which products move from raw materials through to product distribution. Furthermore, by incorporating sustainable processes into the strategic core and policies of their business, organizational responses shift from reactive to proactive, Yunus and Michalisin [18]; Bajaj et al. [9].
Fig. 3 Green supply chain management practices
Fig. 4 Impact of green supply chain management
Fig. 5 Conceptual framework of green supply chain management
5 Applications and Challenges The Internet of Things encompasses not only telecommunications but also tangible products used for wireless communication, such as cars, laptops, and household goods. The Internet has revolutionized the way we live and communicate in every part of our lives, from work to social relationships. Because of a growing awareness of environmental issues around the world, green IoT technology systems should be explored. Greening the IoT refers to technologies that enable the Internet of Things to gather, save, retrieve, and manage a range of data using environmentally friendly resources and storage; green ICT technologies are the supporting technologies of green IoT. IoT has exploded in popularity in recent years, and this has had an impact on IoT device network performance and power resources.
In green IoT, "green" denotes its natural and energy-saving qualities, Ahmad et al. [7]. Another solution that provides energy savings with green IoT technology is proper management of the hot-air flow from servers and data centers. As a result, IoT is being transformed into green IoT. Devices should be built to conserve energy: hardware such as processors, sensors, servers, ICs, and RFIDs can all be designed with this in mind. While cloud-based software, virtualization, data centers, and other technologies are conceivable, policies can be based on intelligent metrics, power forecasts, and other factors, Leonardo et al. [12].
6 Conclusion Green IoT offers long-term advantages for the IoT through the adoption of alternative energy sources that mitigate the damage the IoT causes. In terms of economic, environmental, and social sustainability, the numerous uses of green IoT help save natural resources and enhance human health. The significant technological advancements of the twenty-first century have several advantages; however, there is a growing demand for more powerful technology, as well as growing e-waste and hazardous emissions. This article contributes to the green community by examining the green community's whole background, and it ultimately discusses the advantages and disadvantages of the green community supported by the green chain.
References
1. Sharma PK, Kumar N, Park JH (2020) Blockchain technology toward green IoT: opportunities and challenges. IEEE Network 34(4):263–269. https://doi.org/10.1109/MNET.001.1900526
2. Ferrag MA, Shu L, Yang X, Derhab A, Maglaras L (2020) Security and privacy for green IoT-based agriculture: review, blockchain solutions, and challenges. IEEE Access 8:32031–32053. https://doi.org/10.1109/ACCESS.2020.2973178
3. Rane SB, Thakker SV (2019) Green procurement process model based on blockchain–IoT integrated architecture for a sustainable business. Manage Environ Qual Int J. Emerald Publishing Limited. https://doi.org/10.1108/MEQ-06-2019-0136
4. Patil AS, Tama BA, Park Y, Rhee K-H (2018) A framework for blockchain based secure smart green house farming. In: Advances in Computer Science and Ubiquitous Computing. Springer, Singapore, pp 1162–1167. https://doi.org/10.1007/978-981-10-7605-3_185
5. Aravindhan K, Sangeetha SKB, Periyakaruppan K, Manoj E, Sivani R, Ajithkumar S (2021) Smart charging navigation for VANET based electric vehicles. In: 7th International conference on advanced computing and communication systems (ICACCS). IEEE, pp 1588–1591. https://doi.org/10.1109/ICACCS51430.2021.9441842
6. Aravindhan K, Sangeetha SKB, Periyakaruppan K, Keerthana KP, Giridhar VS, Shamaladevi V (2021) Design of attendance monitoring system using RFID. In: 7th International conference on advanced computing and communication systems (ICACCS). IEEE, pp 1628–1631. https://doi.org/10.1109/ICACCS51430.2021.9441704
7. Ahmad R, Asim MA, Khan SZ, Singh B (2019) Green IoT—Issues and challenges. In: Proceedings of 2nd international conference on advanced computing and software engineering (ICACSE) (SSRN). Retrieved from https://ssrn.com/abstract=3350317 (or) http://dx.doi.org/10.2139/ssrn.3350317
8. Al-Saqaf W, Seidler N (2017) Blockchain technology for social impact: opportunities and challenges ahead. J Cyber Policy 2(3):338–354. https://doi.org/10.1080/23738871.2017.1400084
9. Bajaj PS, Bansod SV, Paul ID (2018) A review on the green supply chain management (GSCM) practices, implementation and study of different framework to get the area of research in GSCM. In: Techno-Societal (ICATSA 2016). Springer, Cham, pp 193–199. https://doi.org/10.1007/978-3-319-53556-2_20
10. Chohan UW (2019) Blockchain and environmental sustainability: case of IBM's blockchain water management. In: Notes on the 21st Century (CBRI) (SSRN). Retrieved from https://ssrn.com/abstract=3334154 (or) http://dx.doi.org/10.2139/ssrn.3334154
11. Jiang L, Xie S, Maharjan S, Zhang Y (2019) Blockchain empowered wireless power transfer for green and secure internet of things. IEEE Network 33(6):164–171. https://doi.org/10.1109/MNET.001.1900008
12. Leonardo RR, Giungato P, Tarabella A, Tricase C (2019) Blockchain applications and sustainability issues. Amfiteatru Econ J 21(S13):861–870. https://doi.org/10.24818/EA/2019/S13/861
13. Liu X, Ansari N (2019) Toward green IoT: energy solutions and key challenges. IEEE Commun Mag 57(3):104–110. https://doi.org/10.1109/MCOM.2019.1800175
14. Liu J, Lv J, Dinçer H, Yüksel S, Karakuş H (2021) Selection of renewable energy alternatives for green blockchain investments: a hybrid IT2-based fuzzy modelling. Arch Comput Methods Eng 28:3687–3701. https://doi.org/10.1007/s11831-020-09521-2
15. Ranjani V, Sangeetha SKB (2014) Wireless data transmission in ZigBee using indegree and throughput optimization. In: International conference on information communication and embedded systems (ICICES2014). IEEE, pp 1–5. https://doi.org/10.1109/ICICES.2014.7033901
16. Sezen B, Çankaya SY (2018) Green supply chain management theory and practices. In: Operations and service management: concepts, methodologies, tools, and applications, edited by Information Resources Management Association. IGI Global, pp 118–141. https://doi.org/10.4018/978-1-5225-3909-4.ch006
17. Taskinsoy J (2019) Blockchain: an unorthodox solution to reduce global warming. SSRN. Retrieved from https://ssrn.com/abstract=3475144 (or) http://dx.doi.org/10.2139/ssrn.3475144
18. Yunus E, Michalisin MD (2016) Sustained competitive advantage through green supply chain management practices: a natural-resource-based view approach. Int J Serv Oper Manage 25(2):135–154. https://doi.org/10.1504/IJSOM.2016.078890
19. Prabha R, Razmah M, Asha RM, Subashini V (2020) Blockchain based decentralized system to ensure the transparency of organic food. EEO 19(2):1828–1837. https://doi.org/10.17051/ilkonline.2020.02.696766
20. Jayaraman R, Nisha ASA, Somasundaram K, Karthikeyan C, Babu DV, Prabha R (2022) Effective COVID monitoring process to observe patients status using blockchain. In: 2022 7th International conference on communication and electronics systems (ICCES). IEEE, pp 804–811. https://doi.org/10.1109/ICCES54183.2022.9835995
3D Modeling of Automated Robot for Seeding and Transplantation of Rice and Wheat Crops G. Venkata Sai Krishna, Palanki Amitasree, P. V. Manitha, and M. Rajesh
Abstract Agriculture is the most important sector of the Indian economy and is identified as its backbone. The major part of agriculture lies in paddy cultivation. In earlier days, agriculture was the sector in which most people were employed, but nowadays, due to industrial growth and the IT revolution, employment in agriculture has fallen sharply. This sector, which is vital for India's economic growth, is now facing serious issues due to manpower shortage and the lack of advancement in technological usage. This work proposes a solution to this issue by bringing automation into the agriculture domain. Agricultural processes are chains of systematic, repetitive and time-dependent tasks. They require a lot of manual labor and accurate placement and managing of seeds and saplings, and human error is highly likely. In this work, an attempt is made to automate the process of distributing the seeds and, once they have grown big enough to be transplanted, the robotic solution will transplant these plants. The automated solution will distribute the seeds in an organized and optimized way, and the robotic platform will pluck the plants and replant them in an organized way. Keywords Selective laser melting · Selective laser sintering · LOM-laminated object manufacturing · CATIA-computer aided three-dimensional interactive application · High density polyethylene · USB-universal serial bus
G. V. S. Krishna · P. Amitasree · P. V. Manitha (B) Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected] M. Rajesh Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_16
1 Introduction Wheat and rice are the most consumed food products in the subcontinent, and production of these two products has been ever increasing. India has an estimated production of 105 million tons of wheat and 117 million tons of rice. Compared to developed countries, the production per hectare is low in India because of a lack of knowledge and proper technology. Both crops are cultivated using the transplantation method, where the possibility of human error is high. Self-sufficiency in agricultural production is essential for any developing economy. India reached its goal of adequacy in food production through the introduction of the green revolution in the late 1960s. Increasing productivity by utilizing mechanization in agriculture was one of the missions of the green revolution. Farm mechanization increased the production of food grains from 50.82 million tons in 1951 to 283.137 million tons in 2019. Rice and wheat are the major food products grown in India to feed its large population. The initial stage of paddy cultivation has three important processes, namely preparation of the field, adding manure to the field, and transplanting the seedlings from the nursery to the field. Transplanting by hand takes much time and also requires a large amount of labor, which makes it difficult for large-scale production. The first and foremost process in paddy cultivation is the growth of paddy seedlings. It takes about two to three weeks for the seedlings' growth, and they are grown in nurseries. After this, the seedlings are relocated to the farm field by the laborers; this process requires about 240–340 man-hours per hectare. Since the rapidly growing population requires fast production, it is impossible to produce the required amount of products to satisfy the growing demand by traditional or conventional methods. So, it has become highly essential to implement technological innovations in the agricultural field too. The Automated Rice and Wheat Plants Planting Bot will reduce the dependency on labor for transplanting paddy seedlings and will also reduce the cost of transplanting seedlings. Direct Seeded Rice (DSR) causes large weed growth, which requires more herbicides or hand weeding. DSR is a high-cost process which requires equipment like a DSR driller; it causes iron deficiency and is only suitable for heavy soil. Automated robots are being used to assist agricultural activities like ploughing, sowing, weeding, pesticide spraying and fruit picking all over the world [1]. In the major part of the world, the widely used transplantation method is manual transplantation. During a manual transplantation process, there is a very high chance of human error, as the saplings have to be placed at a specified distance from each other [2]. This causes nutrition deficiency in the plants when they are placed very far from or very near to each other, and as a result crop yield decreases. So, in order to improve the yield, mechanical transplantation is being introduced [3]. By using automated mechanical transplantation, precise placement is possible. There are various types of robots under development for agriculture, like soil analysis bots, seeding and transplantation bots, and harvesting and crop scouting bots [3]. Usage of seeding and transplantation bots reduces the possibility of high human
error. The precision of seed placement is very important in the seeding process. The optimal placement of seeds during seeding is 1–1.5 cm apart for wheat and 5–10 cm apart for rice [2]. Navigation for this type of robot is achieved with the help of an ultrasonic sensor. On the field the robot operates in automated mode, but outside the field it is strictly operated in manual mode [4]. One basic systematic approach is the design and development of a mobile robotic platform for agricultural applications. The proposed robot is used for sowing seeds: it can navigate on paddy and wheat fields and sow seeds effectively. An infrared sensor and an ultrasonic sensor are used to detect obstacles for proper navigation of the robot [4]. The robot can be designed specific to tasks; a task-based robotic platform will be more commercially viable [5]. Any robotic platform developed for agricultural use will have to handle the challenges arising from the difficulties of navigation, and understanding the area where robots are deployed will benefit the design. Reference [6] discusses a mobile platform for robots used for agricultural purposes, and [7] summarizes various visual navigation models. A 3D printer is a tool that performs rapid prototyping using Computer Aided Design (CAD) models; 3D printers have the capacity to produce three-dimensional prints (along the x, y and z axes). The procedures concerned fall within the category of additive manufacturing processes. One may compare 3D printing with milling, a subtractive manufacturing process: a 3D printer adds mass to a model to create it rather than subtracting from it. A 3D printer may construct a model out of a variety of materials, including plastic, polymer, metal, and their mixtures to generate hybrids, composites, or functionally graded materials. The advancements in the field of 3D printing have removed restrictions on product design [8]. By adding material one layer at a time, digital fabrication technology, also known as 3D printing or additive manufacturing, produces real items from a geometric representation. 3D printing is a rapidly developing technology. In the modern world, 3D printing is widely utilized for mass customization, manufacture of any kind of open-source design, and in industries like agriculture, healthcare, automotive, locomotive and aviation. The seven categories of 3D printing technologies are listed in ASTM Standard F2792. These categories include binder jetting, directed energy deposition, material extrusion, material jetting, powder bed fusion, sheet lamination and vat photopolymerization. Today, 3D printing technology is widely employed to create a wide range of items rather than only prototypes [9]. Since the late 1970s, a wide range of 3D printing techniques and technologies have been developed; the early printers were enormous and expensive, and limited in what they could generate. Selective laser melting (SLM), selective laser sintering (SLS) and fused deposition modeling (FDM) are some of the techniques that melt or soften the material to create the layers, whereas stereolithography (SLA) and laminated object manufacturing (LOM) use other technologies, such as curing liquid materials [10]. To automate processes and minimize human error, robotics is widely applied in the fields of research, laboratory-based work, industrial work and agriculture. When moving an object from one location to another, a robotic arm is frequently suggested. Automated processes have the advantage of completing tasks more quickly and with fewer errors.
Robotic arms can be used to plant seedlings at the right distance and
in the right manner in order to boost agricultural productivity and produce better outcomes [11–13]. Certain robotic frameworks use multiple robots to carry out complex tasks, but such teams of robots require a communication scheme, as described in [14]. Robots are used for various applications like plant health monitoring and removing infected plants [15]. Rajesh and Nagaraja [16] discuss a multi-robot system designed for search and rescue operations, and [17] provides a robotic solution used in the automobile industry. Robotic solutions are even used in disaster management, and [18] discusses such a system developed for the detection and removal of hazardous gases. Reference [19] describes an automated solution built for agricultural applications. In this paper, the design and 3D modeling of an autonomous robot for rice and wheat seed distribution and for replanting the rice and wheat plants at the proper distance is presented.
2 Design of the Proposed System This section gives a basic understanding of the selection of dimensions for the chassis, motor and wheels, along with the specifications of the components used, and presents the design and the necessary calculations for the 3D models.
2.1 Block Diagram Figure 1 shows the block diagram for building an autonomous seeding and transplantation robot. An Arduino UNO acts as an interface between the sensors and wheels for programmed and controlled movement. A motor driver shield has been mounted for the control of the motors. A range-identification sensor and an obstacle-identification sensor have been used for the proper guidance of the automated robot, and a seed dispenser has been designed and mounted on the chassis. Fig. 1 Block diagram of seeding and transplantation of crops
Fig. 2 Chassis dimensions
2.2 Chassis A chassis is the basic structure of the robot on which the required components and all the operational material can be mounted. For mounting different materials and structures on the chassis, a basic rectangular shape is ideal, as it makes the mounting of future additions easy; a 40 cm × 50 cm rectangular shape has been used. The ideal materials for building the chassis are high-density polyethylene (HDPE) and aluminum. Another important aspect of designing a chassis is ground clearance, i.e., the distance between the surface and the lower part of the chassis. In our case, we are working in a mud field with a mud thickness of 1.5–2 cm, so a ground clearance of 6–6.5 cm is ideal. Figure 2 shows the structure of the chassis with its dimensions.
2.3 Motor For the design of any mobile robot, a suitable selection of motors has to be made. The main aspects to be considered in selecting a motor are the total weight of the system and the power required to move the whole structure. The motor torque provides the power required for the movement of the system, so a motor with the specified torque has to be selected. In this project, a motor with a rated torque of 0.35 kg-cm at 180 rpm has been used. For the chassis, the SCR 18-3702 motor is used so that the chassis of the robot runs smoothly and efficiently.
2.4 Wheels and Sensors The type of wheels has to be selected according to the field, and the diameter of the wheels has to be chosen with respect to the torque or RPM of the motor and the weight of the automated robot. The wheel specifications used
in this project are a diameter of 65 mm and a width of 27 mm. An HC-SR04 ultrasonic sensor has been used for range identification, with a minimum range of 2 cm and a maximum range of 80 cm. A servo motor with 180-degree rotation is used to support the ultrasonic sensor.
2.5 Seed Carrier The main objective of this work is the distribution of seeds. The seeding process for rice and wheat crops is different: farmers spread the rice and wheat seeds on a mud field as a thin layer. To accomplish this, the seed carrier has been designed in the shape of a rectangular funnel and is mounted on the chassis. The lower part of the funnel is 20 cm long and 8 mm wide, and the upper part of the funnel is 25 cm long and 5 cm wide.
3 Design and Calculation of the 3D Models Generally, when a robot needs to be developed, there are two aspects of design to be considered: 1. Physical design 2. Electrical design. Physical design deals with aspects like stability and weight distribution and must be guided by constraints such as space and the environment. After completing the physical design, the next priority is knowing the mechanical and physical properties of the wheels. The information known so far is then used to calculate the force required to move the robot; from the required force, the amount of power required to drive the robot as well as the torque of each motor is calculated. The placement of seedlings in transplantation of rice and wheat crops is 10 × 10 cm to 15 × 15 cm. The seedling planter plants a stack for every rotation, which implies that for the next rotation the robot should travel a 10 cm distance. The speed of the robot is assumed to be 0.3 m/s. The mass of the robotic arm and chassis combined is about 2.3 kg (M_R), and the load (seeds) that can be stored is about 0.5 kg (M_L). Since the robot is moving in a mud field, an inclination angle α = 10° is considered for stability (as inclinations are small). The terrain is covered with water to a height of 3 cm, and the wheel diameter is taken as 10 cm (D_W). The available data help in finding the RPM (N_W) of the wheels: N_W = (60 × N_R) ÷ (π × D_W); N_W = (60 × 0.3) ÷ (3.14 × 0.1) ≈ 57 RPM
(1)
where N_W = RPM of the wheels, D_W = diameter of the wheels, and N_R = speed of the robot. The minimum force required for the movement of the robot is F = g(M_R + M_L) × sin α; F = 9.8 × (2.8) × sin 10° = 4.76 N
(2)
where F = minimum force, α = slope angle, M_R = weight of the robot, and M_L = weight of the load. The power required from the motors is P = F × speed of the robot; P = 4.76 × 0.3 = 1.43 W
(3)
The power required by each motor is P/4. The torque on each wheel at the minimum required force is T = 0.25 × (D_W ÷ 2) × F; T = 0.25 × (0.1 ÷ 2) × 4.76 = 0.059 Nm
(4)
For the calculation of the gearbox, consider the characteristics of the SCR 18-3702 DC motor. The required torque is 5.9 mN·m and the power requirement is 0.47 W. As the required speed is very low, the gear efficiency is assumed to be low, at 40 percent. The minimum motor output is P_M = P ÷ (gear efficiency × 4); P_M = 1.43 ÷ (0.4 × 4) ≈ 0.89 W
(5)
where P_M = power required from each motor. Figure 3 shows the characteristic graph of the DC motor. From the characteristic graph, at the motor power P_M the motor speed is around 6700 rpm.
Fig. 3 Characteristic graph of the DC motor
The gear ratio is given by Gear ratio = N_W ÷ (motor speed at P_M) = 57/6700
(6)
The nearest available gear is of ratio 1:90. The new motor speed is N_M = N_W × 90 = 57 × 90 = 5130 RPM
(7)
Again, from the characteristic graph in Fig. 4, at 5130 RPM the available motor torque is about 4.8 mN·m.
Fig. 4 Characteristic graph of the DC motor at 5130 rpm
Equations (1)–(7) above give the design formulas and the necessary calculations for the 3D models.
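The design calculations in Eqs. (1)–(7) can be reproduced with a short script. The sketch below is only a numerical check of the formulas using the values stated above; the 40% gear efficiency and the 6700 rpm motor speed read from the characteristic graph are taken as given assumptions.

```python
import math

# Stated design values
robot_speed = 0.3        # N_R, m/s
wheel_diameter = 0.10    # D_W, m
robot_mass = 2.3         # M_R, kg (arm + chassis)
load_mass = 0.5          # M_L, kg (seeds)
slope_deg = 10           # alpha, degrees
gear_efficiency = 0.4    # assumed in the text
g = 9.8                  # m/s^2

# Eq. (1): wheel speed in RPM
wheel_rpm = (60 * robot_speed) / (math.pi * wheel_diameter)

# Eq. (2): minimum force to move the robot on the slope
force = g * (robot_mass + load_mass) * math.sin(math.radians(slope_deg))

# Eq. (3): total drive power, shared by four motors
power_total = force * robot_speed
power_per_motor = power_total / 4

# Eq. (4): torque on each wheel at the minimum force
torque_per_wheel = 0.25 * (wheel_diameter / 2) * force

# Eq. (5): minimum motor output after gear losses
motor_output = power_total / (gear_efficiency * 4)

# Eqs. (6)-(7): gear ratio from the motor characteristic (~6700 rpm at P_M)
motor_rpm_at_pm = 6700
gear_ratio = wheel_rpm / motor_rpm_at_pm      # nearest stock ratio is 1:90
motor_rpm_with_1_to_90 = wheel_rpm * 90

print(f"wheel speed        : {wheel_rpm:5.1f} rpm")
print(f"min. force         : {force:5.2f} N")
print(f"power (total)      : {power_total:5.2f} W, per motor {power_per_motor:4.2f} W")
print(f"wheel torque       : {torque_per_wheel*1000:5.1f} mN·m")
print(f"motor output       : {motor_output:5.2f} W")
print(f"required gear ratio: 1:{1/gear_ratio:.0f} (nearest available 1:90)")
print(f"motor rpm @ 1:90   : {motor_rpm_with_1_to_90:5.0f} rpm")
```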
4 3D Structure Design After completing the design of the hardware needed for the project, the focus shifts to the physical structure of the chassis, robotic arm, planter and planter holder. The design is done using Fusion 360 software. For the first part of the design, the type of robotic arm for the application needs to be finalized. For the design of the robotic arm base, first select a plane and draw the sketch of the rectangular stand; after specifying the dimensions, extrude the sketch using the extrude function. Then move to the base of the structure and begin the second sketch: draw a circle of radius 15 cm on the rectangular structure and extrude it to a height of 8 cm, using the join function as the type of interaction with the previous sketch so that the whole base forms a single body. On top of the cylinder obtained by extruding the circle, draw a circle of 14 cm radius and extrude it to −1 cm, which creates a joining section for the motor connector.
5 Implementation For the design of an automated robot, an important consideration is the operational specification of each hardware component, along with the implementation of the 3D structure.
5.1 Hardware Implementation The controller of the model is developed using an Arduino Uno microcontroller, which has multiple analog and digital pins for interfacing suitable sensors and actuators. Servo motors are used for actuator control; such motors generally come with a gear arrangement which allows the motor to deliver a very high torque in a small and lightweight package. The seedling unit of the system is driven by a DC motor; the motor used in the implementation is a 180 RPM dual-shaft BO motor. The movement of the proposed system is carried out by wheels driven by the SCR 18-3702 DC motor, whose speed-control capability is useful in driving the proposed system. An L293D motor driver shield consists of 2 L293D motor driver chips; each chip controls 2 DC gear motors, so the driver shield can control up to 4 DC gear motors and 2 servo motors. The use of the driver shield reduces the space required and decreases complexity, as there is no need to operate the driver chips separately.
The power source of the proposed system is a Li-ion battery, a rechargeable battery commonly used for portable electronics and electric vehicles. For obstacle avoidance and object detection, ultrasonic sensors are used. A basic structure has been developed according to the details discussed above. The back side of the structure is the seed carrier and distributor, which helps in carrying as well as distributing the seeds across the seed bed. The ultrasonic sensor helps in identifying the borders of the farm and in guiding the robot; the sensor is connected to the digital pins of the Arduino through the motor driver shield. The Arduino is powered by two Li-ion batteries through a switch.
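The guidance behaviour described above (drive forward, sweep the servo-mounted ultrasonic sensor, and turn away when a border or obstacle is detected) can be summarised by the sketch below. The actual controller is the Arduino Uno; this Python version only illustrates the control logic, and the sensor and motor functions are hypothetical stand-ins, simulated here, for the real hardware calls.

```python
import random
import time

SAFE_DISTANCE_CM = 30   # assumed threshold within the HC-SR04's 2-80 cm range

def read_distance_cm() -> float:
    """Stand-in for the HC-SR04 measurement (simulated with random values)."""
    return random.uniform(5, 80)

def set_servo_angle(angle_deg: int) -> None:
    """Stand-in for pointing the servo-mounted sensor (0-180 degrees)."""
    print(f"servo -> {angle_deg} deg")

def drive(command: str) -> None:
    """Stand-in for the L293D motor-shield commands (forward/stop/left/right)."""
    print(f"drive: {command}")

def guidance_step() -> None:
    """One iteration of the border/obstacle avoidance loop."""
    set_servo_angle(90)                      # look straight ahead
    if read_distance_cm() > SAFE_DISTANCE_CM:
        drive("forward")
        return
    drive("stop")
    set_servo_angle(150); left = read_distance_cm()    # peek left
    set_servo_angle(30);  right = read_distance_cm()   # peek right
    drive("left" if left > right else "right")

if __name__ == "__main__":
    for _ in range(5):        # on the field the loop runs continuously
        guidance_step()
        time.sleep(0.1)
```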
5.2 3D Structure Implementation Robotic Arm Base The robotic arm base is a rigid (unmovable) structure, as shown in Fig. 5. The base allows the arm to be mounted on the required platform. The upper part is extruded downward a certain length (−20 mm here) for the purpose of mounting the rotating connector. Motor Holder (Rotating Connector) This is a moving structure, as seen in Fig. 6, which can be joined on top of the base. It supports two motors, one for rotation of the arm and the other for movement of the arm. The bottom of the structure is extruded to 20 mm to fit into the base extrusion. Arm Link, Motor Holder-2 The arm link, shown in Fig. 7, is the connection between the two motor holders. The bottom side connects to the first motor holder's arch structure, and a second motor holder is connected to the arm link and gripper.
Fig. 5 Robotic arm base
Fig. 6 Motor holder
Fig. 7 Arm link
The angle of arm-link movement is about 0–120 degrees. The dimensions of the arm link are a 30 cm slant length, a 20 cm diameter at the bottom and a 10 cm diameter at the top. Assembled Arm The design of the robotic arm for transplantation is realized by assembling the separately designed structures described above, from the robotic arm base to the arm link and second motor holder. This assembled arm can be used to pick up the saplings from the ground without any damage; the assembled 3D structure is shown in Fig. 8. Seedling Planter and Holder The seedlings collected by the arm are placed in the tray. The planter design works like a typewriter: for every rotation, a stack of seedlings is planted. The seedling holder is a sledge-like structure with two poles extruded to a height of 60 cm, which helps in supporting the two rotating wheels of the planter.
Fig. 8 Assembled arm
The two rotating wheels are connected to a 180 RPM DC motor. Every second, the rotating wheels of the planter make three revolutions, so the planter can match the speed of the robot and plant seedlings 10 cm apart from each other. The seedling planter and holder 3D models are shown in Figs. 9 and 10, respectively.
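As a quick consistency check of the planter speed, the required planting rate follows directly from the robot speed and the target spacing; the short calculation below uses the values stated above (0.3 m/s travel speed and 10 cm spacing).

```python
# Planting-rate check for the seedling planter
robot_speed_m_s = 0.3      # robot travel speed
spacing_m = 0.10           # target seedling spacing (10 cm)

plantings_per_second = robot_speed_m_s / spacing_m     # = 3.0 plantings per second
seconds_between_plants = spacing_m / robot_speed_m_s   # = 0.33 s between plantings

print(f"plantings per second : {plantings_per_second:.1f}")
print(f"time between plants  : {seconds_between_plants:.2f} s")
```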
Fig. 9 Seedling planter
Fig. 10 Planter holder
6 Conclusion The proposed automated solution for seed distribution and plant planting has been developed. The first part of the solution is the seed distribution, which is implemented in hardware. The robotic platform, which can move automatically while avoiding obstacles, has been designed, the 3D model of the robotic arm for seedling collection has been completed, and the 3D model of the transplanter has been developed, along with the design of the robot chassis and the robotic arm operation. The project can be further developed into areas such as analyzing crop growth quality and maintenance of the field, as well as serving as a helping tool for farmers in analyzing the soil and prescribing the required amount of fertilizers.
References 1. Ramesh K, Prajwal KT, Roopini C, Gowda MMH, Gupta VVSNS (2020) Design and development of an agri-bot for automatic seeding and watering applications. In: 2020 2nd International conference on innovative mechanisms for industry applications (ICIMIA). IEEE, pp 686–691 2. Kaur K, Kaur P, Kaur T (2017) Problems faced by farmers in cultivation of direct seeded rice in Indian Punjab. Agric Res J 54(3):428–431 3. Majid A, Rehman A, Akram M, Zafar AW, Ahmed M (2018) Problems and prospects of mechanical rice-transplanting in Pakistan. Sci Technol Dev 22(2):39–43 4. Khadatkar A, Mathur SM, Dubey K (2020) Design, development and implementation of automatic transplanting based on embedded system for use in seedling transplanters. Chin J Mech Eng (pp 1–30). https://doi.org/10.21203/rs.3.rs-27290/v1 5. Aravind KR, Raja P, Pérez-Ruiz M (2017) Task-based agricultural mobile robots in arable farming: a review. Span J Agric Res 15(1):1–16. https://doi.org/10.5424/sjar/2017151-9573 6. Stelian-Emilian O (2019)Mobile robot platform with Arduino UNO and raspberry Pi for autonomous navigation. Procedia Manufact 32:572–577 7. Bonin-Font F, Ortiz A, Oliver G (2008) Visual navigation for mobile robots: a survey. J Intell Rob Syst 53:263–296 8. Chandrashekhar K (2016) A review on 3D printing. Int J Adv Res Electron Commun (IJARECE) 5(7). ISSN: 2278 – 909X 9. Shahrubudin N, Lee TC, Ramlan R (2019) An overview on 3D printing technology: technological, materials, and applications. Procedia Manufact 35:1286–1296
10. Gokhare VG, Raut DN, Shinde DK (2017) A review paper on 3D-printing aspects and various processes used in the 3D-printing. Int J Eng Res Technol (IJERT) 6(6):953–958. ISSN: 22780181 11. Rahman MA, Khan AH, Ahmed T, Sajjad MM (2013) Design, analysis and implementation of a robotic arm- the animator. Am J Eng Res (AJER) 2(10):298–307. e-ISSN: 2320-0847, p-ISSN: 2320-0936 12. Baboria M, Kaith M (2018) Literature review on design and simulation of industrial robotic arm using CAD/CAM software. IJESC 8(3):16630–16632 13. Gautam R, Gedam A, Zade A, Mahawadiwar A (2017) Review on development of industrial robotic arm. IRJET 4(3). e-ISSN: 2395-0056, p-ISSN: 2395-0072 14. Rajesh M, Nagaraja SR (2021)An energy-efficient communication scheme for multi-robot coordination deployed for search and rescue operations. In: Communication and intelligent systems. Lecture notes in networks and systems, vol 204. Springer, Singapore, pp 187–199 15. Rahul MSP, Rajesh M (2020) Image processing based automatic plant disease detection and stem cutting robot. In: 2020 Third international conference on smart systems and inventive technology (ICSSIT). IEEE, pp 889–894 16. Rajesh M, Nagaraja SR (2019) An autonomous system of mobile robots for search in disaster affected sites. In: 2019 3rd International conference on electronics, communication and aerospace technology (ICECA). IEEE, pp 1190–1194 17. Nair A, Sreedharan RP, Rajesh M (2017) Intelligent motion control of bots using hill hold assistance mechanism. In: 2017 International conference on smart technologies for smart nation (SmartTechCon). IEEE, pp 618–621 18. Joshna V, Kashyap M, Ananya V, Manitha PV (2019) Fully autonomous robot to detect and degasify hazardous gas after flood disaster. In: 2019 2nd International conference on power and embedded drive control (ICPEDC). IEEE, pp 134–139 19. Vandana K, Supriya M, Sravya S, Manitha, PV (2022) Hybrid pump hydro-photo voltaic system for agriculture applications. In: 2022 International conference on applied artificial intelligence and computing (ICAAIC). IEEE, pp 1807–1815. https://doi.org/10.1109/ICAAIC53929.2022. 9792815
Metaheuristics based Task Offloading Framework in Fog Computing for Latency-sensitive Internet of Things Applications Priya Thomas and Deepa V. Jose
Abstract Internet of Things (IoT) applications have tremendously increased in popularity within a short span of time due to the wide range of services they offer. In the present scenario, IoT applications rely on cloud computing platforms for data storage and task offloading. Since IoT applications are latency-sensitive, depending on a remote cloud datacenter further increases the delay and response time. Many IoT applications therefore shift from cloud to fog computing for improved performance and lower latency. Fog enhances the quality of service (QoS) of the connected applications by providing low latency. Different task offloading schemes in fog computing have been proposed in the literature to enhance the performance of the IoT-fog-cloud integration. The proposed methodology focuses on constructing a metaheuristic-based task offloading framework in the three-tiered IoT-fog-cloud network to enable efficient execution of latency-sensitive IoT applications. The proposed work utilizes two effective optimization algorithms, the Flamingo search algorithm (FSA) and the Honey badger algorithm (HBA). Initially, the FSA algorithm is executed in an iterative manner, where the objective function is optimized in every iteration. The best solutions obtained by this algorithm are then fine-tuned using the HBA algorithm to refine the solution, and the output of the HBA algorithm is the optimized outcome of the proposed framework. Finally, evaluations are carried out separately on different scenarios to prove the performance efficacy of the proposed framework. The proposed framework obtains a task offloading time of 71 s and also obtains a lower degree of imbalance and lower latency when compared with existing techniques. Keywords Task offloading · Internet of things · Latency-sensitive IoT · Fog computing · Quality of service · Cloud computing
P. Thomas (B) · D. V. Jose CHRIST (Deemed to Be University), Bangalore, Karnataka, India e-mail: [email protected] D. V. Jose e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_17
1 Introduction The Internet of things (IoT) has seen tremendous growth in the past few decades and is one of the largest and most successful developments of the information technology (IT) industry; IoT represents a systematic application and strong integration of the modern IT generation [1]. Since IoT devices are connected to the Internet, a large amount of data is generated by each device every second of the day. The International Data Corporation (IDC) has predicted that about 55.7 billion devices will be connected by the end of 2025, of which 75% will belong to the IoT platform [2, 3]. This type of interconnection leads to the generation of large volumes of data that require huge computational and storage resources for effective execution. Most of the generated data come from industrial and surveillance applications [4]. Apart from these, the emergence of 5G services has resulted in a huge demand for computation-intensive and latency-aware applications including online gaming, object detection, augmented reality, etc. [5]. The IoT provides network connectivity to different applications and has transformed cities from old-fashioned to smart cities, with considerable improvement in the quality of services (QoS) offered [6]. Apart from the advantages offered, it is also important to consider the maintenance of the devices and the efficient execution of the applications. To deal with the data load, centralized cloud services are used by the IoT, where the data are transferred to the cloud servers, processed, and the results are then provided back to the users [7, 8]. Moreover, this extension enables it to meet the QoS and quality of experience (QoE) requirements of the applications. The cloud is resource-rich, and utilization of resources from the cloud improves the overall performance by reducing the energy consumption of the devices [9]. However, the cloud server is usually placed in a remote location that is far away from the end devices, resulting in communication latency. The problem worsens when the incoming tasks are time-sensitive [10]. Several computing technologies have been introduced to address the above-mentioned issues, such as cloudlets, fog computing, ad-hoc mobile clouds and multi-access edge computing (MEC), of which fog computing is found to be a promising solution [11]. Fog is preferred for the following reasons: it supports real-time applications, it is heterogeneous in nature, it supports effective scaling, and it decreases the load on the backhaul network as the services are provided at the edge of the network. Therefore, the process of task offloading to the fog nodes enables timely execution of delay-sensitive IoT applications, with the advantage of a prolonged battery lifetime of IoT devices [12, 13]. Apart from these benefits, fog computing enables complete utilization of under-utilized resources, thereby providing better services in the proximity of the user equipment (UE) [14]. Moreover, the overloading problems occurring in the cloud are eliminated through effective offloading of the tasks to the fog nodes. This reduces the computational complexity of the cloud servers and improves the efficient execution of the IoT applications by meeting the QoS requirements [15]. Despite the potential benefits offered by fog computing in delivering efficient execution of IoT applications, there are certain complexities that restrict the wide deployment of fog in
the IoT environment [16]. The cooperation of fog nodes is important to complete the execution of tasks. Therefore, realizing a reliable and efficient task offloading framework in the fog layer is a big challenge [17]. It is also required to consider the storage capacities of the fog nodes involved in processing and the computing power of the nodes to complete the execution. The fog nodes involved must have strong performance to deal with the computationally intensive tasks from the IoT [18]. Since it is an active area of research, a task offloading framework for distributing the incoming workload to available fog nodes is introduced in this work to enable efficient execution of latency-sensitive IoT applications.
1.1 Motivation

The cloud computing paradigm performs well in executing IoT applications but is less efficient in handling requests from time-sensitive applications. This led to the introduction of fog computing in front of the cloud layer for faster execution of such applications. Although fog computing is advantageous, on its own it is often unable to satisfy the QoS requirements of the users. Several techniques have been introduced in the literature to deal with this problem, but most of them do not completely address task offloading, which is critical to solve in real time. The major research gaps in task offloading are as follows. Most of the existing methodologies are constructed solely to reduce the response time, whereas more metrics need to be considered to gain overall performance improvement. Since the fog nodes are limited in resource and storage capacities, resource-intensive tasks without predefined deadlines cannot always be executed; it is therefore important to consider the requirements of the users before offloading. Another important gap is that balancing the load over the fog nodes is not adequately addressed in the current literature, which impacts the QoS requirements of the IoT applications. Therefore, a new approach is proposed in this work to jointly optimize the QoS requirements and to achieve performance gains in task offloading in every dimension. The major contributions of the work are as follows:

• Proposing a hierarchical, efficient task offloading scheme for arriving latency-sensitive IoT applications using fog computing, based on effective metaheuristic algorithms.
• Formulating a multi-objective optimization problem for the proposed offloading problem based on network latency, energy consumption, communication cost, resource utilization, arrival rate, total load and execution time.
• The identified problem is NP-hard; to solve it effectively, two optimization algorithms, the flamingo search algorithm (FSA) and the honey badger algorithm (HBA), are jointly proposed to reduce the overall network latency.
• Performing extensive evaluations of the proposed metaheuristics in different scenarios to understand the performance improvement of the proposed approach over the compared approaches.

The remainder of the paper is structured as follows: Sect. 2 provides a survey of the recent existing works in task offloading in fog computing, Sect. 3 elaborates on the proposed methodology with mathematical descriptions and techniques used, Sect. 4 covers the results and discussions and Sect. 5 concludes the paper.
2 Related Work

There are several techniques established in the literature to deal with the offloading problems of IoT applications in the fog environment. Some of the latest and most effective schemes are detailed below.

Real-time IoT applications are time-sensitive, and the current practice of task offloading to cloud computing leads to latency issues due to large response times. With the increasing growth of multimedia data, the latency issues are further increased. To deal with this, fog computing is utilized for task offloading. To avail the advantages of fog computing in the execution of real-time IoT applications, Kishor and Chakarbarty [2] introduced an offloading scheme based on a metaheuristic algorithm. The approach introduced a scheduler called the smart ant colony optimization (SACO) algorithm to efficiently offload the tasks generated by IoT applications to the fog layer. The average latency of the tasks was taken as the objective and optimized using the nature-inspired SACO algorithm, formulated from the behavior of the ant system. The experimental outcomes of the model against traditional schedulers and metaheuristics proved the improvement of the model over the other schemes.

Among the different applications of IoT, healthcare is a significant one in which enormous amounts of data are generated to avail the services. The data generated from IoT devices used for healthcare applications demand a surplus amount of storage, data analytics and computation, leading to the requirement of a resourceful environment. Apart from this, security guidelines for healthcare data must be maintained, and the data should also be made readily available to the concerned users. A solution assuring both constraints was put forth by Meena et al. [19]. The approach, named trust enforced computation offloading technique for applications using fog computing (TEFLON), provided a secure way to handle private data. TEFLON included two major algorithms, an optimal service offloader and trust assessment, to minimize the response time and to deal with the security issues. The overall experimental evaluations proved the efficacy of the model in reducing the average latency and ensuring trust for delay-sensitive trustworthy applications.

The advent of cloud computing offered several benefits to mobile devices in executing mobile applications with increased efficiency. But the cloud-based
solution is not suitable to deal with delay-sensitive applications for task offloading as it is in remote locations. In contrast, fog computing provides a solution for task offloading but is unable to completely satisfy the users’ requirements due to the inadequate number of resources. Thus, the problem of task offloading was considered as a multi-objective optimization problem by Keshavarznejad et al. [1] and the solution was constructed to jointly optimize the power consumption and delay in task execution. Two metaheuristic algorithms were utilized such as non-dominated sorting genetic algorithm (NSGA-II) and Bees algorithm to optimize the problem. Upon evaluations, the proposed model proved to achieve effective tradeoff between the concerned objectives. To overcome the demerits of the traditional two-tiered cloud-IoT framework in executing the delay-sensitive applications, Mazumdar et al. [20] introduced a three-tiered IoT-fog-cloud framework. The introduced framework supported efficient execution of time-sensitive applications through load offloading. The load present in the overloaded fog node was shared to offer timely execution of the service using the proposed collaborative scheme. Apart from that, the scheme also ensured security using the trust-based load offloading method. The model was simulated and the outcomes proved that the proposed model outperformed the compared models with enhancement in the optimization of service response rate and latency. The IoT devices being resource-constrained, the efficiency of the IoT-fog-cloud while offloading computationally intensive tasks to the servers are required to be maintained. To achieve this, Wu et al. [21] and Zhiheng and Jianhua [22] designed a fuzzy logical offloading strategy with the aim of optimizing the robustness and agreement index. The offloading strategy was optimized using a multi-objective estimation of distribution algorithm (EDA) from different applications. That algorithm transformed the applications into independent clusters where every cluster was allocated to an appropriate tier for further processing. Therefore, the scheduling decisions were taken for the reduced search space. The overall simulations of the model proved efficiency of the proposed model over the existing models on real-time cases.
3 Proposed Methodology

The proposed model aims to offload the tasks to the fog nodes to maintain proper execution of the IoT applications. The three-layered architecture used in the proposed model is displayed in Fig. 1. The first layer is the IoT layer, where diverse multimedia data are collected from numerous sources. These data are then transferred to the fog layer, which consists of numerous fog nodes for processing these applications. Initially, the system model is defined with different types of IoT devices, where each device is capable of generating some resource requests. All the devices in the network are allowed to transmit the data to the fog nodes through a wireless channel. Thus, the IoT applications are assumed to arrive in a time-constrained manner at the fog layer. In the proposed system model, the offloading task is initiated when the total request rate exceeds the maximum acceptable rate of the fog node.
Fig. 1 Architecture of the proposed fog structure for optimal task offloading (sensor devices in the IoT layer, fog nodes, and the cloud layer, with the multi-objective model and its objective function optimized by FSA and HBA)
After defining the system model, the task offloading problem is formulated with the objectives to be optimized. The major parameters considered in the problem formulation phase include network latency, energy consumption, resource utilization, arrival rate, total load and execution time. To solve the defined objective function, a metaheuristic-based approach is used.
3.1 System Model

This section discusses an effective task offloading model in the fog computing architecture. The major aim is to reduce latency and task response times using FSA-HBA. FSA is used to obtain good solutions, and these solutions are then fine-tuned by HBA to further refine the result.

IoT-fog-cloud Architecture
The IoT-fog-cloud structure has three layers: the cloud layer, the fog layer and the IoT layer. The IoT layer obtains data from several applications such as smart homes, cars and smart mobiles. Every fog node has a local agent which is used to obtain performance data such as the arrival and service data of the sensors. Offloading requests are provided to the fog nodes by the sensors, and the fog nodes send the requests to the leader fog node. This node carries out the offloading tasks by gathering data from the fog agents. Finally, the cloud layer ensures robust storage capacity and computing.
Formulation of IoT-fog-cloud Model
The IoT layer has several distributed sensors $S_{i1}, S_{i2}, \ldots, S_{im}$, where every sensor $S_{ij}$ generates data at rate $\lambda_{ij}$. The fog layer has several fog nodes $fg_{i1}, fg_{i2}, \ldots, fg_{in}$. The communication cost of data between $S_{ij}$ and fog node $fg_{ij}$ is represented in Eq. (1); it is the sum of the network latency $L_{jk}$ between $S_{ij}$ and $fg_{ij}$ and the data transfer time between $S_{ij}$ and $fg_{ij}$:

$FgCCost_{jk} = L_{jk} + \dfrac{Ds_j}{B_l}$   (1)

where $Ds_j$ is the size of the data produced by $S_{ij}$ and $B_l$ is the bandwidth of the local network. The communication cost between $fg_{ij}$ and the cloud is given in Eq. (2); it is the sum of the cloud latency $L_{jc}$ and the data transfer time between the cloud and $fg_{ij}$:

$CloudCCost_{jk} = L_{jc} + \dfrac{Ds_j}{B_c}$   (2)

where $B_c$ is the bandwidth of the cloud. The entire communication cost for offloading of $S_{ij}$ is given as:

$CCost_{jk} = CloudCCost_{jk} + FgCCost_{jk}$   (3)

Every sensor $S_{ij}$ produces data following a Poisson distribution with data arrival rate $\lambda_{ij}$. The produced data are forwarded to fog nodes for analysis, filtering and aggregation. Finally, the fog nodes forward the data to the cloud, where further analysis, costly processing and memory storage are carried out. The task offloading service time of $S_{ij}$ at $fg_{ij}$ is computed on the basis of an M/M/1 queuing process and is expressed as:

$ST_{jk} = \dfrac{1}{\mu_{jk} - \lambda_{jk}}$   (4)

where $\mu_{jk}$ is the service rate. When there are more sensors, each belonging to a particular class $d$, the service time is computed as:

$ST_{jc} = \dfrac{1}{\mu_{jc} - \lambda_{jc}}$   (5)

where $\mu_{jc}$ is the service rate for the particular class $d$ and $\lambda_{jc}$ is the sum of the data arrival rates belonging to that class, computed by:

$\lambda_{jc} = \sum_{k \in c} \lambda_{jk}$   (6)
The resource utilization $U_{ij}$ of $fg_{ij}$ is the sum of the utilizations of the tasks offloaded from every sensor $S_{ij}$, computed as:

$U_{ij} = \sum_{c} \sum_{k \in c} y_{ij} \times \dfrac{\lambda_{ij}}{\mu_{ij}}$   (7)

where $y_{ij} = 1$ when the tasks of $S_{ij}$ are offloaded to $fg_{ij}$. The entire response time of the $S_{ij}$ workload as a result of task offloading is the total of the average service time and the IoT-fog and fog-cloud communications. It is computed as:

$R_{jk} = CCost_{jk} + ST_{jc}$   (8)

The average response time (ART) of $fg_{ij}$ as a result of task offloading to the node is computed as:

$R_k = \sum_{k} \dfrac{R_{kc}}{1 - \mu_{jc}}$   (9)

$Load_k = 1 - \dfrac{R_k - R_{average}}{\sum_{k} R_k}$   (10)
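To make the cost model concrete, the fragment below is a minimal Python sketch of Eqs. (1)-(4), (8) and (10) for a single sensor-fog pair; the function and parameter names, the example values and their units are illustrative assumptions rather than the authors' simulator code.

```python
def fog_comm_cost(L_jk, Ds_j, B_l):
    # Eq. (1): network latency plus transfer time over the local link
    return L_jk + Ds_j / B_l

def cloud_comm_cost(L_jc, Ds_j, B_c):
    # Eq. (2): cloud latency plus transfer time over the cloud link
    return L_jc + Ds_j / B_c

def total_comm_cost(L_jk, L_jc, Ds_j, B_l, B_c):
    # Eq. (3): total communication cost of offloading sensor S_ij
    return fog_comm_cost(L_jk, Ds_j, B_l) + cloud_comm_cost(L_jc, Ds_j, B_c)

def service_time(mu, lam):
    # Eq. (4): M/M/1 service time; requires mu > lam for a stable queue
    return 1.0 / (mu - lam)

def response_time(comm_cost, st):
    # Eq. (8): response time is communication cost plus service time
    return comm_cost + st

def load(R_k, R_avg, R_all):
    # Eq. (10): load on fog node k relative to all nodes' response times
    return 1.0 - (R_k - R_avg) / sum(R_all)

# Illustrative placeholder numbers, not simulation results
cc = total_comm_cost(L_jk=10.0, L_jc=25.0, Ds_j=192.0, B_l=700.0, B_c=37.0)
st = service_time(mu=200.0, lam=35.0)
print(response_time(cc, st))
```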
3.2 FSA-HBA for Task Offloading of IoT-fog

The data produced by IoT devices should be managed by considering network latency, energy consumption, communication cost, resource utilization, arrival rate, total load, execution time and the existing loads. Hence, the task offloading solution should select the fog device that satisfies the defined QoS constraints. This is an NP-hard problem whose complexity grows exponentially as the number of fog nodes and sensors increases, making conventional greedy algorithms less feasible. Hence, two metaheuristic algorithms are used for task offloading of IoT-fog. This work first executes the FSA to find a good task offloading and then executes the HBA for fine tuning to obtain a more accurate output. The output obtained from the HBA is termed the optimized outcome.

FSA for Finding the Optimal Task Offloading
FSA [22] is a swarm intelligence optimization influenced by the foraging and migration characteristics of flamingos. A mathematical model of these characteristics is developed, in which FSA shows good local exploitation and global exploration capability. Flamingos are social migratory birds that generally feed on worms, insects, clams and algae; they eat by turning their heads and bending their large necks.
Foraging and migration are the two major behaviors of flamingos. Flamingo populations mostly live in places where food is abundant. After a period of heavy foraging, the population relocates to another location once the food at the current place is reduced to the point that it can no longer satisfy the population. The major processes involved in FSA are given below:

• Flamingos sing to each other to communicate their location and the food availability at that location.
• The flamingo population does not know where food is most abundant in the present area. Every flamingo's location is updated in order to find an area with more food than the currently known sources. This characteristic is used by the swarm algorithm to seek the globally optimal solution: when a flamingo is searching for food and the current global optimum is unknown, it explores the search space via information exchange with the others.
• The position update rules are based on the foraging and migrating characteristics of flamingos.

The mathematical modeling of FSA is explained below.

Foraging Characteristics
Communication characteristics: FSA is an optimization that imitates flamingos to identify the optimal solution in the search space. Let the search agent that has plenty of food in the $k$th dimension be $y_{bk}$. The communication among flamingos takes place through singing to each other.

Beak sign characteristics: When the bill of a flamingo is inverted in water, it behaves like a huge filter that drinks and filters water quickly. When flamingos forage for food, their heads dip down, their mouths are turned upside down, and they take in food and discharge the excess water. Let the position of the $j$th flamingo in the $k$th dimension of the population be $y_{jk}$. Considering the variability of every individual flamingo, the impact of sudden changes in the environment on the foraging behavior has to be considered. To simulate this error, a normal random distribution is used, and the beak sign has a high probability of aligning with the direction of the location where food is plentiful. The flamingo's maximum beak sign distance is given as $J_1 \times y_{bk} + \gamma_2 \times y_{jk}$, where $\gamma_2$ is a random number taking the value $-1$ or $+1$; this maximum distance is considered in order to enlarge the flamingo's search space, and $J_1$ is a random number that follows a normal distribution. To simulate the scan range of the flamingos for the beak sign characteristic, a random distribution is used and the maximum distance is given as $J_2 \times J_1 \times y_{bk} + \gamma_2 \times y_{jk}$, where $J_2$ is a random number.

Bipedal mobile characteristics: When flamingos forage, their claws move toward where there is more food. The distance traveled by a flamingo is given as $\gamma_1 + y_{bk}$, where $\gamma_1$ is a random number taking the value $-1$ or $+1$. The distance traveled by the flamingo and the scan range of the beak sign characteristic are added, giving:
$C^t_{jk} = (\gamma_1 + y_{bk}) + J_2 \times J_1 \times y^t_{bk} + \gamma_2 \times y^t_{jk}$   (11)

The initial location for the foraging characteristic is given as:

$y^t_{jk}(0) = \dfrac{R_{jk}}{R_{avg}}$   (12)

The updated location is given as:

$y^{t+1}_{jk} = \dfrac{y^t_{jk} + \gamma_1 + y^t_{bk} + J_2 \times J_1 \times y^t_{bk} + \gamma_2 \times y^t_{jk}}{I}$   (13)
where $y^{t+1}_{jk}$ is the position of the $j$th flamingo in the $k$th dimension of the population at iteration $(t+1)$, $y^t_{jk}$ is the position of the $j$th flamingo in the $k$th dimension at iteration $t$, and $I$ is a diffusion factor, which is a random number.
Migrating Characteristics
The flamingo population relocates to a new place with plenty of food when there is a lack of food at the current foraging location. The migration behavior is expressed as:

$y^{t+1}_{jk} = y^t_{jk} + \alpha \times (y^t_{bk} - y^t_{jk})$   (14)

where $\alpha$ is a Gaussian random number. The necessity of task offloading quality is defined as:

$\eta_{jk}(t) = \dfrac{load_k}{R_{jk}}$   (15)
where $R_{jk}$ is computed by Eq. (8) and $load_k$ is the load on fog node $k$, computed by Eq. (10). It can be seen that when $R_{jk}$ increases, $\eta_{jk}(t)$ and $load_k$ decrease.

Algorithm 1: FSA optimization for task offloading
  Initialize the parameters (number of nodes, iterations, flamingos, data rate and data arrival rate)
  Compute R_jk using Eq. (8)
  Initialize the foraging characteristics
  Compute η_jk(t) using Eq. (15)
  for i = 1 : Max_iter do
      for S_ij = 1 : N_nodes do
          Compute the foraging characteristics using Eq. (13)
          The flamingo selects fg_ij and S_ij
      end for
      Compute the migration characteristics using Eq. (14) for task offloading
      Update the best solution
  end for
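As a concrete illustration of the update rules, the following Python sketch applies one FSA iteration, the foraging update of Eq. (13) followed by the migration update of Eq. (14), to a population of candidate positions. The population handling, the chi-square draw for the diffusion factor and the Gaussian migration step are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def fsa_step(positions, best, rng=np.random.default_rng()):
    """One FSA iteration: foraging (Eq. 13) then migration (Eq. 14).

    positions : (n, D) array of candidate solutions y_jk
    best      : (D,) array holding the current best position y_bk
    """
    n, D = positions.shape
    I = rng.chisquare(df=3)                    # diffusion factor I (distribution assumed)
    g1 = rng.choice([-1.0, 1.0], size=(n, D))  # gamma_1, either -1 or +1
    g2 = rng.choice([-1.0, 1.0], size=(n, D))  # gamma_2, either -1 or +1
    J1 = rng.standard_normal((n, D))           # J1 drawn from a normal distribution
    J2 = rng.standard_normal((n, D))           # J2, a second random factor

    # Foraging update, Eq. (13)
    foraged = (positions + g1 + best + J2 * J1 * best + g2 * positions) / I

    # Migration update, Eq. (14), applied here to the foraged positions
    alpha = rng.standard_normal((n, D))        # Gaussian step toward the best solution
    return foraged + alpha * (best - foraged)
```

In Algorithm 1 this numeric update is evaluated per sensor-fog assignment and followed by re-evaluating η_jk(t); the sketch only shows the position update itself.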
The FSA optimization for the task offloading process is given in Algorithm 1. At first, the algorithm is initialized with the different initialization parameters. The response time is computed using Eq. (8), and $y^t_{jk}(0)$ and the task offloading necessity $\eta_{jk}(t)$ for every flamingo are calculated using Eqs. (12) and (15). During every iteration, each flamingo offloads the computation of $S_{ij}$ to $fg_{ij}$. After completing the entire offloading task, the migration characteristics are updated using Eq. (14). This process continues until the optimal task offloading criterion is met.

HBA Optimization for Fine Tuning
HBA [23] is an optimization algorithm which mimics the foraging characteristics of the honey badger (HB). The honey badger is an animal with black and white fur that generally lives in the rainforests and semi-deserts of regions such as the Indian subcontinent, Africa and southwest Asia. The dynamic searching characteristics of the HB in its digging and honey-finding phases are used in the exploitation and exploration stages of HBA. The mathematical modeling of HBA is explained in this section. The population of candidate solutions is given as:

$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1D} \\ y_{21} & y_{22} & \cdots & y_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nD} \end{bmatrix}$   (16)
Initializing stage: Initialize the population size $M$ (number of HBs) and their corresponding positions using Eq. (17):

$y_j = LL_j + rand_1 \times (UL_j - LL_j)$   (17)

where $rand_1$ is a random number ranging from 0 to 1, $y_j$ is the $j$th HB position with respect to the population size $M$, and $LL_j$ and $UL_j$ are the lower and upper limits of the search space.

Describing the intensity $I$: The intensity is related to the focusing power of the prey and the distance between the prey and the $j$th HB. $I_n$ is the smell intensity of the prey; the movement is fast when the smell is high and slow when the smell is low. It is represented using the inverse square law:
$I_n = rand_2 \times \dfrac{R}{4\pi D_j^2}$   (18)

where $rand_2$ is a random number ranging from 0 to 1, and

$R = (y_j - y_{j+1})^2$   (19)

$D_j = y_{prey} - y_j$   (20)
Here, $R$ is the focusing power (prey location), $y_{prey}$ is the prey position (global best position) and $D_j$ is the distance between the prey and the $j$th HB.

Density factor ($\beta$) update: This factor manages the time-varying behavior to provide a smooth transition from the exploration phase to the exploitation phase. $\beta$ decreases with every iteration to reduce randomization over time, and it is given as:

$\beta = C_j \times \exp\left(\dfrac{-t}{t_m}\right)$   (21)

where $C_j$ is a constant and $t_m$ is the maximum number of iterations.

Escaping from local optima: This phase and the following two phases are utilized to escape local optima. HBA utilizes a flag $F$ which alters the search direction to give the HB better opportunities:

$F = \begin{cases} 1 & \text{when } rand_3 \le 0.5 \\ -1 & \text{otherwise} \end{cases}$   (22)
where $rand_3$ is a random number ranging from 0 to 1.

Updating the HB's positions: The position $y_{new}$ of an HB is updated based on the digging stage and the honey stage, as explained below.

Digging Stage
In this stage, the HB moves in a manner resembling a cardioid shape. The movement is given as:

$y_{new} = y_{prey} + F \times \alpha \times I \times y_{prey} + F \times \alpha \times rand_4 \times D_j \times \left| \cos(2\pi\, rand_5) \times [1 - \cos(2\pi\, rand_6)] \right|$   (23)

where $\alpha$ is the ability of the HB to get food and $rand_4$, $rand_5$ and $rand_6$ are random numbers ranging from 0 to 1. In the digging stage, the HB mainly depends on the smell intensity $I$ of $y_{prey}$, on $D_j$ and on $\alpha$. The value of $F$ is used for finding better locations of the prey.
Honey Stage
In this stage, the HB follows the honeyguide bird to reach the beehive, and the movement is given as:

$y_{new} = y_{prey} + F \times \alpha \times rand_7 \times D_j$   (24)
where $rand_7$ is a random number. From Eq. (24) it can be seen that the HB carries out a search close to $y_{prey}$.

Algorithm 2: HBA optimization for fine tuning
  Initialize the parameters C_j, α, M and t_m
  Initialize the HB population with random positions
  Evaluate the fitness using Eq. (25)
  while t ≤ t_m do
      Update β using Eq. (21)
      for j = 1 to M do
          Compute I_n using Eq. (18)
          if rand < 0.5 then
              Update the position y_new using Eq. (23)
          else
              Update the position y_new using Eq. (24)
          end if
          Evaluate the updated position for the optimal solution
      end for
  end while
  Repeat the process until the desired solution is obtained
The entire process of HBA optimization is given in Algorithm 2. The process starts by initializing the parameters $C_j$, $\alpha$, $M$ and $t_m$. Every HB is evaluated via an objective function that indicates the solution quality. The fitness value is computed by Eq. (25):

$OB = \sum_{k=1}^{N_{nodes}} \dfrac{load_k}{R_k}$   (25)
The larger the value of the objective function, the better the solution. After computing the fitness of every HB, the local optimum of every HB is fixed, and the best fitness value among the HBs is taken as the global optimum. In every iteration, each HB's digging and honey phases are updated using Eqs. (23) and (24). These phases enhance the local search capacity and manage the diversity of the task offloading process.
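A minimal Python sketch of the HBA position update (Eqs. (18)-(20) and (22)-(24)) and the fitness of Eq. (25) is shown below; the parameter defaults, the use of the prey position as the neighbour in Eq. (19), and the small constant guarding the division are assumptions made only for illustration.

```python
import math
import random

def hba_fitness(loads, response_times):
    # Eq. (25): OB = sum over fog nodes of load_k / R_k
    return sum(l / r for l, r in zip(loads, response_times))

def hba_update(y, y_prey, alpha=6.0):
    """Update a single honey badger position (one decision variable)."""
    F = 1.0 if random.random() <= 0.5 else -1.0               # search-direction flag, Eq. (22)
    d = y_prey - y                                            # distance to the prey, Eq. (20)
    R = (y - y_prey) ** 2                                     # focusing power, Eq. (19), prey used as neighbour
    I = random.random() * R / (4 * math.pi * d ** 2 + 1e-12)  # smell intensity, Eq. (18)

    if random.random() < 0.5:
        # Digging stage, Eq. (23)
        r4, r5, r6 = random.random(), random.random(), random.random()
        return (y_prey + F * alpha * I * y_prey
                + F * alpha * r4 * d
                * abs(math.cos(2 * math.pi * r5) * (1 - math.cos(2 * math.pi * r6))))
    # Honey stage, Eq. (24)
    return y_prey + F * alpha * random.random() * d
```

In the combined scheme, the positions produced by the FSA stage would be passed through this update for a number of iterations and kept only when hba_fitness improves.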
4 Results and Discussion

This section illustrates the experiments carried out to evaluate the proposed task offloading in the three-tier IoT-fog-cloud environment. The implementation is designed using the iFogSim tool; this simulator is used in IoT and fog environments for managing IoT services in the fog. The processor used is an Intel(R) Core(TM) i3-3220 CPU @ 3.30 GHz, with 8 GB of RAM and a 64-bit operating system. Table 1 presents the parameters used for the simulations.
4.1 Evaluation Measures

The experiments are carried out to evaluate the proposed task offloading process on the basis of evaluation measures such as total latency, execution time and degree of imbalance (DI).

Degree of imbalance (DI): It captures the imbalance between the available fog nodes and is represented as:

$DI = \dfrac{\max(R_k) - \min(R_k)}{R_{avg}}, \quad k = 1, 2, \ldots, N_{node}$   (26)
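A direct reading of Eq. (26) for a list of per-node response times might look as follows (an illustrative helper, not part of the iFogSim setup):

```python
def degree_of_imbalance(response_times):
    # Eq. (26): spread of the per-fog-node response times relative to their mean
    r_avg = sum(response_times) / len(response_times)
    return (max(response_times) - min(response_times)) / r_avg
```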
Table 1 Parameter setting

Parameters              | Values
B_l                     | 700 Mb/s
B_c                     | 37 Mb/s
Number of sensor nodes  | 2500
Fog nodes               | 25
Number of iterations    | 50
d                       | 3
λ_ij                    | 35
Ds_j                    | 192
L_jk                    | 2–20 ms
L_jc                    | 20–30 ms
μ_jk                    | 200

(a) First and second scenarios
In Fig. 2a, the proposed model is implemented with 10 fog nodes. The parameters considered are λ_ij = 15, 10, 5 and μ_ij = 50, 75, 100. The latency time of the proposed model is compared with methods such as modified particle swarm optimization (MPSO) [24], the Bee life algorithm (BLA) [25], SACO [2], Throttle and round robin (RR). The latency of the proposed model is decreased when
compared to other traditional approaches. Further, when the number of sensors is increased, the latency time also increases. When the number of sensors is 2500, the latency time of RR is 126 s, Throttle is 104 s, MPSO is 102 s, BLA is 101 s, SACO is 97 s and the proposed model is 83 s, respectively. In the case of Fig. 2b, the value of μ_ij = 50, 75, 100 is the same as in the first scenario and λ_ij = 50, 40, 30 with 10 fog nodes in the second scenario. From the graph, it is observed that when the data rate is increased, the latency is reduced. When there are 1750 sensors, the latency of RR is 107 s, Throttle is 98 s, MPSO is 97 s, BLA is 96 s, SACO is 94 s and the proposed model is 69 s, respectively.

Fig. 2 Latency time comparison for (a) scenario 1 and (b) scenario 2

(b) Third and fourth scenarios
In the third scenario, the number of fog nodes increases from 10 to 25, λ_ij = 50, 40, 30 and μ_jk = 400, 350, 300. In Fig. 3a, when the values of λ_ij and μ_jk are increased, some changes are observed in the figure. The task offloading latency decreases as the number of IoT sensors increases. When the number of IoT sensors is 750, the latency time of RR is 97 s, Throttle is 95 s, MPSO is 95 s, BLA is 94 s, SACO is 94 s and the proposed model is 73 s, respectively. Figure 3b shows the outcomes of the fourth scenario; in this case, 25 fog nodes are considered, λ_ij = 30, 25, 20 and μ_jk = 400, 350, 300. When the number of IoT sensors is 1750, the latency time of RR is 110 s, Throttle is 107 s, MPSO is 107 s, BLA is 106 s, SACO is 101 s and the proposed model is 73 s.

Fig. 3 Latency time comparison for (a) scenario 3 and (b) scenario 4

It is proved from all four scenarios that the proposed model can be utilized in the task offloading process in fog computing. Figure 4 depicts the execution time comparison of the proposed and existing algorithms. The execution time is computed by taking an average over all scenarios. For the offloading process, the average time taken by SACO, MPSO, Throttle, RR, BLA and the proposed optimization is 106.5, 113.2, 115.5, 122.25, 110.4 and 71 ms, respectively. Figure 5 shows the degree of imbalance of the proposed and existing algorithms. The results are obtained by varying the number of IoT sensors from 250 to 2500. The degree of imbalance is lower for the proposed optimization; in particular, when the number of sensors is 500, the degree of imbalance is 0.03, which is considerably less for the proposed model. When the number of sensors is increased, the degree of imbalance is also maintained in a balanced manner. This shows that the proposed optimization effectively balances the load over the fog nodes. From the experimental analysis, it is noted that the latency time, execution time and degree of imbalance are better for the proposed offloading technique. From this analysis, it is observed that by obtaining these better results, the QoS parameters can be improved. Thus, the proposed model enhances the QoS in the IoT-fog-cloud environment with reduced complexity.
Fig. 4 Execution time comparison
Fig. 5 Degree of imbalance of the proposed and existing algorithms
5 Conclusion

IoT applications are increasing at a tremendous rate and generate massive amounts of data. The dependence of IoT applications on the cloud offers numerous benefits, such as extended storage and processing. However, this integration is less suitable for latency-sensitive applications due to the increased delay and response time. Fog computing offers lower latency and improved response time, and the IoT-fog-cloud integration model overcomes the drawbacks of IoT-cloud integration. Offloading tasks to fog nodes is crucial in this integration, as it decides the performance of the application. This work uses a combined approach for solving the optimization problem using two efficient metaheuristic algorithms, the flamingo search algorithm (FSA) and the honey badger algorithm (HBA). Initially, FSA is used for task offloading, and the results obtained by FSA are fine-tuned using the HBA algorithm. The performance of the proposed optimization technique is compared with other popular optimization techniques. The quantitative results show that the proposed model achieves better latency, execution time and degree of imbalance. The experimental evaluations show that the proposed approach is well suited to performing task offloading and solving the optimization problem with reduced complexity. In future, this work will be extended to consider hybridization of the proposed techniques, and other parameters such as energy consumption and scalability will be considered to improve performance further.
References 1. Keshavarznejad M, Rezvani MH, Adabi S (2021) Delay-aware optimization of energy consumption for task offloading in fog environments using metaheuristic algorithms. Cluster Comput 24(3):1825–1853 2. Kishor A, Chakarbarty C (2022) Task offloading in fog computing for using smart ant colony optimization. Wireless Pers Commun 127(8):1683–1704 3. Li X, Zang Z, Shen F, Sun Y (2020) Task offloading scheme based on improved contract net protocol and beetle antennae search algorithm in fog computing networks. Mobile Netw Appl 25:2517–2526 4. Adhikari M, Mukherjee M, Srirama SN (2020) DPTO: a deadline and priority-aware task offloading in a fog computing framework leveraging multilevel feedback queueing. IEEE Internet Things J 7(7):5773–5782 5. Vemireddy S, Rout RR (2021) Fuzzy reinforcement learning for energy efficient task offloading in vehicular fog computing. Comput Netw 199:108463 6. Sun H, Huiqun Y, Fan G, Chen L (2020) Energy and time efficient task offloading and resource allocation on the generic IoT-fog-cloud architecture. Peer-to-Peer Netw Appl 13(2):548–563 7. Jain V, Kumar B (2021)Optimal task offloading and resource allotment towards fog-cloud architecture. In: 2021 11th International conference on cloud computing, data science & engineering (confluence). IEEE, pp 233–238 8. Tran-Dang H, Kim D-S (2021) FRATO: fog resource based adaptive task offloading for delayminimizing IoT service provisioning. IEEE Trans Parallel Distrib Syst 32(10):2491–2508 9. Mahini H, Rahmani AM, Mousavirad SM (2021) An evolutionary game approach to IoT task offloading in fog-cloud computing. J Supercomput 77:5398–5425 10. Shahryari O-K, Pedram H, Khajehvand V, Fooladi MDT (2021) Energy and task completion time trade-off for task offloading in fog-enabled IoT networks. Pervasive Mobile Comput 74:101395 11. Ren Q,Liu K, Zhang L (2022) Multi-objective optimization for task offloading based on network calculus in fog environments. Digit Commun Netw 8(5):825–833 12. Misra S, Saha N (2019) Detour: Dynamic task offloading in software-defined fog for IoT applications. IEEE J Sel Areas Commun 37(5):1159–1166 13. Jindal R,Kumar N, Nirwan H (2020) MTFCT: a task offloading approach for fog computing and cloud computing. In: 2020 10th International conference on cloud computing, data science & engineering (Confluence). IEEE, pp 145–149 14. Zhou Z,Liao H, Gu B, Mumtaz S, Rodriguez J (2020) Resource sharing and task offloading in IoT fog computing: a contract-learning approach. IEEE Trans Emerg Top Comput Intell 4(3):227–240 15. Swain C, Sahoo MN, Satpathy A, Muhammad K, Bakshi S, Rodrigues JJPC, de Albuquerque VHC (2021) METO: matching-theory-based efficient task offloading in IoT-fog interconnection networks. IEEE Internet Things J 8(16):12705–12715 16. Baek J, Kaddoum G (2021) Heterogeneous task offloading and resource allocations via deep recurrent reinforcement learning in partial observable multi-fog networks. IEEE Internet Things J 8(2):1041–1056 17. Fan N,Wang X, Wang D, Lan Y, Hou J (2020) A collaborative task offloading scheme in D2Dassisted fog computing networks. In: 2020 IEEE Wireless communications and networking conference (WCNC). IEEE, pp 1–6 18. Li X,Zhang G, Zheng X, Hua S (2020) Delay optimization based on improved differential evolutionary algorithm for task offloading in fog computing networks. In: 2020 International conference on wireless communications and signal processing (WCSP). IEEE, pp 109–114 19. 
Meena V, Gorripatti M, Praba TS (2021) Trust enforced computational offloading for health care applications in fog computing. Wireless Pers Commun 119(2):1369–1386 20. Mazumdar N, Nag A, Singh JP (2021) Trust-based load-offloading protocol to reduce service delays in fog-computing-empowered IoT. Comput Electr Eng 93:107223
21. Wu C-g, Li W, Wang L, Zomaya AY (2021) An evolutionary fuzzy scheduler for multi-objective resource allocation in fog computing. Futur Gener Comput Syst 117:498–509 22. Zhiheng W, Jianhua L (2021) Flamingo search algorithm: a new swarm intelligence optimization algorithm. IEEE Access 9:88564–88582 23. Hashim FA, Houssein EH, Hussain K, Mabrouk MS, Al-Atabany W (2022) Honey badger algorithm: New metaheuristic algorithm for solving optimization problems. Math Comput Simul 192:84–110 24. Abdi S, Motamedi SA, Sharifian S (2014). Task scheduling using modified PSO algorithm in cloud computing environment. In: International conference on machine learning, electrical and mechanicalengineering (ICMLEME2014), vol 4, issue no 1, pp 8–12 25. Bitam S, Zeadally S, Mellouk A (2018) Fog computing job scheduling optimization based on bees swarm. Enterp Inf Syst 12(4):373–397
Empirical Evaluation of Microservices Architecture

Neha Kaushik, Harish Kumar, and Vinay Raj
Abstract Microservices architecture has gained a lot of interest in recent times for designing large enterprise applications, with loose coupling and scalability as its significant features. Microservices architecture is an architectural approach in which large applications are split into small, loosely coupled services that communicate with one another using lightweight protocols such as RESTful APIs. The identified microservices must be monitored and assessed in order to determine the areas that need to be improved. Therefore, it is necessary to study the tools and metrics that can help practitioners and developers continuously evaluate the quality of microservice-based architectures and provide suggestions for refactoring them. This work adapts the empirically based architecture evaluation (EBAE) method for evaluating microservices-based applications. A web-based application is chosen as a case study, and metrics related to QoS attributes are considered. The results obtained are presented, and it is clear from the results that microservices architecture can be strongly recommended for the design of large enterprise applications.

Keywords Microservices · Quality of services · Architectural metrics · Coupling
1 Introduction

Microservices architecture is a new software development approach which aims to design complex applications as a collection of fine-grained, loosely coupled services called microservices. Each microservice communicates with other microservices using language-independent APIs [1]. The word "microservices" was initially mentioned by James Lewis and Martin Fowler at a May 2011 software architecture workshop [2].

N. Kaushik (B) · H. Kumar Department of Computer Engineering, JC Bose University of Science and Technology, Faridabad, India e-mail: [email protected] V. Raj Department of Computer Applications, National Institute of Technology Tiruchirappalli, Tiruchirappalli, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_18
Each microservice is considered a self-contained service with a single and well-defined goal. Infrastructure automation, componentization via services, decentralized data management and governance, and organization around business capabilities are some characteristics of microservices-based architecture [3]. The benefits of microservices architecture are widely acknowledged in academia and industry. Microservices architecture is an extension of service-oriented architecture (SOA) [4]. Service-oriented architectures become complex and difficult to manage as business requirements change, whereas the small granularity and independent nature of microservices architecture make it easy to understand and maintain [4, 5]. Microservices are more scalable than a monolithic architecture: if a bottleneck exists in one microservice, that particular microservice can be deployed in a different container on a different host instead of deploying the whole system to a more capable machine [6]. Microservices architecture provides DevOps support by allowing the development and deployment of each microservice independently of other teams [7]. Because of the loosely coupled nature of microservices, the failure of one service does not affect the execution of the whole system [8, 9]. Separation of software responsibilities, easy technology experimentation, delegation of team responsibilities, data isolation, and a mix of technologies are some additional benefits associated with microservices architecture [10]. Many large organizations, such as Netflix, Amazon, and eBay, have switched to MSA because of its perceived benefits [5, 11].

Apart from these benefits, there are some challenges that need to be discussed. From an architecture point of view, quality assurance is considered the key concern during the migration from monolithic architecture to microservices-based architecture [12]. However, there is a lack of systematic approaches to evaluate the quality of microservices architecture. Researchers are working on identifying microservices from legacy systems, but there is also a need to evaluate whether the resulting services are good enough to perform well or whether any improvement is required. Several studies mention that there is a need for guidelines and metrics for assessing the quality of a microservices-based system [1].

To address the above issue, this paper adapts the empirically based architecture evaluation (EBAE) method for evaluating microservices-based applications. The EBAE method consists of various metrics for assessing the coupling, cohesion, and total complexity of a microservices architecture. For validation of the proposed metrics, a case study of a vehicle management system (VMS), a microservices-based application [11], is considered. The coupling, cohesion, and total complexity values of the VMS are evaluated using the proposed metrics. To perform well, an application should have high cohesion and low coupling, and the results show that the proposed metrics calculate these values correctly.

The remaining part of the paper is structured as follows. Section 2 describes various quality attributes of microservices-based architecture. Section 3 presents the related work done to assess various quality attributes for different architectures including microservices. Section 4 presents the methodology, where several metrics to assess the quality of microservices architecture are discussed. Section 5 presents the case study application with its service details. The evaluation is done in Sect. 6.
In Sect. 7, the conclusion is drawn and future work is suggested.
2 Quality Attributes

Microservices architecture has the following quality attributes:

1. Granularity: The size of a microservice is known as its granularity. It is one of the most important quality attributes of microservices architecture. Having microservices with different granularities significantly impacts the performance of the whole microservices-based application [13].
2. Cohesion: The degree to which the various operations of a particular microservice work together to achieve a single task is referred to as cohesion [14].
3. Coupling: The coupling of a microservices architecture can be defined as the dependencies and connections of a service to another service. Circular dependencies should be avoided to let the web application work properly [14].
4. Availability: Availability refers to the ability of a system to correct defects in such a way that the time a service or a functionality is unavailable does not exceed a set limit. A minor decrease in the availability of a single, strongly coupled microservice considerably reduces the overall availability of the whole microservices-based system (MSA) [15].
5. Scalability: Scalability is the ability of microservices to function appropriately, without degraded performance, regardless of changes in size. Horizontal scaling and vertical scaling are its two types [12].
6. Maintainability: This quality attribute can be defined as the degree of effectiveness and efficiency with which the whole MSA can be modified [16].
7. Performance: The ability of the target microservice system to handle requests under time restrictions is referred to as performance. The performance of a microservice-based system is typically measured in terms of response time and throughput [17].

In this paper, the coupling, cohesion, and total complexity of a microservices-based application are evaluated using various metrics.
3 Related Work

Several studies have suggested the need for metrics and tools for evaluating the quality of microservices-oriented systems that go beyond traditional architectural measures like coupling and cohesion and consider performance, security, maintainability, and the testing and debugging of microservices-based system variants [1].

Elhag et al. [18] developed basic and derived metrics for evaluating the quality of service-oriented design, with a focus on coupling, cohesion, and complexity. These metrics can be used as a starting point for developing a comprehensive service-oriented design quality measurement model. The authors only conceptually validated the measures, and empirical confirmation was left for future work due to space limits. These measurements do not adequately capture several of the
essential aspects of service-oriented architecture, such as performance, dependability, and security.

Feuerlicht [19] proposed a simple metric, the data coupling index (DCI), which provides a measure of the quality of service design and also determines the degree of data coupling between services based on the orthogonality of interface data structures. To get a more precise estimate of data coupling, a more extensive study would require flattening XML structures and replacing all complicated elements with basic data elements. Cojocaru et al. [14] provide a set of quality assessment criteria for microservices produced by semi-automated migration tools or procedures. The quality assessment criteria are based on industry standards and include a case study that verifies the given set of quality features. Bogner et al. [20] reviewed the metrics available for measuring the maintainability of service-based systems and validated their applicability for microservices-based systems; due to time constraints, the authors did not perform a systematic mapping and could have referred to more digital sources. Doljenko et al. [21] proposed a fuzzy production network model for measuring the quality of a microservices-based system, focusing mainly on the functional qualities of a microservices-based application; in the future, the assessment results can be used to improve the scalability and availability of microservices applications. Mazlami et al. [22] proposed three formal coupling methods to identify candidate microservices from monolithic applications and used metrics to identify the quality of the resulting microservices-based application. The results show that the use of the proposed methods decreases the deployment time of microservices-based applications. Amiri [23] proposed a method for identifying highly cohesive and loosely coupled microservices, decomposing the monolithic application into microservices using various clustering techniques; various metrics are used for measuring the quality (in terms of coupling and cohesion) of the microservices applications.
4 Methodology

The empirically based architecture evaluation (EBAE) method has been used to assess architectures that have already been constructed. This method proposes the following general process for evaluating software architectures [24]:

• Decide on a point of view to evaluate.
• Determine the metrics that will be utilized in the review.
• Collect the metrics.
• Analyze the architectural designs.
In this work, EBAE method has been used to assess the quality attributes of microservices-based architecture. Various architectural metrics are discussed below to achieve this task.
4.1 Architectural Metrics

Architectural metrics are used to assess the quality of application software, and they also help in avoiding various architectural technical debts in the early stages of application development.

Basic Metrics: Basic metrics are the primary step in developing derived metrics. In this section, basic metrics of previous software development architectures, such as component-based and monolithic architectures, are extended with the characteristics of microservices-based architectures.

Number of services (NS): The NS metric counts the number of services existing in the system and is increased by one for each service that belongs to the microservices system (MSA). As NS increases, the complexity of the system also increases; hence, the complexity of a system is directly proportional to the number of services present in it.

$NS(SOS) = \sum_{s \in SOS} S$   (1)

Number of operations (NO): NO counts the number of operations performed by the services. NO(s) counts the number of operations a service is responsible for performing, while the NO(SOS) metric counts all system operations across the whole MSA. The cohesiveness is calculated using this metric.

$NO(S) = \sum_{o \in S} O$   (2)

$NO(SOS) = \sum_{s \in SOS} NO(S)$   (3)

Provider (P): A service providing functionality to other services is known as a provider. The P metric determines the total number of providers in the MSA, which is later used by the coupling and cohesion metrics.

$P = \{(s, o) \in P \mid (s \in S) \wedge (o \in O) \wedge (s \wedge o) = \emptyset \wedge R \text{ is In}\}$   (4)

Consumer (C): A consumer is a service that uses the functionality of other services. The C metric determines the total number of consumers in the MSA, which is later used by the coupling and cohesion metrics.

$C(p) = \{(s, o) \in C \mid (s \in S) \wedge (o \in O) \wedge (p \in P) \wedge R \text{ is Out}\}$   (5)

Importance of Provider (IP): This metric gives weight to different providers by counting all of the consumers who rely on the service provider.

$IP(p) = \sum_{i=1}^{c} C(p)$   (6)
When many consumers use the features of provider p, IP(p) has a high value, implying that the provider p is very valuable. Designers should be cautious while designing an essential provider because numerous services and processes rely on it.

Derived Metrics: The coupling and cohesion metrics are known as derived metrics. An application system with low coupling and high cohesion is always the target for practitioners.

Coupling metrics: The coupling metric finds the dependencies among services. It is calculated by counting all direct relationships between the services existing in the MSA; indirect relationships are excluded for this purpose.

• Direct coupling (DC): It is defined as the direct relationship between consumers and providers. As shown in the equation below, DC is calculated by counting the total number of direct consumers (C) of a particular provider (p).

$DC(p) = C(p)$   (7)

• Indirect coupling (IC): IC is calculated by counting all direct and indirect calls made by consumers to providers in the microservices-based design. As shown in Eq. (8), this metric captures coupling more precisely than direct coupling by considering both the direct and indirect interactions taking place between consumers and providers.

$IC(p) = DC(p) + \sum_{c(p) \in P} IC(c(p))$   (8)

• Coupling factor (CopF): It is difficult to interpret the value of IC(p) for systems of different sizes. To comprehend the value of this metric, a new measure is required that compares the result of IC with the scale of the microservices-based system. If $f = NS(SOS) + NO(SOS)$, then

$CopF(p) = \dfrac{IC(p)}{f^2 - f}$   (9)
The MSA is a collection of loosely coupled services, and coupling directly impacts the complexity of the whole application. The value of the coupling metrics will help the developer in designing a loosely coupled and less complex microservices-based application. This metric considers both the direct and indirect dependencies among consumers and providers.

Cohesion metrics: These are used to understand the relationships between the operations of a service.

• Cohesion metric (CM): CM is used to calculate the degree of cohesion of a particular service.

$CM(s) = \{c(p) \mid (c \in C) \wedge (p \in P) \wedge (c \wedge p) \in s\}$   (10)
For every service s belonging to the MSA, count all c(p) that use the services provided by p; after that, count which c and p belong to the same s.

• Cohesion factor (CohF): CM produces a numeric value which cannot be interpreted on its own, because it may indicate a good degree of cohesion if the system is large or a poor one if the system is tiny. To grasp the value of this metric, a new metric is required. If $f = NS(s) + NO(s)$, then

$CohF(s) = \dfrac{CM(s)}{f^2 - f}$   (11)

A microservices-based application is a collection of highly cohesive services, and cohesion directly impacts the complexity of the whole application. The value of the cohesion metrics will help the developer in designing a highly cohesive and less complex microservices-based application. This metric considers both the direct and indirect dependencies among consumers and providers.

Total Complexity Metrics: TCM helps in understanding the relationships between services and the operations of those services.

• Total complexity metric of a service: TCM(s) determines the level of complexity a particular service s has:

$TCM(s) = \dfrac{IC(s) + NS(s) + NO(s)}{CM(s)}$   (12)

• Complexity factor (ComF): ComF uses the cohesion and coupling factors to provide a better understanding of the complexity metric for the system:

$ComF(s) = \dfrac{CopF(s)}{CohF(s)}$   (13)

• Total complexity metric of a system: It measures the complexity of the whole MSA:

$TCM(SOS) = \sum_{s \in SOS} TCM(s) \times ComF(s)$   (14)
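To show how the derived metrics fit together, the following Python sketch computes them from pre-collected basic counts; the function and parameter names are illustrative, and IC(p) is assumed to have been obtained separately through the recursive definition of Eq. (8).

```python
def direct_coupling(consumers_of_p):
    # Eq. (7): DC(p) is the number of direct consumers of provider p
    return len(consumers_of_p)

def coupling_factor(ic_p, ns, no):
    # Eq. (9): CopF(p) = IC(p) / (f^2 - f), with f = NS + NO
    f = ns + no
    return ic_p / (f ** 2 - f)

def cohesion_factor(cm_s, ns_s, no_s):
    # Eq. (11): CohF(s) = CM(s) / (f^2 - f), with f = NS(s) + NO(s)
    f = ns_s + no_s
    return cm_s / (f ** 2 - f)

def total_complexity(ic_s, ns_s, no_s, cm_s):
    # Eq. (12): TCM(s) = (IC(s) + NS(s) + NO(s)) / CM(s)
    return (ic_s + ns_s + no_s) / cm_s

def complexity_factor(copf_s, cohf_s):
    # Eq. (13): ComF(s) = CopF(s) / CohF(s)
    return copf_s / cohf_s

def system_complexity(tcm_list, comf_list):
    # Eq. (14): TCM(SOS) = sum over all services of TCM(s) * ComF(s)
    return sum(t * c for t, c in zip(tcm_list, comf_list))
```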
Table 1 List of services in microservice architecture-based web application

Service # | Service name           | Interacting services
1         | Config. service        | 2, 3, 4, 5, 6, 7, 9, 10, 12
2         | Part service           | 1, 4, 5, 6, 10, 12
3         | Product service        | 1, 4, 5, 6, 10, 12
4         | Compare service        | 1, 2, 3, 10, 12
5         | Incentive service      | 1, 2, 3, 6, 12
6         | Pricing service        | 1, 2, 3, 5, 10, 12
7         | Dealer service         | 1, 9, 10, 11, 12
8         | Get-a-quote service    | 11, 12
9         | Dealer locator service | 1, 7, 10, 12
10        | Inventory service      | 1, 2, 3, 4, 6, 7, 9, 12
11        | Lead processor service | 7, 8, 12
12        | User interface client  | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
5 Case Study

We have used the vehicle management system (VMS), a standard microservices architecture-based web application [20], to customize, select, and purchase vehicles and their parts using a web interface. This application assists customers in selecting, customizing, comparing, locating dealers, and requesting a quote. The database stores all of the information about the automobiles, their parts, and their costs, and the user interface assists clients with the details. Customers can use the inventory data to find the vehicle they want and the dealer that sells it. Table 1 lists the service number, service name, and all the services with which each service interacts to complete its predefined task in the web application.
6 Evaluation

Using Eq. 1, explained above in Sect. 4.1, we find that the number of services NS(SOS) in the VMS is 12. Table 2 contains the list of NO(s), P, C, and IP values. NO(s) represents the number of operations performed by each service, calculated using Eq. 2. A provider is a service which gives functionality to another service, a consumer is a service which consumes the functionality provided by another service, and the importance of provider gives weight to a provider service. Equations 4, 5, and 6 are used to calculate the provider (P), consumer (C), and IP(p), respectively.
Table 2 Number of operations performed by each service

Service # | Service name           | NO(s) | P  | C  | IP(p)
1         | Config. service        | 2     | 9  | 9  | 9
2         | Part service           | 1     | 6  | 6  | 6
3         | Product service        | 1     | 6  | 6  | 6
4         | Compare service        | 1     | 5  | 5  | 5
5         | Incentive service      | 2     | 5  | 5  | 5
6         | Pricing service        | 2     | 6  | 6  | 6
7         | Dealer service         | 1     | 5  | 5  | 5
8         | Get-a-quote service    | 3     | 2  | 2  | 2
9         | Dealer locator service | 1     | 4  | 4  | 4
10        | Inventory service      | 1     | 8  | 8  | 8
11        | Lead processor service | 3     | 3  | 3  | 3
12        | User interface client  | 1     | 11 | 11 | 11

Table 3 List of coupling values

Service | DC(p) | IC(p) | f  | CopF(p)
1       | 9     | 11    | 14 | 0.06
2       | 6     | 11    | 13 | 0.07
3       | 6     | 11    | 13 | 0.07
4       | 5     | 11    | 13 | 0.07
5       | 5     | 11    | 14 | 0.06
6       | 6     | 11    | 14 | 0.06
7       | 5     | 11    | 13 | 0.07
8       | 2     | 11    | 15 | 0.05
9       | 4     | 11    | 13 | 0.07
10      | 8     | 11    | 13 | 0.07
11      | 3     | 11    | 15 | 0.05
12      | 11    | 11    | 13 | 0.07

f stands for the sum of the number of services in the system and the number of operations
Table 3 contains the values of DC, IC, and CopF for each service. CopF(p) value depends upon two parameters, one is indirect coupling value and second is “f ” value, calculated using the formula f = NS(SOS) + NO(SOS). Equations (7), (8), and (9) mentioned above in Sect. 4.1 are used to calculate DC(p), IC(p), and CopF(p), respectively. Figure 1 represents the graphical representation of the coupling factor for each service. Coupling factor (CopF(p)) represents the linkage between two or more services. Table 4 contains the cohesion factor (CohF(s)) value of each microservice. CohF(s) value depends upon two parameters, one is cohesion metric (CM) which is
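As a worked instance of Eq. (9) with the values in Table 3: for the Config. service (service 1), IC(p) = 11 and f = 14, so CopF(1) = 11/(14² − 14) = 11/182 ≈ 0.06, matching the first row; for the Get-a-quote service (service 8), f = 15 and CopF(8) = 11/(15² − 15) = 11/210 ≈ 0.05.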
Fig. 1 Graphical representation of coupling associated with each service
Table 4 List of cohesion factor for each service

Service | CM(s) | f | CohF(s)
1       | 9     | 3 | 1.5
2       | 6     | 2 | 3
3       | 6     | 2 | 3
4       | 5     | 2 | 2.5
5       | 5     | 3 | 0.833
6       | 6     | 3 | 1
7       | 5     | 2 | 2.5
8       | 2     | 4 | 0.166
9       | 4     | 2 | 2
10      | 8     | 2 | 4
11      | 3     | 4 | 0.25
12      | 11    | 2 | 5.5

CohF(s) stands for the cohesion factor of each service
used to calculate the degree of cohesion of a single service, and second is “f ” value calculated using the formula f = NS(s) + NO(s) where NO(s) represents number of operations performed by each service and NS(s) represents the total number of services the application system has Equation 10 mentioned above in Sect. 4.1 is used to calculate the cohesion metric and Eq. 11 is used to find CohF(s). Figure 2 represents the graphical representation of cohesion factor (CohF(s)) of each service. CohF(s) represents the degree by which various service operations work together to achieve a single task assigned to that service. It is clear from Figs. 1 and 2 that we have low coupling and high cohesion for VMS after using coupling and cohesion metrics. Table 5 is representing the total complexity of the system, i.e., TCM(SOS) calculated using Eq. 14, mentioned in Sect. 4.1. The results show that total complexity of the VMS is 4.054.
Fig. 2 Graphical representation of cohesion factor
Table 5 Total complexity and complexity factor of each service

Service  TCM(s)  ComF(s)  TCM(s) * ComF(s)
1        1.555   0.04     0.062
2        2.166   0.023    0.049
3        2.166   0.023    0.049
4        2.6     0.028    0.072
5        2.8     0.072    0.201
6        2.333   0.06     0.139
7        2.6     0.028    0.072
8        7.5     0.301    2.256
9        3.25    0.035    0.113
10       1.625   0.017    0.027
11       5       0.2      1
12       1.181   0.012    0.014
TCM(s) * ComF(s) is the product of the total complexity of each service and its complexity factor
Fig. 3 Total complexity of each service
Figure 3 shows the total complexity corresponding to each service in the system. Service 12 has the minimum complexity among all the services, and service 8 has the highest complexity. These metrics are non-negative, and the CopF metric is normalized between 0 and 1 (non-negativity, normalization). DC, IC, and CopF are null if there are no consumers (c) for any of the providers (p) (null value).
7 Conclusion Microservices are services that are deployed individually and have a single, well-defined purpose. Many world-leading Internet companies, such as Netflix, Amazon, and eBay, are migrating from monolithic architecture to microservices architecture for a variety of reasons, including maintainability, scalability, the delegation of team responsibilities, fault tolerance, and ease of technological experimentation. In this work, architectural metrics are used to obtain the QoS attributes of a microservices architecture. A web-based application is considered for evaluation of its QoS attributes using the metrics, and the results obtained are presented. The results show that applications designed using microservices architecture exhibit better QoS values and that microservices architecture can be used to design large enterprise applications. Metrics for analyzing QoS features such as maintainability, performance, and reliability can be considered as future work.
References 1. Wang Y, Kadiyala H, Rubin J (2021) Promises and challenges of microservices: an exploratory study. Empirical Softw Eng 26(4):63 2. Jamshidi P, Pahl C, Mendonça NC, Lewis J, Tilkov S (2018) Microservices: the journey so far and challenges ahead. IEEE Softw 35(3):24–35 3. Raj V, Ravichandra S (2022) A service graph based extraction of microservices from monolith services of service-oriented architecture. Softw Pract Exper 52(7):1661–1678 4. Raj V, Sadam R (2021) Patterns for migration of SOA based applications to microservices architecture. J Web Eng: 1229–1246 5. Raj V (2021) Framework for migration of SOA based applications to microservices architecture. J Comput Sci Technol 21 6. Avritzer A, Ferme V, Janes A, Russo B, van Hoorn A, Schulz H, Rufino V (2020) Scalability assessment of microservice architecture deployment configurations: a domain-based approach leveraging operational profiles and load tests. J Syst Softw 165:110564 7. Bertolino A, Angelis GD, Guerriero A, Miranda B, Pietrantuono R, Russo S (2020) DevOpRET: continuous reliability testing in DevOps. J Softw Evol Process: e2298 8. Wang T, Zhang W, Xu J, Gu Z (2020) Workflow-aware automatic fault diagnosis for microservice-based applications with statistics. IEEE Trans Netw Service Manage 17(4):2350– 2363 9. Raj V, Srinivasa Reddy K (2022, Feb) Best practices and strategy for the migration of serviceoriented architecture-based applications to microservices architecture. In: Proceedings of second international conference on advances in computer engineering and communication systems: ICACECS 2021. Springer Nature, Singapore, pp 439–449 10. Taibi D, Lenarduzzi V, Pahl C (2017) Processes, motivations, and issues for migrating to microservices architectures: an empirical investigation. IEEE Cloud Comput 4(5):22–32 11. Raj V, Sadam R (2021) Performance and complexity comparison of service oriented architecture and microservices architecture. Int J Commun Netw Distrib Syst 27(1):100–117 12. Li S, Zhang H, Jia Z, Zhong C, Zhang C, Shan Z, Babar MA (2021) Understanding and addressing quality attributes of microservices architecture: a systematic literature review. Inf Softw Technol 131:106449 13. Homay A, Zoitl A, de Sousa M, Wollschlaeger M, Chrysoulas C (2019, July) Granularity cost analysis for function block as a service. In: 2019 IEEE 17th international conference on industrial informatics (INDIN), vol 1, pp 1199–1204
14. Cojocaru MD, Uta A, Oprescu AM (2019, June) Attributes assessing the quality of microservices automatically decomposed from monolithic applications. In: 2019 18th international symposium on parallel and distributed computing (ISPDC). IEEE, pp 84–93 15. Habbal N (2020) Enhancing availability of microservice architecture: a case study on Kubernetes security configurations 16. Bogner J, Wagner S, Zimmermann A (2017, Octr) Automatically measuring the maintainability of service-and microservice-based systems: a literature review. In: Proceedings of the 27th international workshop on software measurement and 12th international conference on software process and product measurement, pp 107–115 17. Camilli M, Russo B (2022) Modeling performance of microservices systems with growth theory. Empirical Softw Eng 27(2):39 18. Elhag AAM, Mohamad R (2014, Sept) Metrics for evaluating the quality of service-oriented design. In: 2014 8th Malaysian software engineering conference (MySEC). IEEE, pp 154–159 19. Feuerlicht G (2011) Simple metric for assessing quality of service design. In: Service-oriented computing: ICSOC (2010) international workshops, PAASC, WESOA, SEE, and SOC-LOG, San Francisco, CA, USA, December 7–10, 2010, revised selected papers 8. Springer, Berlin Heidelberg, pp 133–143 20. Bhallamudi P, Tilley S, Sinha A (2009, Sept) Migrating a Web-based application to a servicebased system-an experience report. In: 2009 11th IEEE international symposium on web systems evolution. IEEE, pp 71–74 21. Doljenko AI, Shpolianskaya IY, Glushenko SA (2020) Fuzzy production network model for quality assessment of an information system based on microservices 14(4(eng)):36–46 22. Mazlami G, Cito J, Leitner P (2017, June) Extraction of microservices from monolithic software architectures. In: 2017 IEEE international conference on web services (ICWS). IEEE, pp 524– 531 23. Amiri MJ (2018, July) Object-aware identification of microservices. In: 2018 IEEE international conference on services computing (SCC). IEEE, pp 253–256 24. Raj V, Sadam R (2021) Evaluation of SOA-based web services and microservices architecture using complexity metrics. SN Comput Sci 2:1–10
Thermoelastic Energy Dissipation Trimming at High Temperatures in Cantilever Microbeam Sensors for IoT Applications R. Resmi , V. Suresh Babu , and M. R. Baiju
Abstract In IoT, reducing energy dissipation is an essential requirement for achieving high-performance sensors and communication devices. Among the various energy dissipation mechanisms that limit the performance of IoT devices, thermoelastic energy dissipation is a crucial one that is difficult to control. In thermoelastic damping (TED), irreversible energy dissipation occurs due to the coupling between the temperature and strain fields in the vibrating structure. In this paper, the energy dissipation of a cantilever-type micro/nanobeam resonator at high temperature is analysed and found to increase as the temperature rises. Consequently, to reduce the thermoelastic energy dissipation Q−1 at high temperatures, nonzero values of the dimensionless length scale parameter are chosen and verified to be effective at all temperatures. The analysis is conducted in MATLAB 2015, and five diverse structural materials (Si, polySi, GaAs, diamond and SiC) are selected. The thermoelastic energy losses of the micro/nanobeams are plotted by varying the temperature from 0 to 500 K. The increase in energy losses at elevated temperatures is minimised by including the nonzero dimensionless length scale parameters. Keywords Internet of things (IoT) · IoT temperature sensor · Microbeam resonators · Thermoelastic damping · Thermoelastic energy dissipation · Dimensionless length scale parameter
R. Resmi (B) LBS Institute of Technology for Women, University of Kerala, Thiruvananthapuram 695012, Kerala, India e-mail: [email protected] V. S. Babu College of Engineering Trivandrum, APJ Abdul Kalam Technological University, Thiruvananthapuram, India e-mail: [email protected] M. R. Baiju University of Kerala, Thiruvananthapuram, Kerala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_19
1 Introduction More automated systems are being implemented with the Internet of Things, leading to revolutionary changes in the world. In recent times, IoT has grown rapidly, and a large number of connected devices like sensors, actuators and communicating devices are part of the system [1]. The diverse IoT devices can be MEMS/NEMS-based resonators, switches, micromirrors, etc. [2–4]. The potential of IoT has been recognized by various industries and has found applications in several fields. IoT sensors for high-temperature applications are a growing need in temperature monitoring systems [5]. MEMS/NEMS-based beam resonating structures are widely used in essential applications such as the healthcare industry, communication systems and the automotive industry. Microbeams are also widely used in military applications such as defence systems, national security and avionics, and their application extends to environmental monitoring. MEMS/NEMS-based miniature beam sensors are significant in IoT applications primarily due to their ultra-low mass and small size [6]. When MEMS/NEMS-based temperature sensors are used, their nominal thermal impact is another advantage, and the microbeam resonators are independent of the temperature rise/fall rate, which is an added benefit. Hence, micro/nanobeams are essential components in today's developments and have wide applications. Beam-based MEMS/NEMS structures are manufactured based on either surface micromachining, generating resonating structures, or bulk micromachining, yielding resonators like bulk acoustic wave (BAW) resonators and surface acoustic wave (SAW) resonators [7, 8]. The surface-to-volume ratio and sensitivity of microbeam-based sensors are high compared with their other MEMS counterparts. In the current analysis, a cantilever-type microbeam resonator is taken for studying the impact of thermoelastic damping-related energy dissipation at high temperature. The most significant factor deciding the sensitivity and resolution of a sensor is its quality factor (QF). The QF is determined by the various damping mechanisms existing in the resonator, which lead to energy losses. External damping mechanisms like squeeze film damping and anchor damping can be lessened easily by proper geometric design [9, 10]. The reduction of energy losses due to thermoelastic damping is also very significant in micro/nanosensors. The study of thermoelastic damping (TED) in various structures has been reported in the literature [11, 12]. The analysis of TED and its modelling is indispensable in micro/nanobeams owing to the decline of the quality factor due to thermoelastic damping (Q TED). In resonators, owing to their vibrating nature, regions of compression and elongation are established, leading to strain and temperature gradients. The effect is predominantly governed by the thermomechanical properties of the structural materials. Important thermal properties like thermal conductivity and the coefficient of thermal expansion (CTE) decide how the thermal flux proceeds from regions of higher temperature to lower ones, causing non-uniform temperature distribution and energy losses. The mutual coupling of the two fields depends on the coefficient of thermal expansion,
and if it is zero, the two fields are decoupled and the energy loss becomes zero, causing the quality factor to reach its maximum value. Thermoelastic damping is first explained in microbeams by Zener [13, 14]. He derived an analytical expression for thermoelastic energy dissipation in microbeams. An exact expression for losses due to damping in microbeams is derived by Lifshitz and Roukes [15]. The thermoelastic energy losses in various microstructures like microplates, microdisks, microcircular plates and microshells have been reported [16–18]. In order to investigate the size scaling effects, higher order elasticity theories are applied since classical theories are inadequate. Modified Couple Stress Theory (MCST) developed by Yang et al. [19] seems to be the most efficient non-classical elasticity theory for incorporating the scaling effects. MCST utilises a single material length scale parameter and hence computationally efficient. Several studies were conducted to analyse scaling effects, and the experimental observation in microstructures is available in [20]. The significance of material length scale parameter in a rectangular microplate is explored by Razavilar et al. [21] applying MCST. The thermoelastic damping effects and associated attenuation and performance limiting parameters in rectangular plates were investigated by Resmi et al. [22, 23]. The analysis of thermoelastic damping and the influencing factors in microplates is conducted by Zhong et al. [24]. Analytical solutions for TED in microplates with three boundary conditions using Rayleigh’s method and applying dual-phase lag heat conduction models were investigated by Fang et al. and Borjalilou and Asghari, respectively [25, 26]. In this paper, Sect. 2 gives the basic equations of thermoelastic energy dissipation in microbeams with size effects applying MCST. Section 3 presents the results of the inclusion of material length scale on TED for varying temperature. The parameter variations of energy dissipation Q −1 with temperature, T, for five diverse materials are graphically also shown in Sect. 3. The future scope of the work and concluding remarks are given in Sect. 4.
2 Expression for Energy Dissipation Related to Thermoelastic Damping of a Cantilever Microbeam In the vibrating Euler–Bernoulli beams under consideration, the equations of motion are derived from variational theorems and Hamiltonian Principle. To derive the expression for thermoelastic energy dissipation with size effects, applying MCST, the strain energy related to thermal strain and mechanical strain is considered for plane stress condition (i.e. b > 5 h). The energy dissipation due to thermoelastic damping in a microcantilever beam is derived from coupled heat conduction and the equations of motion [23]. For a microcantilever beam, the isothermal value of s in the complex frequency domain is
$$s_{\text{iso}} = \pm i\,\omega_{\text{iso}}, \qquad (1)$$

where

$$\omega_{\text{iso}} = \left(\frac{a_n}{L}\right)^{2} \sqrt{\frac{(EI)_{\text{eq}}}{\rho A}}, \qquad (2)$$

where n represents the vibrating mode number and $a_n$ is a boundary condition constant. The microcantilever beam under consideration is vibrating in the first mode, and hence $a_n = 3.52$. The inverse of the quality factor due to TED is

$$Q^{-1} = 2\left|\frac{\Im(s)}{\Re(s)}\right|. \qquad (3)$$

The inverse of the quality factor in classical form is

$$Q^{-1}_{CT} = \frac{\Delta_R}{1+\Delta}\left[\frac{6}{K^{2}} - \frac{6}{K^{3}}\,\frac{\sinh(K)+\sin(K)}{\cosh(K)+\cos(K)}\right], \qquad (4)$$

where $\Delta_R = \dfrac{E\alpha^{2} T_0}{\rho c_v}$ is the relaxation strength, $\Delta = \dfrac{2\Delta_R(1+\nu)}{1-2\nu}$, and $\nu$ is the Poisson's ratio. Under the plane stress condition,

$$Q^{-1}_{MCST} = \frac{\Delta_R}{\lambda(1+\Delta)}\left[\frac{6}{K^{2}} - \frac{6}{K^{3}}\,\frac{\sinh(K)+\sin(K)}{\cosh(K)+\cos(K)}\right] \qquad (5)$$

in which

$$\lambda = \frac{(EI)_{\text{eq}}}{EI}, \qquad (EI)_{\text{eq}} = \frac{EI}{1-\nu^{2}} + \mu A l^{2}, \qquad K = h\sqrt{\frac{(1+\Delta)\,\omega_{\text{iso}}}{2D}},$$
where λ represents the rigidity ratio, μ denotes the shear modulus of the vibrating beam and (EI)eq is the equivalent stiffness.
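For a rough feel of how Eq. (4) behaves numerically, the sketch below (written in Python rather than the MATLAB 2015 scripts used in the paper) evaluates the classical dissipation over a few temperatures. The silicon property values, beam thickness and isothermal frequency are illustrative assumptions, not the parameters used by the authors.

```python
import numpy as np

# Illustrative material constants for silicon; rough textbook figures, not the paper's values.
E = 169e9        # Young's modulus (Pa)
alpha = 2.6e-6   # coefficient of thermal expansion (1/K)
rho = 2330.0     # density (kg/m^3)
cv = 700.0       # specific heat (J/kg K)
k_th = 150.0     # thermal conductivity (W/m K)
nu = 0.28        # Poisson's ratio
h = 2e-6         # assumed beam thickness (m)
omega = 2 * np.pi * 1e5   # assumed isothermal angular frequency (rad/s)

def q_inv_ct(T0):
    """Classical thermoelastic dissipation, Eq. (4), under the stated assumptions."""
    D = k_th / (rho * cv)                       # thermal diffusivity
    delta_r = E * alpha**2 * T0 / (rho * cv)    # relaxation strength
    delta = 2 * delta_r * (1 + nu) / (1 - 2 * nu)
    K = h * np.sqrt((1 + delta) * omega / (2 * D))
    bracket = 6 / K**2 - (6 / K**3) * (np.sinh(K) + np.sin(K)) / (np.cosh(K) + np.cos(K))
    return delta_r / (1 + delta) * bracket

for T in (100, 300, 500):
    print(T, "K ->", q_inv_ct(T))
```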
3 Results and Discussions The dimensions of micro/nanocantilever (CL) beam structures under consideration are length L = 200 µm and width W = 10 µm. The CL is vibrating in first mode and five diverse materials—Si, diamond, polySi, GaAs and SiC—are analysed and the structural material properties are selected [27]. MATLAB 2015 is used for conducting the numerical investigations. In the current analysis, the size dependency in microcantilever beams is incorporated by applying a non-classical elasticity theory like MCST as mentioned before. It consists of a single material length scale parameter, l, and by varying it, the impact of size effect can be investigated.
3.1 Trimming of Thermoelastic Energy Dissipation at High Temperatures In microcantilevers, thermoelastic damping (TED) is a prominent energy loss mechanism even at room temperature, and the energy loss associated with TED increases with temperature. The material properties are decisive in determining the thermoelastic damping and the associated energy dissipation. The maximum thermoelastic energy dissipation is Q⁻¹_max = 0.494 Δ_R, where Δ_R is a temperature-dependent parameter involving the thermal conductivity, the specific heat capacity, the coefficient of thermal expansion and Young's modulus. The temperature dependence of the energy dissipation is investigated by performing numerical simulations using MATLAB 2015. Size effects are included in the analysis by assuming nonzero values of the material length scale parameter, l. To explore the dependence of the thermoelastic energy dissipation on temperature, the graphs are plotted for three different values of l. Figures 1, 2 and 3 illustrate the dependence of the thermoelastic energy dissipation Q⁻¹ on temperature for l = 0.2, 0.5 and 1, respectively. In all the analyses, the five structural materials are included and the corresponding graphs are plotted. The size effects and the related deformation behaviour of the resonating cantilevered microbeams are expressed through the varying values of the material length scale parameter, and the thermoelastic energy dissipation is found to diminish for increasing values of this parameter. To exhibit the modulation of thermoelastic energy dissipation with temperature, the energy dissipation of all five materials is plotted applying both the Modified Couple Stress Theory (MCST) and the Classical Theory (CT). The size effect study is characterised by including nonzero values of the material length scale parameter, and the classical study is conducted by assuming l = 0. The comparison of the thermoelastic energy dissipation for varying temperature with and without size effects is illustrated in Figs. 1, 2 and 3 for all five structural materials. The cantilever-based microbeam is considered to be vibrating in the first mode in all the stipulated conditions.
Fig. 1 Q −1 versus T of a cantilever microbeam in first mode of vibration for different structural materials l = 0.2; applying Classical Theory (CT) and Modified Couple Stress Theory (MCST)
Fig. 2 Q −1 versus T of a cantilever microbeam in first mode of vibration for different structural materials l = 0.5; applying Classical Theory (CT) and Modified Couple Stress Theory (MCST)
Fig. 3 Q −1 versus T of a cantilever microbeam in first mode of vibration for different structural materials l = 1; applying Classical Theory (CT) and Modified Couple Stress Theory (MCST)
Figure 1 illustrates the variation of thermoelastic energy dissipation with temperature for a cantilever microbeam with l = 0.2; the energy dissipations obtained with MCST and CT exhibit only a very minute difference. In Fig. 2, the thermoelastic energy dissipation obtained with MCST deviates appreciably from that of CT, verifying the reduction in energy losses achieved with a non-classical elasticity theory. The maximum deviation in energy losses is obtained with l = 1, as depicted in Fig. 3. Hence, it is substantiated that, by including higher values of l, the deteriorating performance of beam resonators at elevated temperatures can be abated. When the temperature increases, the thermoelastic losses also increase. The variation of the thermoelastic energy dissipation Q⁻¹ with temperature for the various materials is in the order SiC < GaAs < diamond < polySi < Si for all values of the material length scale parameter. The energy dissipation diminishes as the value of the material length scale parameter increases, i.e. as l increases, Q_TED also increases, and even at high temperatures a high quality factor is obtained for all five structural materials.
4 Conclusion In microcantilevered resonating beams, different energy dissipations are limiting the maximum attainable quality factor and measures must be taken to control losses. Thermoelastic damping is proved to be a crucial energy loss mechanism at room and
higher temperatures and needs to be curtailed as much as possible. As the temperature rises, the thermoelastic energy dissipation increases due to the enhanced coupling between the temperature and strain fields. In this paper, by including nonzero values of the material length scale parameter, the impact of TED is lessened even at higher temperatures. As a result, the quality factor (Q_TED) is enhanced even at very high temperatures. The thermoelastic energy dissipation is investigated by applying both MCST and CT in order to evaluate the size effects, and the comparison of the impact of the material length scale parameter is depicted graphically. The energy dissipation with temperature for the differing structural materials is descending in the order SiC > GaAs > diamond > polySi > Si for the three chosen values of the material length scale parameter (l = 0.2, l = 0.5 and l = 1). The attained results can be applied to the design of microbeam resonators with very high quality factors even at very high temperatures.
References 1. Samaali H, Najar F, Choura S (2010). Dynamic study of a capacitive MEMS switch with double clamped-clamped microbeams. Shock Vib 2014(807489):1–7 2. Arathy US, Resmi R (2015) Analysis of pull-in voltage of MEMS switches based on material properties and structural parameters. In: 2015 International conference on control, instrumentation, communication and computational technologies (ICCICCT), Kumaracoil, India. IEEE, pp 57–61. https://doi.org/10.1109/ICCICCT.2015.7475249 3. Finny S, Resmi R (2016) Material and geometry optimization for squeeze film damping in a micromirror. In: 2016 International conference on emerging technological trends (ICETT), Kollam, India. IEEE, pp 1–5. https://doi.org/10.1109/ICETT.2016.7873698 4. Rebeiz GM (2003) RF MEMS: theory, design, and technology. Wiley-Blackwell, Hoboken 5. Unlu M, Hashemi MR, Berry CW, Li S, Yang S-H, Jarrahi M (2014) Switchable scattering meta-surfaces for broadband terahertz modulation. Sci Rep 4:5708 6. Srikar VT, Swan AK, Unlu MS, Goldberg BB, Spearing SM (2003) Micro-Raman measurement of bending stresses in micromachined silicon flexures. J Microelectromech Syst 12(6):779–787 7. Gayathri KS, Resmi R (2018) Q factor enhancement of Baw resonator using electrode optimization. In: 2018 2nd International conference on trends in electronics and informatics (ICOEI), Tirunelveli, India. IEEE, pp 1298–1302. https://doi.org/10.1109/ICOEI.2018.8553812 8. Ameena A, Resmi R (2018) Electrode optimization for enhancement of Q-factor in SAW resonators. In: 2018 2nd International conference on trends in electronics and informatics (ICOEI), Tirunelveli, India. IEEE, pp 1294–1297. https://doi.org/10.1109/ICOEI.2018.855 3824 9. Finny S, Resmi R (2016) Analysis of squeeze film damping in piston mode micromirrors. In: 2016 International conference on inventive computation technologies (ICICT), Coimbatore, India. IEEE, pp 1–5. https://doi.org/10.1109/INVENTIVE.2016.7830210 10. Mol S, Resmi R (2017) Anchor loss limited Q factor analysis of disk resonator for varying disk geometry. In: 2017 International conference on intelligent computing, instrumentation and control technologies (ICICICT), Kerala, India. IEEE, pp 1033–1037. https://doi.org/10.1109/ ICICICT1.2017.8342710 11. Duwel A, Candler RN, Kenny TW, Varghese M (2006) Engineering MEMS resonators with low thermoelastic damping. J Microelectromech Syst 15(6):1437–1445 12. Kim S-B, Kim J-H (2011) Quality factors for the nano-mechanical tubes with thermoelastic damping and initial stress. J Sound Vib 330(7):1393–1402 13. Zener C (1937) Internal friction in solids. I. Theory of internal friction in reeds. Phys Rev J Arch 52(3):230–235
14. Zener C (1938) Internal friction in solids II. General theory of thermoelastic internal friction. Phys Rev 53(1):90–99 15. Lifshitz R, Roukes ML (2000) Thermoelastic damping in micro-and nanomechanical systems. Phys Rev B 61(8):5600 16. Zuo W, Li P, Zhang J, Fang Y (2016) Analytical modeling of thermoelastic damping in bilayered microplate resonators. Int J Mech Sci 106:128–137 17. Resmi R, Babu VS, Baiju MR (2021) Analysis of thermoelastic damping limited quality factor and critical dimensions of circular plate resonators based on axisymmetric and nonaxisymmetric vibrations. AIP Adv 11(3), 035108-1–035108-14. https://doi.org/10.1063/5.003 3087 18. Nayfe AH, Younis MI (2004) Modeling and simulations of thermoelastic damping in microplates. J Micromech Microeng 14(12):1711–1717 19. Yang F, Chong ACM, Lam DCC, Tong P (2002) Couple stress based strain gradient theory for elasticity. Int J Solids Struct 39(10):2731–2743 20. Park SK, Gao X-L (2006) Bernoulli-Euler beam model based on a modified couple stress theory. J Micromech Microeng 16(11):2355–2359 21. Razavilar R, Alashti RA, Fathi A (2016) Investigation of thermoelastic damping in rectangular microplate resonator using modified couple stress theory. Int J Mech Mater Des 12(1):39–51 22. Resmi R, Babu VS, Baiju MR (2021) Impact of dimensionless length scale parameter on material dependent thermoelastic attenuation and study of frequency shifts of rectangular microplate resonators. IOP Conf Ser Mater Sci Eng 1091:012067-1–012067-8 23. Resmi R, Babu VS, Baiju MR (2022) Material-dependent thermoelastic damping limited quality factor and critical length analysis with size effects of micro/nanobeams. J Mech Sci Technol 36(6):3017–3038. https://doi.org/10.1007/s12206-022-0533-8 24. Zhong Z-Y, Zhang W-M, Meng G, Wang M-Y (2015) Thermoelastic damping in the sizedependent microplate resonators based on modified couple stress theory. J Microelectromech Syst 24(2):431–445 25. Fang Y, Li P, Zhou H, Zuo W (2017) Thermoelastic damping in rectangular microplate resonators with three–dimensional heat conduction. Int J Mech Sci 133:578–589 26. Borjalilou V, Asghari M (2018) Small-scale analysis of plates with thermoelastic damping based on the modified couple stress theory and the dual-phase-lag heat conduction model. Acta Mech 229:3869–3884. https://doi.org/10.1007/s00707-018-2197-0 27. Resmi R, Baiju MR, Babu VS (2019) Thermoelastic damping dependent quality factor analysis of rectangular plates applying modified coupled stress theory. AIP Conf Proc 2166:020029-1– 020029-8. https://doi.org/10.1063/1.5131616
Contactless Fingerprint Matching: A Pandemic Obligation Payal Singh and Diwakar Agarwal
Abstract A fingerprint is a ridge and valley pattern on the finger; it is unique and differs even between the fingers of the same individual. The fingerprint recognition system is a widely used and reliable biometric technique. It has a wide range of applications, such as mobile authentication, civil identity systems run by governments, and monitoring time and attendance in many other organizations to prevent duplicate and false identities. Contact-based scanners are used everywhere to obtain the fingerprint of the user. As covid-19 is a highly contagious disease and such practice may increase its spread, a novel approach of contactless fingerprint scanning is essential for making fingerprint-based biometric scanning safe and hygienic with respect to covid-19 and other contagious diseases. Here, a comprehensive study is presented on the available contactless datasets and on the cross-domain matching of contactless data with contact-based data in various fingerprint recognition studies. Keywords Biometric · Fingerprint · Contactless · Cross-domain matching · Minutiae points · Feature extraction
1 Introduction Biometric systems use physical or behavioral features to distinguish individuals for identification and verification purposes [1]. Biometric systems are used in many areas such as the travel sector, forensics and criminology, crowded access systems, immigration, e-commerce and many others [2]. Biological traits used in a biometric system should have the following properties:
1. It should be universal.
2. It should be unique.
P. Singh (B) · D. Agarwal GLA University, Mathura, India e-mail: [email protected]
D. Agarwal e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_20
3. It should be invariant with time.
4. It should be collectable.
Some of the most common biometric features are voice, face, ear, gait, iris, hand geometry, palmprint, fingerprint and the retinal pattern of the eye [3]. A unique pattern of ridges and valleys on the finger is defined as a fingerprint, and it is different even on each finger of the same person [3]. It has been used for over a century because of its unique, universal, invariant and easy-to-acquire properties, which is why fingerprint recognition systems show very high matching accuracy. All biometric systems use scanners to obtain the biometric prints for identification and verification. Some of them use cameras and other contactless scanners, but in the case of contact-based fingerprint recognition systems, users need to touch the scanner until it reads and generates a biometric pattern [4]. The same scanner is used by many users in an organization every day, which can lead to the transmission of harmful infectious diseases like covid-19. Every single touch leaves latent prints on the scanner because of the moisture present on the finger due to sweat and oil. Sterilizing the scanner after every use in this tough time is not feasible and can also damage the scanner. To overcome all these complications, a new approach of contactless fingerprinting has been introduced. Contactless fingerprint recognition is a novel approach which uses a digital camera instead of a contact-based scanner in the fingerprint recognition system to obtain the biometric fingerprint. Generating a contactless fingerprint is comfortable, inexpensive, hygienic and fast [5]. Fingerprint matching is a widely used biometric recognition method in various applications, from security to monitoring time and attendance in numerous organizations. As we move toward contactless fingerprints, the existing databases are contact-based and huge, so instead of changing the whole database we need to perform cross-domain matching between contactless and contact-based fingerprints [6]. Figure 1 shows the basic illustration of cross-domain fingerprint matching, which works in two phases [7]. The first is the enrollment phase, where a fingerprint template of an individual is captured and stored. The second is identification and verification, where a contactless fingerprint is matched with the contact-based fingerprint template already stored in the database. In verification mode, a person's identity is validated by comparing the input against the already stored template of the same person in the system database. In identification mode, the system recognizes the individual by comparing his or her fingerprint image with the already stored templates of all users in the database [3]. A fingerprint recognition system extracts features from the fingerprint and compares them with existing templates for matching. The most common feature points used for matching are minutiae points, which are properties of ridges, like ridge endings and bifurcations.
Fig. 1 Basic block diagram of fingerprint recognition system
2 Image Acquisition In the biometric system, the primary job is to get the biometric print. Contact-based fingerprints are achieved through touch-based scanners. Figure 2 represents the three fundamental types of touch-based scanners [8, 9]. 1. Optical scanner. 2. Capacitive scanner. 3. Ultrasonic scanner. Optical fingerprint scanner as shown in Fig. 2a [9] uses light to scan the fingerprints on the device and capture a 2D picture of the finger. These scanners are rarely used nowadays because they are bulky and provide low level of security. They can be tricked easily using high-resolution fingerprint images, prosthetic and artificial fingers [8]. Capacitive scanner as shown in Fig. 2b [9] uses a collection of the small capacitors to detect the capacitance of ridge and valley with capacitor plate. The capacitance between ridges and capacitor plate is slightly less than the capacitance
Fig. 2 Example of a optical [9] b capacitive [9] and c ultrasonic scanner [8]
Fig. 3 Models of contactless fingerprints devices TBS 3D ENROLL [10], finger-on-the-flymorphotrak_11621570 [11], digital camera and mobile phone
between valley and capacitor plate, because the distance between ridge and capacitor plate is smaller than the distance between valley and the capacitor plate. Capacitance in each capacitor is directed to the operational amplifier and noted through the analogto-digital convertor [8]. Ultrasonic scanners produce 3D scans of the finger, presently being used in highend smartphones. It includes a set of ultrasonic transmitters and receivers. A set of transmitters emit ultrasonic pulses, which reflect from pores, ridge and valley of the finger. And then, the reflected pulses are sensed by receivers. An image of a fingerprint is created by measuring the mechanical stress because of the intensity of reflected pulses. They are mostly used as in-display scanners in phones as shown in Fig. 2c [8]. Both ultrasonic and capacitive scanners are not easily fooled, the only way to forged is to hack the hardware or software. Contactless fingerprint acquisition for the cross-domain fingerprint recognition requires digital cameras to get the image of the finger, for which any kind of digital camera, phone and 3D scanners can be used, which do not involve touching. They capture the image or video of fingers by moving the hand over the scanner without touching anything. Figure 3 shows the few such devices used for contactless fingerprint acquisition [10, 11].
3 Preprocessing Both contact-based and contactless fingerprint techniques have their own limitations. In contact-based pressure on the imaging surface may degrade the quality of the fingerprint image. And latent print left from the user can cause the risk of the imposter gaining the access. Another limitation is hygiene issues after the emergence of covid19.
Contactless Fingerprint Matching: A Pandemic Obligation
269
Fig. 4 Stages of preprocessing
Contactless fingerprints are distortion free, but they also present challenges such as low ridge–valley contrast, varying illumination and only part of the finger being in focus. To overcome these problems, preprocessing of the fingerprint image is essential. Preprocessing is an important step before feature extraction and comprises image segmentation, grayscale conversion, image enhancement, noise reduction and binarization, as illustrated in Fig. 4. The image acquired from a digital camera is a colored image, which is converted into grayscale. The resulting grayscale image is then enhanced to improve the contrast between ridges and valleys, which can be accomplished using histogram equalization techniques. Further noise reduction is performed to remove noise from the image, and the image is then normalized [12]. Before feature extraction, the grayscale image is binarized to transform it into (0 and 1) binary form. Preprocessing is an extensive and multi-layered process. Grosz et al. [13] proposed a method with a complex and extensive preprocessing pipeline consisting of deep networks for normalization, segmentation, enhancement and deformation correction, and utilized a variation of DeepPrint for cross-domain matching [13].
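A minimal sketch of such a pipeline using OpenCV is given below; the file name, filter sizes and thresholds are illustrative assumptions rather than settings from any of the cited works.

```python
import cv2

# Illustrative preprocessing along the lines of Fig. 4; parameters are assumptions.
img = cv2.imread("finger_photo.png")                  # colour finger photo
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # grayscale conversion
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)                          # adaptive histogram equalisation
denoised = cv2.medianBlur(enhanced, 3)                # noise reduction
norm = cv2.normalize(denoised, None, 0, 255, cv2.NORM_MINMAX)   # normalisation
_, binary = cv2.threshold(norm, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarisation
cv2.imwrite("binary_ridges.png", binary)
```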
3.1 Segmentation Region of interest (ROI) extraction of the fingerprint image is necessary to remove unwanted background. Contactless fingerprints obtained from various digital cameras comprise unsegmented images with different illuminations, resolutions and backgrounds [13]. Improper segmentation leads to a huge amount of background portion instead of desired finger section. So, failure in segmentation and scaling is the major reason for false rejection and false acceptance error [14]. Grosz et al. [13] have given a U-net segmentation network which takes an unsegmented image of (m × n) dimension and produces a segmented mask of dimension (m × n) as shown
Fig. 5 Segmented output for input image [13]
in Fig. 5 [13]. They used a trained autoencoder to cut the distal phalange of input contactless image [13]. Also, Malhotra et al. [15] proposed an algorithm which combines the region covariance-based salience with skin-color measurements of actual finger-selfie segmentation [15]. Two main characteristics of finger region are skin tone and the prominent nature of the finger section. This approach is computationally efficient and outperforms deep architecture-based segmentation. Lin and Kumar [6] suggested that ROI extraction and enhancement of both contact and contactless fingerprint images are essential in fingerprint matching, especially when data are insufficient. For this, they subsampled and cropped both contact-based and contactless images, to standardize them to be of similar scale and size [6].
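To illustrate the skin-tone cue mentioned above (this is a generic sketch, not the saliency-based method of [15] or the U-net of [13]), a finger region can be roughly segmented by thresholding in the YCrCb colour space; the threshold values below are common rule-of-thumb ranges and purely illustrative.

```python
import cv2
import numpy as np

# Illustrative skin-colour ROI mask in YCrCb space; thresholds are assumptions.
img = cv2.imread("finger_photo.png")
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
lower = np.array([0, 135, 85], dtype=np.uint8)
upper = np.array([255, 180, 135], dtype=np.uint8)
mask = cv2.inRange(ycrcb, lower, upper)               # skin-tone pixels
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((15, 15), np.uint8))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
finger = max(contours, key=cv2.contourArea)           # keep the most prominent region
x, y, w, h = cv2.boundingRect(finger)
roi = img[y:y + h, x:x + w]                           # cropped finger region
```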
3.2 Enhancement Enhancement aims to improve the image quality through filtering techniques, contrast enhancement and noise removal. Sheregar et al. [5] emphasize enhancing the input images, because both feature extraction and fingerprint matching performance depend on the quality of the images [5]. Figure 6 [15] shows an original finger image and the corresponding segmented and enhanced images. Segmentation and enhancement are both important for removing the background and obtaining a clear ridge and valley print with enhanced contrast, as shown in the figure. Contactless images have drawbacks such as varying illumination, magnification and contrast, so it is complex to obtain the desired features for matching from a touchless fingerprint. Birajadar et al. [16] proposed a novel monogenic-wavelet-based enhancement approach for touchless fingerprints [16]. The log-Gabor filter is used to obtain illumination-invariant phase congruency features, and the chief strength of monogenic wavelets is that they extract the local phase and orientation simultaneously [16].
Fig. 6 a Finger image b segmented image c enhanced image [15]
Lin and Kumar [6] included a histogram equalization operation along with a conventional fingerprint enhancement method to enhance the contrast between ridge and valley patterns and to increase the resemblance between the two types of fingerprint images [6]. Grosz et al. [13] applied a series of enhancement techniques to raise the contrast between the ridges and valleys of the contactless image: first, Adaptive Histogram Equalization for contrast enhancement, and then pixel gray-level inversion to make the ridge polarity of the contactless images consistent with the contact-based ones [13]. They also tried state-of-the-art super-resolution and deblurring approaches such as RDN to enhance the contactless fingerprints, but only negligible improvement was attained at considerable computational expense. Malhotra et al. [15] emphasize the challenges in the recognition of finger-selfies obtained from smartphone cameras. The ridge and valley contrast is affected by the noise produced by illumination variation, which interrupts the ridge extraction procedure. First, the segmented images are converted into grayscale and speckle noise is eliminated with a median filter; then, to lessen the effect of illumination variation, histogram equalization is used. The resulting image contains two frequency components: a high-frequency component formed by the ridge information and a low-frequency component formed by the valley and noise information [15]. Enhancement algorithms applied to contactless fingerprint images show a remarkable change in the performance of fingerprint recognition systems by improving the accuracy (R1 and EER).
3.3 Distortion Correction Next important part in the preprocessing is to correct distortion and removal of noise. Grosz et al. [13] explained two types of distortions. 1. Perspective distortion: induced by unpredictable distance between finger and camera [13]. 2. Nonlinear distortion: produced by elasticity of skin when pressed on a flat platform [13]. To remove these distortions, Grosz et al. [13] used a STN model where the ridge frequency of the contactless image is normalized to get the ridge spacing of 500 ppi in the contactless image. Deformation warping is also performed on contactless images to achieve the distortion existing in contact-based images because of the elasticity of the human skin [13].
4 Database In contactless fingerprint matching, we are doing cross-domain matching between contactless and contact-based datasets. Table 1 shows the database used in various contactless and contact-based cross-domain fingerprint matching approaches. As given in the table, all databases have wide variety of variation in size and resolution. For example, PolyU database [19] is highly constrained, high-resolution contactless database, and on the other hand, IIT Bombay database [16] has varying resolutions and illuminations. So, selecting both types of database can improve the performance of the biometric system.
5 Feature Extraction In a biometric system, feature extraction is the process of picking up the meaningful information from the raw fingerprint by selecting and combining similar variables into a feature. It reduces the computational time and the redundancy in the data and makes classification easier for big data. In fingerprint biometric systems, minutiae points are the major features used to define the uniqueness of fingerprints [5] and are invariant with time. Minutiae extraction involves two main steps: thinning and minutiae point extraction. Thinning is a morphological operation which reduces the ridge width to one pixel. Figure 7 [20] shows the binarized and thinned outputs for a corresponding original enhanced fingerprint. Thinning reduces the complexity of fingerprint matching because the matching becomes a kind of point pattern matching [21]. Figure 8 shows the types of minutiae points on an enhanced fingerprint image with white ridges and black valleys, which are further explained in Table 2 [20].
Table 1 Datasets used in contactless to contact-based fingerprint matching studies

Dataset | Subjects | Image size CL/CB | Unique fingers | CL/CB images | Contactless device | Contact-based device
UWA benchmark 3D fingerprint database, 2014 [17] | 150 | 1024 × 1280 / 640 × 480 | 1500 | 3000/6000 | 3D scanner (TBS S120E) [17] | Crossmatch Verifier 300 LC 2.0 [17]
ManTech Phase2, 2015 [18] | 496 | NA | 4960 | NA | MorphoTrak Finger-on-the-Fly (FOTF), IDair innerID (on iPhone 4), AOS ANDI On-The-Go (OTG) optical systems, etc. [18] | MorphoTrust TouchPrint 5300, Northrop Grumman BioSled, CMR2 (CFPv1)—rolled, TouchPrint 5300 (CFPv1)—rolled, SEEK II (CFPv1), CMR2, SEEK, etc. [18]
PolyU 2D contactless and 2D contact-based database, 2018 [19] | NA | 1400 × 900 / 328 × 356 | 336 | 2976/2976 | NA | URU 4000
IIT Bombay database (touchless and touch-based fingerprint), 2019 [16] | 200 | 170 × 260 / 260 × 330 | 200 | 800/800 | Lenovo Vibe K5 smartphone [16] | eNBioScan-C1 (HFDU08) scanner [16]
ISPFDv2, 2020 [13, 15] | 76 | 4208 × 3120 | 304 | 17,024/2432 | OnePlus 1, Micromax Canvas smartphones [13] | Secugen Hamster IV [13]
ZJU finger photo and touch-based fingerprint database, 2022 [13] | 206 | 4208 × 3120 | 824 | 9888/9888 | HuaWei P20, OnePlus 8, Samsung S9+ smartphones [13] | URU 4500 [13]
Fig. 7 a Enhanced image b binarized image c thinned image
Fig. 8 Minutiae points in fingerprint-enhanced image where ridges are white [20]
Table 2 Minutiae points

Minutiae points | Description
Ridge ending | End points of ridges
Ridge bifurcation | Where single ridge split into two ridges [20]
Dots | A small point-type ridge
Islands | A slightly longer ridge than dot
Ridge enclosure | Small empty space between two ridges where one ridge bifurcates in two and meets again to form one ridge again
Bhattacharya and Mali [12] explained the basic steps of minutiae extraction:
1. Check whether the pixel belongs to a ridge or not.
2. If yes, then check whether it is a split (bifurcation), start, ending or something else, and a minutiae group is formed accordingly.
Fig. 9 a Original image b marked minutiae [19]
3. The next step is to remove minutiae points which lie at the border of the region of interest or are caused by spurious noise in the input image.

According to Anil Jain et al. [21], for every remaining minutiae point three parameters are recorded for further assessment [21]:
1. X-coordinate.
2. Y-coordinate.
3. Orientation of the associated ridge [22].

According to the above-mentioned steps and parameters, minutiae points are extracted from the binarized output. Figure 9 [19] displays the minutiae points detected in the minutiae extraction phase on the original contactless image, where red circles represent ridge endings and blue circles represent ridge bifurcation minutiae points [19]. Accurately extracting minutiae features from both fingerprint images is very difficult, and missing or falsely extracted minutiae points reduce the matching accuracy of the system.

Lin and Kumar [6] used the approach of Jain et al. [21], a traditional minutiae extraction algorithm, to generate the ridge image superimposed with the minutiae points and the directions of the related ridges, where minutiae points are indicated by solid circles and ridge directions by short lines [6]. Lin and Kumar [19] again used the same traditional approach, where minutiae points are used for orientation and the ridges linked to minutiae are acknowledged as an additional feature throughout the cross-matching process [19]. Closed ridges are detached, short ridges are rejected and ridges related to split minutiae are divided into three ridges; a ridge containing both a minutiae ending and a split is considered as one ridge [19].

Malhotra et al. [15] applied a finger-selfie representation based on a deep scattering network (DSN) for feature extraction, giving a translation-stable representation of finger-selfies which is also rotation invariant [15]. The DSN is a local descriptor that includes both multi-dimensional and multi-scale information and is computed through a group of wavelet decompositions with a complex modulus [15]. Similarly, Grosz et al. [13] applied a DeepPrint model, a deep network explicitly designed for fingerprint representation and extraction through an in-built alignment module and minutiae
information [13]. This extraction algorithm extracts both textural and minutiae sets, then scores are compared and both representations are merged together via a sum score fusion. Likewise, Birajadar et al. [16] used phase congruency with monogenic wavelet enhancement for feature extraction, and it is an illumination-invariant phase-based image processing model [16].
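For intuition about how endings and bifurcations can be located on a thinned (one-pixel-wide) ridge map, the crossing-number rule is a common choice; the sketch below is a generic illustration of that rule and is not the extraction pipeline of any of the cited works.

```python
import numpy as np

def minutiae_from_skeleton(skel):
    """Crossing-number minutiae detection on a thinned binary ridge map.

    skel: 2-D numpy array with ridge pixels = 1 and background = 0.
    Returns lists of (row, col) positions for ridge endings and bifurcations.
    """
    endings, bifurcations = [], []
    rows, cols = skel.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if skel[r, c] != 1:
                continue
            # 8 neighbours taken in clockwise order around the pixel
            p = [skel[r-1, c], skel[r-1, c+1], skel[r, c+1], skel[r+1, c+1],
                 skel[r+1, c], skel[r+1, c-1], skel[r, c-1], skel[r-1, c-1]]
            cn = sum(abs(int(p[i]) - int(p[(i + 1) % 8])) for i in range(8)) // 2
            if cn == 1:
                endings.append((r, c))        # ridge ending
            elif cn == 3:
                bifurcations.append((r, c))   # ridge bifurcation
    return endings, bifurcations
```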
6 Matching Matching is the vital operation in a fingerprint recognition system and includes the following steps:
1. Find the alignment which gives the maximum number of minutiae pairings between the input image minutiae set and the stored template in the database.
2. Compute the similarity between the input fingerprint and the stored template [3].
3. If the similarity (or matching score) between them is greater than the threshold value, then both fingerprint templates belong to the same person; otherwise, continue matching against the other minutiae templates.
As explained by Sheregar et al. [5], the matching score is calculated from the following formula:

$$\text{Matching Score} = \frac{\text{Matching Minutiae}}{\max(NT, NI)}, \qquad (1)$$
where NT and NI are the total numbers of minutiae points in the stored template and in the input fingerprint image, respectively [5]. A match score of 1 means the input image is matched perfectly, and a match score of 0 means it is completely mismatched.

Lin and Kumar [6] presented a multi-Siamese CNN architecture focused on creating a deep feature representation for matching; both fingerprint images and extracted features are used to train the model [6]. This approach outperforms many CNN-based methods, but the training of multiple CNNs makes the model very complex. Malhotra et al. [15] applied a Random Decision Forest (RDF) for validating the finger-selfies. The RDF is used as a binary classifier with multiple decision trees; it is a nonlinear ensemble-based classifier [15] in which each binary tree has leaf nodes labelled genuine and imposter, so each decision tree labels the input images as matched or non-matched, and the final result is obtained through a vote of all the trees [15]. Lin and Kumar [19] utilized a deformation correction model (DCM) for matching in another study. In this approach, a deformation model is chosen automatically depending on the impression type and intensity assessment of the contact-based image [19]. Minutiae correction is performed for the sample minutiae extracted from the contact-based image, and the proposed model transforms all the minutiae points on the associated ridges. Then, for matching, these transformed points and ridges are
used [19]. The ridge matching score is then computed from the ridges related to the matched minutiae pairs.
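A literal reading of Eq. (1) can be sketched as follows; pairing minutiae by a simple distance-and-angle tolerance is an illustrative assumption, not the pairing or alignment rule of the cited works.

```python
import math

# Minutiae are assumed to be (x, y, orientation_in_radians) tuples.
def match_score(template, probe, dist_tol=15.0, ang_tol=math.pi / 12):
    """Eq. (1): number of matched minutiae divided by max(NT, NI)."""
    used = set()
    matched = 0
    for (xt, yt, at) in template:
        for j, (xp, yp, ap) in enumerate(probe):
            if j in used:
                continue
            close = math.hypot(xt - xp, yt - yp) <= dist_tol
            aligned = abs(at - ap) <= ang_tol
            if close and aligned:
                used.add(j)
                matched += 1
                break
    return matched / max(len(template), len(probe))
```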
7 Comparison Among Approaches There are various approaches for preprocessing, feature extraction and matching in fingerprint matching. Table 3 summarizes the different approaches used in various cross-domain matching schemes. It is clear from the table that some approaches are used in more than one scheme, and some show good results on contact-based data but not on contactless data. One approach, or a combination of several, can be used according to the database and the requirements.

The performance of contactless to contact-based systems is evaluated using the cumulative match characteristic (CMC) along with the Receiver Operating Characteristic (ROC). Figure 10 shows examples of ROC and CMC curves [22]. The ROC curve plots the genuine acceptance rate against the FMR in verification mode, and the CMC curve plots the identification accuracy against the rank in the identification phase. For the ROC, fingerprint samples are compared with their own stored images in the database, and genuine or impostor scores are generated; the False Matching Rate and False Non-Matching Rate are then computed for multiple thresholds. In identification mode, the CMC is evaluated by comparing fingerprint samples against all the gallery samples stored in the database; first, the rank at which a true match occurs is determined. The equal error rate (EER) can be regarded as the overall accuracy of the fingerprint matching [23], and it is calculated using the False Matching Ratio along with the False Non-Matching/Rejection Ratio [24].

False Matching Ratio (FMR) [24]: when the input image is matched with someone else's template, an unauthorized person gets acceptance; it is also known as the False Acceptance Ratio [5].

$$\text{FMR} = \frac{\text{False Matches}}{\text{Imposter Attempts}} \qquad (2)$$
False Non-Matching Ratio (FNMR): when the input image does not match its own stored template in the database, the authorized person does not get access/acceptance; it is also known as the False Rejection Ratio [5].

$$\text{FNMR} = \frac{\text{False Non-Matches}}{\text{Enrollee Attempts}} \qquad (3)$$
The relationship between FMR, FNMR and EER is shown in Fig. 11 [26]. As shown in the figure, the False Matching Ratio decreases, as the False Non-Matching Ratio increases and vice versa [27]. Both FMR and FNMR curves intersect at point
Table 3 Comparison of approaches for contactless and contact-based fingerprint recognition systems

Approach | Year of publication | Preprocessing | Feature extraction | Matching | EER | Remark
Lin and Kumar [6] | 2017 | Enhancement—histogram equalization | Ridge extraction | Siamese network | 9.89 | Using larger fingerprint databases is suggested
Lin and Kumar [19] | 2018 | Enhancement—Gabor filter | Ridge extraction | Deformation correction model (DCM) | 4.46 | Exploring the deep learning-based approaches is suggested
Birajadar et al. [16] | 2019 | Enhancement—monogenic wavelet | Phase congruency | Source—AFIS, NBIS-NIST, Verifinger-SDK | – | Performance can be improved by including quality assessment of touchless fingerprint
Malhotra et al. [15] | 2020 | – | Deep scattering network (DSN) | Random Decision Forest (RDF) | 2.11–10.35 | Improvement in performance of both finger-selfie to finger-selfie and finger-selfie to live scan fingerprint matching is achieved
Tan and Kumar [25] | 2020 | Finger poses compensation | Deep neural network | | 10.85, 2.27, 0.63 on three different databases | Ground truth (labeled) contactless images are important to improve matching specifically for the low-quality areas
Grosz et al. [13] | 2021 | Segmentation—U-net; Enhancement—Adaptive Histogram Equalization | Spatial transformer network (STN) produces ridge structure | Verifinger minutiae representation, compute match score as a weighted fusion | Less than 1% | A quality assurance algorithm should be used at the time of capturing contactless fingerprint images
Fig. 10 a ROC and b CMC curve [22]
Fig. 11 Relation between FMR, FNMR and EER [26]
EER. From Fig. 11, the EER can be described as the point where the percentages of FMR and FNMR are the same. The configuration of FMR and FNMR depends upon the type of application for which the fingerprint recognition system is going to be used. Reducing the FMR value will increase the FNMR sharply, and the system will be more secure but less convenient because some users will be falsely rejected by the system [27]. Conversely, making it user-friendly by reducing the FNMR makes the system less secure.
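The EER can be located numerically by sweeping a threshold over genuine and impostor score lists, as sketched below; the score values are purely illustrative.

```python
import numpy as np

# Illustrative genuine/impostor score samples; real evaluations use thousands of scores.
genuine = np.array([0.91, 0.84, 0.77, 0.95, 0.63, 0.88])
impostor = np.array([0.12, 0.35, 0.28, 0.41, 0.22, 0.55])

thresholds = np.linspace(0, 1, 1001)
fmr = np.array([(impostor >= t).mean() for t in thresholds])   # Eq. (2)
fnmr = np.array([(genuine < t).mean() for t in thresholds])    # Eq. (3)

idx = np.argmin(np.abs(fmr - fnmr))        # threshold where the two curves cross
eer = (fmr[idx] + fnmr[idx]) / 2
print(f"EER ~ {eer:.3f} at threshold {thresholds[idx]:.3f}")
```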
8 Result and Conclusion This paper gives a comprehensive review of contactless fingerprint matching with contact-based fingerprints. A comparison among different approaches for cross-domain matching is presented. Some studies focus on feature extraction and matching, while others emphasize enhancing the input image to increase the compatibility between contactless and contact-based images for better cross-domain matching.
The primary concern in contactless and contact-based fingerprint matching should be preprocessing, because contactless and contact-based images have different limitations, so the same preprocessing method cannot be applied to both types of fingerprint. A Gabor filter bank tuned to the ridge frequency and orientation shows good results on contact-based fingerprints, whereas adaptive histogram equalization provides better contrast enhancement for contactless fingerprints. Merging two or more approaches according to the requirements of the application and database is therefore suggested. To achieve successful cross-domain contact-based and contactless fingerprint matching, a larger and more challenging database is required. As shown in Table 1, both highly constrained and challenging databases are available. Considering both types of data for training and testing the model is recommended for a more realistic and reliable biometric system.
Deep Reinforcement Learning to Solve Stochastic Vehicle Routing Problems Sergio Flavio Marroquín-Cano, Elías Neftalí Escobar-Gómez, Eduardo F. Morales, Eduardo Chandomi-Castellanos, and Elizeth Ramirez-Alvarez
Abstract In recent years, Artificial Intelligence techniques such as Deep Reinforcement Learning (DRL) have been used to propose solutions to complex computational problems, and Vehicle Routing Problems are clear examples. These problems sometimes include uncertain information in their formulations, in what are known as Stochastic Vehicle Routing Problems (SVRPs), which are particularly useful for modeling real environments. In this work, the generic aspects of SVRPs and the research that has advanced the state of the art of DRL-based SVRP solutions are discussed. The study shows that Multi-Layer Perceptrons used to model action-value functions are the most widely used DRL setting. However, it is necessary to explore other state-of-the-art approaches, such as models based on Multi-Head Attention trained with recent Actor-Critic algorithms. Keywords Stochastic Vehicle Routing Problem · Intelligent transportation · Deep Learning · Reinforcement Learning · Deep Reinforcement Learning
S. F. Marroquín-Cano (B) · E. N. Escobar-Gómez · E. Chandomi-Castellanos Tecnológico Nacional de México, Instituto Tecnológico de Tuxtla Gutierrez, Tuxtla Gutierrez, Chiapas, Mexico e-mail: [email protected] URL: https://www.tuxtla.tecnm.mx/ E. F. Morales Instituto Nacional de Astrofísica Óptica y Electrónica, Tonantzintla, Puebla, Mexico URL: https://www.inaoep.mx/ E. Ramirez-Alvarez Tecnológico Nacional de México, Instituto Tecnolgico de Lázaro Cárdenas, Lázaro Cárdenas, Michoacán, Mexico URL: https://lcardenas.itlac.mx/ © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_21
1 Introduction
The Vehicle Routing Problem (VRP) can be seen as a generalization of the canonical combinatorial optimization problem known as the Traveling Salesperson Problem (TSP) [5]. The name originated with the problem's conception [8], whose task was to optimize fuel dispatch from a depot to different service stations using land vehicles. Currently there are many variants of the problem [31], among which Stochastic VRPs (SVRPs) stand out: because they consider random information, they can be adapted to realistic scenarios. Mathematically, SVRPs can be modeled as a sequential decision-making process [40]. This feature is particularly relevant for Reinforcement Learning (RL), which solves sequential decision problems. In RL, software agents are trained through trial and error to perform a specific task in a generally uncertain environment [4]. Unfortunately, the classical algorithms used in RL are not powerful enough to overcome the curse of dimensionality present in routing problems. Because of this, hand-crafted heuristics and metaheuristics proliferated in the solution of these decision processes.
In the last decade, Machine Learning techniques have benefited from the inclusion of Deep Learning (DL) tools within their paradigms, and RL has been no exception. With this, Deep Reinforcement Learning (DRL) techniques emerged, achieving great advances in fields such as robotics, control, and especially operations research. The formulation of Transformers [42] further accelerated the process, allowing the development of robust offline models able to process large amounts of data efficiently and to generate competitive solutions without the need for hand-crafted heuristics. All these advances, in such a short time, have produced a dispersed body of literature, so it is necessary to narrow it down and present it comprehensively, allowing researchers interested in the area to acquire the bases, advances, and challenges of this exciting field of study.
This paper is intended, on the one hand, to provide an overview of SVRPs, RL, and DRL and, on the other hand, to identify the most recent and relevant work on DRL models aimed at solving SVRPs. The organization of the text is intuitive: the theoretical aspects correspond to a formal extension of what has been discussed in this introduction, followed by a review of recent papers on the subject, ending with conclusions that reinforce the issues studied.
2 Stochastic Vehicle Routing Problems
VRPs are a family of combinatorial optimization problems whose details depend strongly on their application and can be understood as extensions of the TSP. The latter is NP-Hard even in its two-dimensional Euclidean version [5], and thus all variants of the VRP are computationally classified in the same way [2].
Fig. 1 Graph representations of the TSP and the CVRP: (a) a TSP instance, a distribution of points; (b) a TSP solution, in which only one closed path is obtained; (c) a CVRP instance, in which the central point is called the "depot"; (d) a CVRP solution, in which several closed paths are obtained
Fig. 2 Instantiation and sequential solution of a DVRP: at first, some requests are made (t = 1), so an initial schedule is planned (t = 2), but it could change with new requests (t = 3). The process continues until the final step
An overview of the subject is provided in [39]. The stochastic versions of these problems (SVRPs) arise in works such as [29]; their main characteristic is that some of the variables involved in the formulation may be random. Figure 1 shows a graphical representation of the TSP and the Capacitated VRP (CVRP), while the respective diagram for a type of SVRP is shown in Fig. 2.
Formally, a TSP instance can be described as a set of points or nodes $X = \{x_i\}_{i=1}^{n}$, $x_i \in \mathbb{R}^2$, as in Fig. 1a. The goal is to find a directed closed path $L = (l(1), \ldots, l(n))$ connecting all the points (see Fig. 1b) such that its Euclidean distance $d$, defined in Eq. 1, is minimal.

$$d(L \mid X) = \| x_{l(1)} - x_{l(n)} \| + \sum_{i=1}^{n-1} \| x_{l(i)} - x_{l(i+1)} \| \qquad (1)$$
The closed path $L$ in Eq. 1 can be thought of as a permutation of the node indexes [5], indicating the order in which the nodes are visited, so $l(i)$ is the $i$-th index of the visit sequence; we use this notation for clarity. On the other hand, a CVRP instance is a multiple-TSP version in which every node can carry certain features, such as a demand or a temporal attribute. In Fig. 1c, d, a larger point represents a larger demand. All this information can be gathered into a $k$-dimensional vector $\xi_i \in \mathbb{R}^k$; an example of this is used in
[33]. Sometimes these features are manipulated independently or are taken directly from higher-dimensional data [17]. In addition, a fleet of vehicles $\{v_j\}_{j=1}^{m}$ with fixed capacity $C$ must be used to satisfy the demands, so a capacity constraint is also taken into account, i.e., $c_i \le C$, where $c_i$ is the demand of node $i$. The problem is solved when an assignment of nodes to visit $L_j$ is found for every vehicle $v_j$, such that the capacity constraint is met and the routing cost, associated with the traveled distances, is minimized. In a stochastic routing variant, some features of the node vector $\xi_i$ are random variables following certain probability distributions. The most common version of such problems is the Dynamic VRP (DVRP), see Fig. 2, in which the spatio-temporal coordinates of the nodes $(x_i, y_i, t_i)$ are unknown beforehand. Generally, the spatial coordinates $x_i$ and $y_i$ are modeled with a Normal distribution, while $t_i$ follows a Poisson distribution. There are many further possibilities for stochastic information, such as travel times, vehicle or operator availability, disruptive changes, demanded amounts, and so on. Some good recent reviews of the subject are presented in [31, 36].
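As a minimal illustration of the objective in Eq. 1, the following sketch builds a random two-dimensional TSP instance and evaluates the length of a candidate closed path; the instance size and coordinates are arbitrary, and the code is not tied to any of the cited formulations.

# Minimal sketch of Eq. 1: random 2-D TSP instance and the length of a closed tour.
import numpy as np

rng = np.random.default_rng(0)
n = 10
X = rng.random((n, 2))    # nodes x_i in [0, 1]^2
L = rng.permutation(n)    # a candidate visiting order l(1), ..., l(n)

def tour_length(X, L):
    # d(L | X) = ||x_l(1) - x_l(n)|| + sum_i ||x_l(i) - x_l(i+1)||
    closing = np.linalg.norm(X[L[0]] - X[L[-1]])
    legs = np.linalg.norm(X[L[:-1]] - X[L[1:]], axis=1).sum()
    return closing + legs

print(tour_length(X, L))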
3 Reinforcement Learning
It is not possible to understand DRL without a solid knowledge of RL, so this section reviews the fundamental concepts of this framework. The main bibliographic reference for the basics of RL is [37]. There are two fundamental terms within the paradigm: the agent and the environment. An agent represents the abstract concept of a decision-maker. The environment, on the other hand, can be thought of as a space, virtual or real, that possesses all the conditions to generate observations and to change them when the agent interacts with it. The basis of RL is therefore interaction. Mathematically, environments are modeled as Markov Decision Processes (MDPs), represented by a tuple $\langle S, A, R, P \rangle$, where:
• S is the set of states, i.e., all the possible configurations of the environment.
• A is the set of actions, the possible decisions the agent can take. Commonly, $a \in \mathbb{N}^n$ or $a \in \mathbb{R}^n$, $n \in \mathbb{N}$.
• R is the reward function, the analytical element in charge of discriminating between good and bad actions by returning scalar values. Its design is specific to each problem and is the device the programmer can exploit so that the agent learns a given task more easily.
• P is the transition function, which describes how the environment evolves in response to the agent's actions. If P can be established as a probability distribution, the environment is stochastic, so that the transitions between observations are uncertain; otherwise, the environment is deterministic. Commonly this function is not explicitly known, which is the case for routing problems.
Fig. 3 RL learning cycle applied to an SVRP: An observation of the environment is given to an agent that decides and takes a particular action to change the routes. During training, a reward signal is given to the agent
When the function is known, the simplest way to solve the problem is using dynamic programming techniques [38]. In their general form, these systems must satisfy the Markov Property, defined in Eq. 2.

$$P = P(s_{t+1} \mid s_t, a_t) \approx P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_0, a_t, a_{t-1}, \ldots, a_0) \qquad (2)$$
Here $s_t, s_{t-1}, \ldots, s_0 \in S$ and $a_t, a_{t-1}, \ldots, a_0 \in A$, and the subscript $t$ represents a decision epoch, also called a time step. The above expression simply states that the immediate future depends only on the present. Mathematically, the ability of agents to exhibit "intelligent" behavior is modeled by a function $\pi$, called the Policy. This is a criterion or set of rules by which the agent determines which action to execute at each decision epoch. Policies can be deterministic or stochastic. In the first case, the agent always performs the same action when observing a particular state, while in the second case there is a probability distribution over the possible actions; the latter is the most general case, written $\pi = \pi(a_t \mid s_t)$. The analytical formulation presented here makes it possible to formalize agent–environment interactions, see Fig. 3: at each decision epoch $t$, the agent observes the state of its environment $s_t \in S$ and influences it by carrying out an action $a_t \in A$ in line with its current Policy $\pi(a_t \mid s_t)$, which causes a transition to $s_{t+1}$ and a reward
$r_{t+1}$, whose value is determined by $R$. The objective is to maximize the rewards accumulated during an episode, called the return and expressed in Eq. 3, where an episode is simply a certain number of decision epochs, from the beginning of the decision-making process to the end. If the task has a natural end, it is said to be episodic, in which case there is a fixed terminal state; otherwise it is called non-episodic, and we speak of trajectories rather than episodes. Generally, agents need several episodes or trajectories to learn how to solve a given task [38].

$$G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots + \gamma^{T-1} r_T = \sum_{j=0}^{T-1} \gamma^j r_{t+j+1} \qquad (3)$$
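A one-function sketch of the return in Eq. 3, computing G_t from a list of sampled rewards for an assumed discount factor; purely illustrative.

# Sketch of Eq. 3: discounted return G_t from a list of rewards r_{t+1}, ..., r_T.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for j, r in enumerate(rewards):  # rewards[j] plays the role of r_{t+j+1}
        g += (gamma ** j) * r
    return g

print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 1 + 0 + 0.81*2 = 2.62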
In Eq. 3, $G_t$ is the return from decision epoch $t$, $T$ is the total number of decision epochs considered, and $\gamma \in [0, 1]$ is the discount factor, the value that determines the relative importance of immediate and future rewards. The algorithms used to solve problems modeled as MDPs therefore try to establish a Policy either indirectly or directly, which allows them to be classified into Value-Based methods and Policy-Based methods, respectively. Value-Based methods make successive estimates of the expected return using one or both of the Bellman equations, Eqs. 4 and 5, a process known as bootstrapping, so that the actions that maximize rewards can be determined.

$$V_\pi(s_t) = \mathbb{E}[G_t \mid s_t] = \sum_{A} \pi(a_t \mid s_t) \sum_{S} P(s_{t+1} \mid s_t, a_t)\,[\,r_{t+1} + \gamma V_\pi(s_{t+1})\,] \qquad (4)$$

$$Q_\pi(s_t, a_t) = \mathbb{E}[G_t \mid s_t, a_t] = \sum_{S} P(s_{t+1} \mid s_t, a_t)\,[\,r_{t+1} + \gamma V_\pi(s_{t+1})\,] \qquad (5)$$
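The following sketch illustrates bootstrapping when P is not known explicitly: a sampled one-step update moves a tabular action-value estimate toward r + γ times the estimated value of the next state, the quantity inside Eqs. 4 and 5. Using the max over next actions makes it a Q-learning-style update, the family from which DQN derives; the state and action counts, learning rate, and transition are toy assumptions.

# Sketch of bootstrapping: a sampled one-step update toward r + gamma * (next-state value),
# without needing the transition function P.
import numpy as np

n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def q_update(s, a, r, s_next):
    target = r + gamma * Q[s_next].max()   # bootstrapped estimate of the return
    Q[s, a] += alpha * (target - Q[s, a])  # move Q(s, a) toward the target

q_update(s=0, a=1, r=1.0, s_next=2)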
Within the Policy-Based methods, there are Policy-Gradient techniques, whose foundation is the Policy-Gradient Theorem, Eq. 6.

$$\nabla_\theta V_\pi(\theta) = \mathbb{E}_\tau\!\left[\, \sum_{t=0}^{T-1} \nabla_\theta \log \pi(a_t \mid s_t; \theta)\, G_t \right] \qquad (6)$$
Usually, this gradient is approximated by random sampling to obtain the parameters $\theta$ of the Policy $\pi(a_t \mid s_t; \theta)$. This estimator often suffers from high variance, so a function called a "baseline" is subtracted from the return to reduce it. A third solution approach combines Policy-Gradients with Value-Based methods, using a value function as the baseline and updating the Policy with bootstrapping; these are called Actor-Critic methods.
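As a hedged illustration of the estimator in Eq. 6 with a baseline subtracted from G_t, the sketch below computes a REINFORCE-style gradient for a simple linear softmax policy in plain NumPy; the feature dimensions, episode data, baseline value, and learning rate are toy assumptions, and this is not the implementation of any reviewed paper.

# Sketch of the Policy-Gradient estimator in Eq. 6 with a baseline subtracted from G_t,
# for a linear softmax policy pi(a | s; theta).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta, episode, baseline=0.0):
    # episode: list of (state_features, action, return_G_t) tuples
    grad = np.zeros_like(theta)
    for s, a, g in episode:
        probs = softmax(theta @ s)       # pi(. | s; theta)
        dlog = np.outer(-probs, s)       # grad of log pi(a | s; theta) for a linear softmax
        dlog[a] += s
        grad += dlog * (g - baseline)
    return grad

theta = np.zeros((3, 4))                 # 3 actions, 4 state features (toy sizes)
episode = [(np.ones(4), 1, 2.0), (np.ones(4), 0, 1.0)]
theta += 0.01 * reinforce_gradient(theta, episode, baseline=1.5)  # gradient-ascent step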
In the work of Ulmer et al. [40], MDPs are proposed as the ideal theoretical models for formulating stochastic routing problems. This holds even without considering DL models to enhance the solution methods. Some works that support this idea are [19, 36].
4 Deep Learning and Deep Reinforcement Learning
The concept of Deep Learning (DL) refers to a set of computational tools that, by varying their parameters, allow modeling relationships between input and output data. This variation is achieved using optimization algorithms that process large amounts of information, a procedure known as training. Its algorithmic structures allow stacking an arbitrary number of processing layers, hence the designation "deep", learning data representations at various levels of abstraction (see [11, 26]). Conventionally, these tools are classified into architectures whose main difference is the type of data processing in which they specialize. The most well-known are Multi-Layer Perceptrons (MLPs) for numerical data, convolutional networks for images, and recurrent networks for time series. However, since 2017, the architecture known as the Transformer [42], thanks to its Multi-Head Attention (MHA) mechanism, has shown remarkable performance in terms of throughput, adaptation, and generalization with respect to its predecessors. Its great disadvantage is that the amount of data required for training is even greater than for other architectures.
Concerning the application of DL to the solution of routing problems, some early works took advantage of the capability of neural networks as universal approximators combined with heuristics and other conventional tools, see [3, 30]. With the introduction of Pointer Networks (PNs) [43] in 2015, the idea of end-to-end models to solve routing problems arose. PNs are architectures inspired by sequence-to-sequence models [44], whose attention mechanism is modified so as to generate an output sequence that represents a permutation of the input sequence; see Eq. 7. Originally, this idea was applied to the TSP in a supervised approach, reaching optimal solutions on instances of up to 100 nodes.

$$P(L \mid X; \theta) = \prod_{i=1}^{n} \pi(l(i) \mid l(1), \ldots, l(i-1), X; \theta) \qquad (7)$$
The factorization in Eq. 7 is one of the foundations of the DL models used for sequence prediction, such as PNs and even Transformers, where $X$, $n$, $L$, and $l(i)$ have the same meaning as in Eq. 1. The idea is to approximate the factors $\pi$ with DL models that compute the probability of the next $l(i)$ in the sequence $L$, given the elements already in it and the initial configuration.
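The following sketch illustrates the factorization in Eq. 7: a permutation L is decoded one node at a time, masking nodes already visited, while accumulating log P(L | X). The score function is only a stand-in for a learned PN or Transformer decoder; the toy scorer used here simply prefers nodes close to the last visited one.

# Sketch of the factorization in Eq. 7: build L one node at a time, masking visited
# nodes; `score` stands in for a learned decoder and is purely illustrative.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def decode_tour(X, score, rng):
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    L, log_prob = [], 0.0
    for _ in range(n):
        logits = score(X, L)             # one logit per node
        logits[visited] = -np.inf        # pi(l(i) | l(1..i-1), X): mask visited nodes
        probs = softmax(logits)
        nxt = rng.choice(n, p=probs)
        log_prob += np.log(probs[nxt])   # accumulates log P(L | X) of Eq. 7
        visited[nxt] = True
        L.append(nxt)
    return L, log_prob

rng = np.random.default_rng(0)
X = rng.random((6, 2))
score = lambda X, L: -np.linalg.norm(X - (X[L[-1]] if L else X[0]), axis=1)
tour, logp = decode_tour(X, score, rng)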
In the same year as the birth of PNs, the work of Mnih et al. [23] revolutionized the field of RL by proposing to model the Q function (Eq. 5) with a convolutional network called the DQN. This produced a software agent capable of playing Atari console video games at the human level, outperforming it in some cases. For the first time, thanks to the DQN algorithm, RL agents were able to learn from higher-dimensional data. Subsequent research experimented with deep architectures in tasks modeled as MDPs, obtaining promising results [35]. Thus, DRL is formalized as a version of RL in which some of its elements are modeled using DL tools. The work in [43] was extended in [5], which proposes to train a PN using the RL paradigm; for this purpose, several configurations of Policy-Gradient methods are established, showing an efficient way to solve routing problems without the need for labeled databases. This brings us to the last five years of research, which are discussed in the next section.
5 A Recent State-of-the-Art Systematic Review
This section presents a quick review of papers published during the 2018–2022 period. The goal is to identify the DL models and types of RL algorithms that are most common in the consulted literature. To perform a systematic search, we follow the guidelines of the PRISMA 2020 statement [27].
5.1 Methods
The search engine used was Google Scholar. The initial filtering strategy consisted of the following steps:
1. Research questions were formulated by combining the phrases Stochastic Vehicle Routing Problems, Deep Reinforcement Learning, Dynamic Vehicle Routing Problem, Real Time Routing, Attention Mechanisms, and Same Day Delivery, allowing a very targeted search.
2. The mentioned key concepts were then used to reduce the number of results by eliminating irrelevant connectors from each question.
3. In the final step, Boolean search was used, i.e., the key concepts were placed between quotation marks.
This process generated an initial list of 197 papers. The initial selection criteria gave priority to papers published in scientific journals, plus some papers presented at international conferences that were considered important because of their contribution or continuity with other research. All other resources, such as surveys, reviews, dissertation theses, or books, were not considered. After applying these criteria, the list of references was reduced to 51 papers.
Fig. 4 Paper's systematic selection: 197 records identified, 146 excluded; 51 filtered papers, 2 not recovered; 49 recovered papers, 30 excluded (13 not stochastic, 9 not RL, 8 hybrid methods); 19 papers in the review

Table 1 Explicit taxonomy of the reviewed articles
Id  | Refs. | Year | DL      | RL
1.  | [19]  | 2019 | MHA     | PG-V
2.  | [25]  | 2019 | PN      | A3C
3.  | [15]  | 2019 | PN      | A3C
4.  | [34]  | 2019 | MLP     | DQN
5.  | [18]  | 2020 | MLP     | DQN
6.  | [6]   | 2020 | MHA     | AC-V
7.  | [10]  | 2021 | MLP     | PPO
8.  | [9]   | 2021 | MLP     | DQN-V
9.  | [47]  | 2021 | PN      | PG-V
10. | [20]  | 2022 | MHA-MLP | DQN
11. | [48]  | 2022 | MHA     | PG-V
12. | [49]  | 2022 | MLP     | DQN-V
13. | [14]  | 2022 | MLP     | DQN-V
14. | [16]  | 2022 | MLP     | DQN-V
15. | [28]  | 2022 | PN      | A3C
16. | [7]   | 2022 | MLP     | DQN-V
17. | [22]  | 2022 | MLP     | A2C
18. | [12]  | 2022 | MHA     | PG-V
19. | [1]   | 2022 | MHA-PN  | AC-V
Finally, a careful analysis of each study's content was carried out, discarding those that did not consider stochastic conditions, employed paradigms other than RL, or used hybrid methods. The process described above yielded a sample of 19 papers focused on solving SVRPs through end-to-end DRL models. Figure 4 shows the systematic selection of the articles.
5.2 Results
Table 1 summarizes the results obtained. The rows, indexed by the "Id (identifier)" column, are ordered first by year of publication (column "Year") and then by number of citations. The remaining three columns of the table, "Refs.", "DL", and "RL", give the reference of the research paper, the proposed DL model, and the RL algorithm used, respectively. In the last two columns we propose a taxonomy that labels the configurations of models and algorithms consulted, which is the goal of the review.
Fig. 5 Proportional representations of the reviewed papers: (a) DL models; (b) RL algorithms
For clarity, the DL-model labels used in Table 1 are as follows: Multi-Head Attention [42] (MHA), Pointer Networks [43] (PN), and Multi-Layer Perceptron (MLP). When two terms are separated by a hyphen, the proposed model is a combination of both architectures. The nomenclature for RL algorithms comprises: Policy-Gradient Variants (PG-V), where different baselines are used in a configuration similar to the REINFORCE algorithm [46]; Deep Q-Network Variants (DQN-V), which includes techniques derived from the traditional DQN [23] such as DDQN [41] and dueling DQN [45]; and Actor-Critic Variants (AC-V), which refers to Actor-Critic methods that do not follow a standard or well-known structure such as A3C, A2C [24], or PPO [32]. In some works more than one DL model was proposed, in which case we report the one with the best performance according to their results; the same criterion is applied to the RL algorithms. Figure 5 gives a proportional representation of our results, which we discuss in the next section. For more details on the mentioned articles, we recommend consulting the given references.
5.3 Discussion
As can be seen in Fig. 5, MLPs are the most extensively used model type in the consulted articles, representing, for the most part, value functions. However, the use of settings based on MHA mechanisms has increased over the last couple of years. In addition, the problem of the extensive amount of data needed to train these models is addressed by the RL approach, since the training data are generated through interaction rather than labeled in advance. MHAs and PNs are the favorites for policy modeling; only in one case were they used to support modeling of the Q function [20]. Concerning the RL techniques, almost half of the papers consulted use an approach related to the DQN algorithm; the rest opted for methods
based on the widely studied Policy-Gradients. Unexpectedly, only one of these investigations uses approaches such as PPO [10], which perform well and are somewhat more recent. Although no specific analysis of the types of SVRPs solved in the articles was presented, it is useful to mention that the applications are quite diverse, ranging from ride-hailing with autonomous electric vehicles [20] to the scheduling of material transport in the mining industry [9].
6 Conclusion
This study aims to give a first insight into the DRL approaches being used to solve SVRPs, which are important formulations due to their wide spectrum of application and their computational complexity. Based on it, we can argue that contemporary DL models, such as those based on Transformers, are good alternatives for addressing some of the robustness issues raised by current research [31]. Another area of opportunity is learning directly from images or graphs, beyond the struct-to-vec process, since no researchers have yet ventured into this task. There is also a need to explore more sophisticated RL techniques for model training, such as [13, 21]. It is an ideal time to use emerging Artificial Intelligence technologies to solve complex problems.
References 1. Alharbi MG, Stohy A, Elhenawy M, Masoud M, Khalifa HAEW (2022) Solving pickup and drop-off problem using hybrid pointer networks with deep reinforcement learning. Plos ONE 17(5):e0267199. https://doi.org/10.1371/journal.pone.0267199 2. Archetti C, Feillet D, Gendreau M, Speranza MG (2011) Complexity of the VRP and SDVRP. Transp Res Part C: Emerg Technol 19(5):741–750. https://doi.org/10.1016/j.trc.2009.12.006 3. Achamrah FE, Riane F, Limbourg S (2022) Solving inventory routing with transshipment and substitution under dynamic and stochastic demands using genetic algorithm and deep reinforcement learning. Int J Prod Res 60(20):6187–6204. https://doi.org/10.1080/00207543. 2021.1987549 4. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Magazine 34(6):26–38. https://doi.org/10.1109/MSP. 2017.2743240 5. Bello I, Pham H, Le QV, Norouzi M, Bengio S (2017) Neural combinatorial optimization with reinforcement learning. In: Proceedings on international conference of learning Represent. Toulon, France, Art no. 09940. https://doi.org/10.48550/arXiv.1611.09940 6. Bono G, Dibangoye JS, Simonin O, Matignon L, Pereyron F (2020) Solving multi-agent routing problems using deep attention mechanisms. IEEE Trans Intell Transp Syst 22(12):7804–7813. https://doi.org/10.1109/TITS.2020.3009289 7. Bozanta A, Cevik M, Kavaklioglu C, Kavuk EM, Tosun A, Sonuc SB, Basar A (2022) Courier routing and assignment for food delivery service using reinforcement learning. Comput Indus Eng 164:107871. https://doi.org/10.1016/j.cie.2021.107871
8. Dantzig GB, Ramser JH (1959) The truck dispatching problem. Manage Sci 6(1):80–91. https:// doi.org/10.1287/mnsc.6.1.80 9. De Carvalho JP, Dimitrakopoulos R (2021) Integrating production planning with truckdispatching decisions through reinforcement learning while managing uncertainty. Minerals 11(6):587. https://doi.org/10.3390/min11060587 10. Feng J, Gluzman M, Dai JG (2021) Scalable deep reinforcement learning for ride-hailing. In: 2021 American control conference (ACC), IEEE, pp 3743–3748. https://doi.org/10.23919/ ACC50511.2021.9483145 11. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT press (2016) 12. Hansuwa S, Velayudhan Kumar MR, Chandrasekharan R (2022) Analysis of box and ellipsoidal robust optimization, and attention model based reinforcement learning for a robust vehicle routing problem. S¯adhan¯a 47(2):1–23. https://doi.org/10.1007/s12046-022-01833-2 13. Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International conference on machine learning 1861–1870) 14. Jahanshahi H, Bozanta A, Cevik M, Kavuk EM, Tosun A, Sonuc SB, Ba¸sar A (2022) A deep reinforcement learning approach for the meal delivery problem. Knowl Based Syst 243:108489. https://doi.org/10.1016/j.knosys.2022.108489 15. James JQ, Yu W, Gu J (2019) Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans Intell Transp Syst 20(10):3806–3817. https:// doi.org/10.1109/TITS.2019.2909109 16. Kavuk EM, Tosun A, Cevik M, Bozanta A, Sonuç SB, Tutuncu M, Basar A (2022) Order dispatching for an ultra-fast delivery service via deep reinforcement learning. Appl Intell 52(4):4274–4299. https://doi.org/10.1007/s10489-021-02610-0 17. Khalil E, Dai H, Zhang Y, Dilkina B, Song L (2017) Learning combinatorial optimization algorithms over graphs. Adv Neural Inf Process Syst 30. https://doi.org/10.48550/arXiv.1704. 01665 18. Koh S, Zhou B, Fang H, Yang P, Yang Z, Yang Q, Ji Z (2020) Real-time deep reinforcement learning based vehicle navigation. Appl Soft Comput 96:106694. https://doi.org/10.1016/j. asoc.2020.106694 19. Kool W, Van Hoof H, Welling M (2019) Attention, learn to solve routing problems!, arXiv preprint. https://doi.org/10.48550/arXiv.1803.08475 20. Kullman ND, Cousineau M, Goodson JC, Mendoza JE (2022) Dynamic ride-hailing with electric vehicles. Transp Sci 56(3):775–794. https://doi.org/10.1287/trsc.2021.1042 21. Kuznetsov A, Shvechikov P, Grishin A, Vetrov D (2020) Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In: International conference on machine learning, 5556–5566 22. Liu Z, Li X, Khojandi A (2022) The flying sidekick traveling salesman problem with stochastic travel time: a reinforcement learning approach. Transp Res Part E: Logistics Transp Rev 164:102816. https://doi.org/10.1016/j.tre.2022.102816 23. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https:// doi.org/10.1038/nature14236 24. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning, 1928-1937 25. Nazari M, Oroojlooy A, Snyder L, Takác M (2018) Reinforcement learning for solving the vehicle routing problem. Adv Neural Inf Process Syst 31. https://doi.org/10.48550/arXiv.1802. 04240 26. 
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi. org/10.1038/nature14539 27. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Moher D (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev 10(1):1–11. https://doi.org/10.1186/s13643-021-01626-4
28. Pan W, Liu SQ (2022) Deep reinforcement learning for the dynamic and uncertain vehicle routing problem. Appl Intell 1–18. https://doi.org/10.1007/s10489-022-03456-w 29. Psaraftis HN (1980) A dynamic programming solution to the single vehicle many-to-many immediate request dial-a-ride problem. Transp Sci 14(2):130–154. https://doi.org/10.1287/ trsc.14.2.130 30. Qin W, Zhuang Z, Huang Z, Huang H (2021) A novel reinforcement learning-based hyperheuristic for heterogeneous vehicle routing problem. Comput Indus Eng 156:107252. https:// doi.org/10.1016/j.cie.2021.107252 31. Rios BHO, Xavier EC, Miyazawa FK, Amorim P, Curcio E, Santos MJ (2021) Recent dynamic vehicle routing problems: a survey. Comput Indus Eng 160:107604. https://doi.org/10.1016/j. cie.2021.107604 32. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms arXiv preprint. https://doi.org/10.48550/arXiv.1707.06347 33. Sheng Y, Ma H, Xia W (2020) A pointer neural network for the vehicle routing problem with task priority and limited resources. Inf Technol Control 49(2):237–248. https://doi.org/ 10.5755/j01.itc.49.2.24613 34. Shi J, Gao Y, Wang W, Yu N, Ioannou PA (2019) Operating electric vehicle fleet for ride-hailing services with reinforcement learning. IEEE Trans Intell Transp Syst 21(11):4822–4834. https:// doi.org/10.1109/TITS.2019.2947408 35. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hassabis D (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359. https://doi. org/10.1038/nature24270 36. Soeffker N, Ulmer MW, Mattfeld DC (2021) Stochastic dynamic vehicle routing in the light of prescriptive analytics: a review. Euro J Oper Res. https://doi.org/10.1016/j.ejor.2021.07.014 37. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT press 38. Torres J (2021) Introducción al aprendizaje por refuerzo profundo. Watch this space 39. Toth P, Vigo D (eds) (2002) The vehicle routing problem. Society for industrial and applied mathematics 40. Ulmer MW, Goodson JC, Mattfeld DC, Thomas BW, On modeling stochastic dynamic vehicle routing problems. EURO J Transp Logistics 9(2):100008. https://doi.org/10.1016/j.ejtl.2020. 100008 41. Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. Proc AAAI Conf Artific Intell 30(1). https://doi.org/10.1609/aaai.v30i1.10295 42. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30. https://doi.org/10.48550/arXiv. 1706.03762 43. Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. Adv Neural Inf Process Syst 28. https://doi.org/10.48550/arXiv.1506.03134 44. Vinyals O, Bengio S, Kudlur M (2015) Order matters: sequence to sequence for sets. arXiv preprint. https://doi.org/10.48550/arXiv.1511.06391 45. Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning, 1995–2003 46. Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3):229–256. https://doi.org/10.1007/BF00992696 47. Yuan Y, Li H, Ji L (2021) Application of deep reinforcement learning algorithm in uncertain logistics transportation scheduling. Comput Intell Neurosci 2021:9. https://doi.org/10.1155/ 2021/5672227 48. 
Zhang Z, Liu H, Zhou M, Wang J (2021) Solving dynamic traveling salesman problems with deep reinforcement learning. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/ TNNLS.2021.3105905 49. Zhang Y, Bai R, Qu R, Tu C, Jin J (2022) A deep reinforcement learning based hyper-heuristic for combinatorial optimization with uncertainties. Euro J Oper Res 300(2):418–427. https:// doi.org/10.1016/j.ejor.2021.10.032
Applying Machine Learning for American Sign Language Recognition: A Brief Survey Shashank Kumar Singh and Amrita Chaturvedi
Abstract Sign languages play a crucial role in enabling differently-abled people to communicate and express their feelings. The advent of newer technologies in machine learning and sensors has let us build more sophisticated and complex human-computer interfaces. However, a cost-effective commercial sign language recognition system is still not available. In this paper, we highlight recent advancements and limitations related to sign language recognition systems. We focus our study on American sign language (ASL), one of the most widely used sign languages. We classify most of the surveyed work into two broad areas based on data acquisition techniques: visual and sensor-based approaches. We also list publicly available datasets that could be used for further research. This paper aims to provide a clear insight into advancements in the field of sign language recognition systems using American sign language. Keywords Computer vision · Sign language recognition system · Machine learning · Wearable sensors
S. K. Singh (B) · A. Chaturvedi Indian Institute of Technology (BHU), Varanasi, India e-mail: [email protected] A. Chaturvedi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_22
1 Introduction
Sign languages are a form of non-verbal communication consisting of a set of gestures produced using different body parts. Different signs are represented using different orientations, shapes, and movements of body parts (hands) to express different meanings. Studies of sign languages are important as they greatly help differently-abled people communicate among themselves and with able-bodied people. Most of these sign languages have their own grammar, syntax, semantics, and morphology. Generating and interpreting sign language is not an easy task, as sign languages have a wide set of rules. For instance, a single verb may have more than five morphemes related to it
[4]. Also, there are around 138–300 different sign languages around the world [21], and the lack of a global sign language makes inter-regional communication difficult. Automatic sign language interpretation can resolve many of these issues. With the advent of new technological discoveries in the field of human-computer interfaces, researchers have tried to use computer-based approaches to assist in the generation, recognition, and interpretation of sign languages. Such a computer-assisted system is generally called a sign language recognition system (SLRS). Sign language recognition is a multi-disciplinary field drawing on natural language processing, pattern recognition, human-computer interaction (HCI), signal processing, and computer vision-based techniques. The main objective of SLR is to provide suitable assistance to differently-abled people while they communicate using sign language.
Initially, traditional modeling approaches were applied to develop computer-assisted sign language recognition systems that can automate the generation and understanding of these sign languages [2]. A major shift occurred with the development of practical machine learning libraries that helped to develop more reliable and accurate SLRS, and various attempts have been made to enhance the performance of these SLRS by exploiting supervised machine learning algorithms. However, early ML techniques were based on manual feature extraction, which requires manual labor and expertise. This limitation was removed by recent deep learning algorithms that can automatically extract features and then perform classification.
In this article, we present a comprehensive survey covering the major research work done in sign language recognition using machine learning techniques. We focus on a single sign language, American sign language (ASL), which is among the most widely used sign languages in the world. ASL finger spelling mainly consists of 26 letters and ten digits, along with its own grammar and syntax [49]. Most work on ASL recognition can be classified by data acquisition, classification mechanism, and the static or dynamic nature of the sign. To present recent advancements systematically, we have created two broad sections: visual and sensor-based sign recognition approaches. Recent computer vision (visual) approaches mainly depend on collecting camera-based input data and applying machine learning algorithms to build a suitable sign recognition system, whereas sensor-based approaches collect input data from sensors such as gloves [1], accelerometers, gyroscopes, and surface electromyography sensors [11]. This paper describes these approaches in detail in separate sections. Section 2 summarizes computer vision-based methods along with the publicly available ASL datasets. Section 3 highlights the sensor-based ASL recognition approaches. Section 4 discusses the limitations and challenges in this field, followed by the conclusion in Sect. 5.
2 Computer Vision-Based ASL Recognition
Computer vision is a field of science comprising the techniques and methods that allow computers to automate and mimic the human visual system. It involves acquiring, analyzing, and exploiting the information present in digital images. The rise of machine learning has greatly benefited computer vision tasks such as object recognition, image style transfer, image classification, image reconstruction, and many more. Sign language recognition has also exploited the efficiency of computer vision and machine learning algorithms. In SLR, computer vision methods are applied to extract features from the images or videos recorded for sign languages; machine learning algorithms are then trained on these features to generate a model, which can further be used for recognizing sign languages. In this section, we cover some of the important work on American sign language using computer vision and machine learning techniques.
Karayılan et al. [26] proposed two classifiers based on the backpropagation neural network, which achieved classification accuracies of 70 and 85%. Their model takes 3072 raw features and 512 histogram-based features as input. The authors used only three postures, A, B, and C, from the Marcel Static Hand Posture Database [32] for their analysis. Munib et al. [34] applied the Hough transform, trained a feed-forward backpropagation network on ASL signs, and obtained an accuracy of 92.3% for the classification task. Their dataset consists of 20 ASL signs with 15 samples each, collected from 15 subjects. The authors also claimed that their model was invariant to image transformations such as rotation and scaling. In another work, Ragab et al. [41] predominantly used the Hilbert space-filling curve to extract features. They used a support vector machine and a random forest for classification and achieved a classification accuracy of 69% on the Pugeault and Bowden dataset; in further experiments, different versions of the dataset yielded up to 94 and 99% classification accuracy. In another important work, Tangsuksant et al. [48] used gloves with six different color markers and two cameras to extract the 3D coordinates corresponding to each of the colored markers. The authors extracted a novel feature based on the circle Hough transform and achieved a classification accuracy of 95% using a feed-forward neural network; the dataset collected consists of 2100 images for the ASL alphabet. Zamani et al. [58] used the saliency property to extract features from images and then trained a neural network to classify the instances. Data transformations such as linear discriminant analysis and principal component analysis were used to reduce dimensionality, and the proposed method achieved a classification accuracy of 99.88%. The authors used two datasets for their experiments; the first was a collection of 2425 images collected from five people [7], while the second consisted of 2550 images, 70 for each sign, collected from five different people. Jangyodsuk et al. [23] used single- and double-handed ASL signs and extracted features based on the motion velocity vector, the relative hand location with respect to the subject's face, and the right-hand position. Further, the authors used HOG and dynamic time warping for the recognition tasks. Based on their experimental results, they
concluded that using HOG features could significantly improve recognition accuracy, by eight percent, and reported a classification accuracy of 82%. Two datasets were used; the first consisted of videos in RGB format (three subjects with 1113 signs each), while the other consisted of videos collected with Kinect cameras (two subjects, 2226 signs in total). Aryanie et al. [5] focused on the k-nearest neighbor algorithm for classifying ASL signs. They studied the effect of dimensionality reduction on classification accuracy and achieved a classification accuracy of 99.8%. The dataset used for the experimentation was a subset of the original dataset of [40] and contained 10 ASL signs with 5254 samples. Islam et al. [22] used images captured with mobile devices for recognizing ASL gestures. From these images they extracted five features based on eccentricity, rotation, pixels, elongatedness, and fingertip position, and they applied an algorithm that combines the convex hull and K-curvature. The authors achieved a classification accuracy of 94.32% on a dataset of 37 ASL signs. Oyedotun et al. [37] used deep learning models such as convolutional neural networks and a stacked denoising autoencoder for the ASL recognition task; the autoencoder is based on unsupervised learning and is used to deal with corrupted input data. For their experiments, the authors used 24 signs from Thomas Moeslund's gesture recognition database [8]. They trained different neural network architectures and achieved a maximum recognition accuracy of 92.83%. In a similar work, Kasukurthi et al. [27] used RGB images to train a SqueezeNet model on the Surrey Finger dataset. They claimed their model to be compatible with mobile devices, and their architecture achieved a classification accuracy of 83.9%. Truong et al. [51] used webcam images to train AdaBoost and Haar classifiers; their dataset consisted of 28,000 images covering all the letters of ASL, and their model achieved a precision of 98.7%.
The development of depth cameras helped to enhance the accuracy of sign language recognition models. These cameras capture the depth, i.e., the distance of the subject from the camera. Instead of recording only RGB features, they can store RGBD information; RGB is the common format of a traditional color camera, while the 'D' in RGBD refers to depth information. Some of the major work using depth information is listed below. Sun et al. [45] used Kinect sensors and collected 2000 phrases, each containing color, depth, and skeleton information for ASL gestures, and extracted 3D features to perform sign language recognition. The authors initially applied a latent support vector machine algorithm and claimed that its efficiency could be enhanced by fusing Kinect information. Using features derived from HOG and Kinect, they achieved a recognition accuracy of 86.0% on a dataset of 73 ASL signs collected from 9 participants. In another work, Sun et al. [46] extracted relevant features to build an efficient classifier using AdaBoost and compared recognition accuracy with and without temporal information in the preprocessing; their dataset consists of 1971 phrases collected with Kinect sensors. Usachokcharoen et al. [53] applied a Microsoft Kinect and color sensors to classify eight commonly used signs, using features based on motion and colors.
The authors achieved an accuracy of 95% using a multi-class support vector machine.
The dataset used in their experiment consisted of 2-second video clips collected from a single user. Pugeault et al. [40] utilized random forests to classify ASL signs. The authors used a Microsoft Kinect to extract depth and appearance-related information for each sign, and their model was able to detect hand shapes in real time. A dataset of 500 samples collected from 4 subjects was used for their experiments, and the model's effectiveness was evaluated using precision as the metric; fusing the images' appearance and depth information as features, their model achieved a precision of 75%. Sun et al. [47] used a latent support vector machine to perform word- and sentence-level ASL gesture recognition. A new dataset consisting of approximately 2000 word-level and 2000 sentence-level ASL gestures was collected using Kinect sensors; it contains 73 different ASL signs, each corresponding to an ASL word, along with 63 different sentences made up of 2 to 4 words each. The authors extracted different features using the histogram of oriented gradients, Kinect information (pose, hand shape, motion), and optical flow, and achieved classification accuracies of 86.0 and 82.9% for word- and sentence-level ASL recognition, respectively.
Inspired by the efficiency of deep learning techniques, researchers applied these methods to ASL recognition tasks. Ferreira et al. [17] fused Kinect and Leap Motion sensors and proposed a multi-modal system for ASL gesture recognition. They used deep learning models such as convolutional neural networks for the recognition task, compared the efficiency of single- and multi-modal learning paradigms, and achieved an accuracy of 97% on a dataset of 10 ASL signs collected from 14 people. Barbhuiya et al. [6] proposed a deep learning model based on AlexNet and VGG16 to recognize static ASL gestures; the pretrained architecture was used for feature extraction, and the extracted features were fed to an SVM. Using this architecture, they achieved a classification accuracy of 99.86% for 36 ASL gestures. Mohammadi et al. [33] used deep neural networks to classify different static ASL signs and evaluated performance metrics such as power consumption, accuracy, and energy on different hardware platforms, such as the Intel Neural Compute Stick 2 and Intel's neuromorphic platform, Loihi; the proposed model achieved a classification accuracy of 99.93%. Tunga et al. [52] proposed a pose-based deep learning architecture that uses temporal as well as spatial information from the video stream for ASL gesture recognition. Using a graph convolutional network and Bidirectional Encoder Representations from Transformers (BERT), they achieved an accuracy of 88.67% on the Word-Level American Sign Language dataset [30]. Ikram and Dhanda [20] also applied a deep learning approach to predict ASL symbols using a CNN model; their model could predict ASL symbols from a live webcam stream with 98% recognition accuracy.
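Many of the vision-based studies above follow the same generic pipeline: extract hand-crafted features (e.g., HOG) from images and train a classical classifier (e.g., an SVM). The sketch below illustrates that pipeline with scikit-image and scikit-learn on synthetic stand-in data; it is not the method of any specific paper, and the image size, HOG parameters, and SVM settings are illustrative assumptions.

# Generic vision-based SLR pipeline (HOG features + SVM) on synthetic stand-in data.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def extract_features(images):
    # One HOG descriptor per grayscale image (e.g., 64x64 hand crops).
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

rng = np.random.default_rng(0)
images = rng.random((200, 64, 64))        # stand-in for hand-crop images (synthetic)
labels = np.repeat(np.arange(10), 20)     # stand-in for 10 sign classes (synthetic)

X = extract_features(images)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=10.0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))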
2.1 Dataset
Machine learning solutions depend heavily upon the amount and quality of data available for a particular problem. The algorithms used in machine learning try to extract a pattern
Table 1 Summary of the American sign language vision-based datasets
Dataset | Content | Signers | Color/depth information
NCSLGR [35] | 1866 signs | 4 persons | Color
ASL-100-RGBD [25, 31] | 42 videos of 100 ASL signs | 22 persons | Both
RWTH-BOSTON-50 [15] | Video containing 50 signs, 83 pronunciations | 3 persons | Color
RWTH-BOSTON-104 [13] | 201 sentences, 104 ASL signs | 3 persons | Color
RWTH-BOSTON-400 [14] | 843 sentences and 400 ASL signs | 5 persons | Color
ASLLVD [10, 36] | 10,000 videos, 3,300 ASL signs | 6 persons | Color
Purdue RVL-SLLL ASL database [55] | 3576 images and videos | 14 persons | Color
American sign language image dataset (ASLID) [18] | 809 images | 6 persons | Color
Jochen Triesch static hand posture database [50] | 10 hand postures | 24 persons | Gray images
2D static hand gesture color image dataset [7] | 2425 images of the 36 ASL signs | 5 persons | Color
ASL finger spelling dataset [18] | 24 static ASL signs | 5 persons | Both
How2Sign [16] | 80 h of sign language video | 11 persons | Both
Word-level American sign language (WLASL) [30] | Video dataset of more than 2000 ASL words | 119 persons | Color
from the underlying dataset, which is then exploited for solving classification and regression problems. The greater the quality and quantity of the data, the better the chance of obtaining accurate results. To use machine learning for sign language recognition, we generally require a large dataset covering most sign language gestures together with their labels. Such a dataset can be used as a benchmark to evaluate the efficiency of different machine learning algorithms. Making these datasets publicly available benefits ongoing research, as collecting such data is difficult and time-consuming. Considering these issues, many groups have published their datasets in the public domain. Some of the important datasets are summarized in Table 1.
3 Sensors-Based ASL Recognition
Oz et al. [38] proposed two artificial neural networks: one was used as a velocity network, while the second was used for classification. The dataset consisted
of 60 ASL words collected using sensory gloves and 3D motion trackers (Flock of Birds). The proposed architecture was able to translate ASL signs into English in real time, and the authors achieved a maximum classification accuracy of 95%. In a similar work, Oz et al. [39] exploited hand shapes and motion trajectories as features to recognize sign language, using a Flock of Birds tracker and CyberGloves to collect the features. The dataset consisted of 50 ASL words, and using artificial neural networks their model achieved an accuracy of 90%. Ani et al. [3] used a three-finger glove to predict the letters of the ASL alphabet. The bending movement corresponding to each sign produced a specific resistance in these sensors, and the bending positions of the three fingers acted as features fed to a microcontroller for recognition; this approach achieved a classification accuracy of 70%. Savur et al. [42] applied an SVM to classify ASL signs from surface electromyography signals, achieving an online classification accuracy of 82.3% and an offline accuracy of 91.1% on a dataset of 26 ASL signs collected from 4 subjects. Savur et al. [43] utilized various time- and frequency-domain features to classify sEMG signals corresponding to ASL signs, presented an online system capable of classifying ASL signs, and compared the effect of single versus multiple users on classification accuracy. Their model achieved a maximum classification accuracy of 79.35% on a single-user dataset using ensemble learning, while an accuracy of 60.85% was achieved with an SVM on a multi-user dataset; the dataset consisted of 27 ASL signs collected from ten subjects. Wu et al. [57] combined surface electromyography with inertial sensors for ASL sign recognition on a dataset of 40 words collected from four subjects. They used the combination of the two sensors and claimed that a single channel of electromyography signals is sufficient to classify different signs; the overall recognition rate of the system was 95.94%. Derr et al. [12] used surface electromyography signals along with an accelerometer to classify a subset of 50 ASL signs, considering both single-handed and double-handed signs. They exploited the support vector machine for classification and achieved an accuracy of 59.96% in an sEMG-based subject-independent evaluation. Singh and Chaturvedi [44] applied an ensemble feature selection technique to recognize static ASL signs using surface electromyography (sEMG) sensors; the dataset consists of time-series signals collected from 20 subjects, and for 10 ASL signs they achieved a recognition accuracy of 99.99%. Jiang et al. [24] explored the ability of force myography and surface electromyography to classify ASL signs, using a total of 48 gestures including 16 ASL signs collected from 12 people. They concluded that force myography sensors are better than surface electromyography sensors for ASL gesture recognition; a maximum classification accuracy of 91.2% was achieved by their model based on linear discriminant analysis. Wu et al. [56] developed a real-time ASL sign recognition system; an automatic segmentation method was proposed in their work, and the best features from different modalities were selected using information gain.
They used four algorithms, namely Naïve Bayes (NB), decision tree (DT), neural network (NN), and LibSVM, and obtained a maximum classification accuracy of 96.16%. The dataset consisted of 80
ASL signs collected from four people. Kim et al. [28] used reflected IR signals for recognizing the generated ASL signs. These signals were collected at a distance of 15 cm from the antenna. The dataset consisted of six letters of ASL collected from 5 subjects. To classify these signals, they used one-dimensional convolutional neural networks, which achieved an accuracy of more than 90%, as mentioned in the paper. Wen et al. [54] used gloves and a deep learning module to recognize sentence-level sign gestures. Initially, each sentence is segmented into its constituent words. These segmented words are identified and then further processed to reconstruct the original sentence in text and audio form. The proposed architecture recognized new sentences with an average correct rate of 86.67%. Lee et al. [29] used leap motion sensors to develop an ASL learning application prototype. In their work, the authors used a deep learning model (RNN) along with the KNN algorithm to classify the input sequences. The trained model achieved a classification accuracy of 91.82% with fivefold cross-validation.
4 Discussion

Despite various research works on sign language recognition, an effective sign language interpreter is still unavailable. Basically, a sign language interpreter consists of two parts: sign language recognition and sign language translation. Sign language recognition is not trivial, as it deals with various possibilities of sign representation. An ASL sign can be represented by its characteristics, viz., hand trajectory, hand shape, hand orientation, facial expression, and other manual and non-manual features. Different ASL signs can be generated using slight variations in these features. It is difficult for a trained model to understand these variations. Efforts must be made to create a model that accommodates and correctly discriminates these manual and non-manual features. Another area for improvement, which is not yet explored, is dealing with contextual or spatial information. Sign languages are natural languages rather than an encoding of verbal languages. A few authors have suggested that the study of sign languages must not be mixed with spoken languages, as they have different properties, for example, the use of signing space in sign language. A signing space is an area in front of the signer where he/she can move the hands to generate a sign [19]. Particular gestures at different positions may refer to different entities. This information is referred to as contextual information related to a sign. Most of the work in SLR has not focused on this area. However, an efficient SLR model must accommodate these contextual and spatial properties. Recent advancements in linguistics and artificial intelligence have helped to achieve better results in several natural language processing tasks, such as machine translation for spoken languages. Machine translation is a process that allows one to convert words from one language to another. Building an application capable of converting one sign language to another would help signers with different sign languages and non-sign users to communicate efficiently. However, machine trans-
lation for sign languages has been far less explored. The best result for sign language translation was presented by Chen et al. [9], where the authors achieved a BLEU-4 score of 24.32, significantly less than the scores obtained for spoken languages such as English. Hence, for sign languages, machine translation is still an open area of research. New algorithms and publicly available datasets must be introduced to achieve acceptable results. Another issue is that only a limited number of subjects (fewer than 100) have been used in most of the publicly available datasets. Sign languages show high variability while representing a sign. Increasing the number of subjects and replicating real-world conversation (continuous ASL gestures) during data collection may lead to more generalized and effective techniques for ASL recognition. Comparing the computer vision and sensor-based approaches for ASL recognition, computer vision-based approaches are extensively explored and researched as compared to sensor-based approaches for ASL [16]. Also, a wide variety of datasets consisting of ASL videos with RGB, pose, multi-view angles, text annotation, gloss, and depth information are present in the public domain. At the same time, such rich datasets for sensor-based ASL recognition are barely available. Moreover, computer vision-based devices are less costly and cause less discomfort to signers while producing gestures. Sensors such as gloves restrict the signer's hand action and are less easily used in real applications. Also, they need regular and precise calibration. The primary difficulty while dealing with sign language recognition is that sign languages are generated using manual and non-manual features. The manual features include hand movement, shape, location, and orientation, while non-manual features include facial expression, eye gaze, and the use of signing space. Sensor-based methods can easily accommodate the manual features but struggle to process the non-manual features. Computer vision-based methods, in contrast, can deal with both manual and non-manual features, but at an increased processing and computational cost. Computer vision-based methods rely upon video and images; hence, storing, processing, and interpreting sign languages with these methods requires higher computational overhead. On the other hand, sensor-based approaches such as leap motion sensors have lower data processing and storage costs. These sensors consist of infrared cameras and can track hand movements precisely. However, these sensors have varying sampling frequencies that adversely affect their recognition ability [44]. Each of the approaches used for ASL recognition has certain limitations. Hence, considering the inherent complexity of the sign language processing problem and the limits of the two methods, a purely sensor-based or purely computer vision-based solution cannot solve the problem of building a reliable and efficient SLR system for specially-abled persons. A hybrid multi-modal approach, combining the capabilities of computer vision and sensor-based approaches, may greatly help.
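The BLEU-4 figure quoted above is the standard corpus-level machine-translation metric. The snippet below is an illustrative sketch, not code from any of the surveyed works, of how such a score is typically computed with the sacrebleu package; the hypothesis and reference sentences are placeholders.

```python
# Minimal sketch (illustrative only): corpus-level BLEU-4 for translation output.
import sacrebleu

# Placeholder system outputs and one stream of reference translations.
hypotheses = ["the weather is nice today", "i am going to school"]
references = [["the weather is good today", "i am going to school"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)  # 4-gram BLEU by default
print(f"BLEU-4: {bleu.score:.2f}")
```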
5 Conclusion

In this paper, we have tried to present a systematic review of the approaches used in the field of sign language recognition systems concerning American Sign Language. Based on the number of users, ASL is one of the most used sign languages in the world. We clustered the research work into two broad groups, viz., visual and sensor-based approaches. Initially, the modelling and classification of ASL were mostly done using hidden Markov models and machine learning-based approaches. However, traditional machine learning algorithms have recently been replaced by deep learning algorithms. Deep learning has been highly beneficial in this field, as it has reduced the burden of manual feature extraction. Despite advancements in this field, an automatic sign language recognition system is still not in use. We still do not have an efficient continuous sign language recognition system that can easily accommodate all manual and non-manual features while covering the massive vocabulary of sign languages. Also, most of the work is dominated by vision-based methods, which have the limitation of being easily affected by surrounding lighting conditions. Simultaneously, sensor-based methods involve placing sensors on the signer's body, which may cause discomfort and be impractical for real-time use. A feasible solution for building an automatic ASL recognition system is to use a multi-modal approach amalgamating both the vision and sensor-based modules. Later, deep learning models can be deployed to generate intermediate representations that can be used in several underlying tasks, such as recognition, production, translation, and segmentation. Efforts must be made to leverage the advancements in natural language processing for sign languages. Deep learning-based frameworks, advancements in natural language processing techniques, and new sensor fabrication technology can potentially resolve the obstacles to achieving an efficient automatic sign language recognition system. Efforts must be made to increase the research and exploit the power of these newer technologies to build a reliable, accurate, and effective real-time sign language recognition system. However, for ASL, finding efficient segmentation and tokenization techniques, a linguistically aware continuous recognition model, and a collection of well-annotated datasets covering real-world sign language conversation scenarios are still open areas for the research community.
References 1. Ahmed MA, Zaidan BB, Zaidan AA, Salih MM, Lakulu MMB (2018) A review on systems-based sensory gloves for sign language recognition state of the art between 2007 and 2017. Sensors 18(7):2208 2. Al-Ahdal ME, Nooritawati MT (2012) Review in sign language recognition systems. In: 2012 IEEE symposium on computers and informatics (ISCI), IEEE, pp 52–57 3. Ani AIC, Rosli AD, Baharudin R, Abbas MH, Abdullah MF (2014) Preliminary study of recognizing alphabet letter via hand gesture. In: 2014 international conference on computational science and technology (ICCST), IEEE, pp 1–5
4. Aronoff M, Meir I, Sandler W (2005) The paradox of sign language morphology. Language 81(2):301 5. Aryanie D, Heryadi Y (2015) American sign language-based finger-spelling recognition using k-nearest neighbors classifier. In: 2015 3rd international conference on information and communication technology (ICoICT), IEEE, pp 533–536 6. Barbhuiya AA, Karsh RK, Jain R (2021) Cnn based feature extraction and classification for sign language. Multimedia Tools Appl 80(2):3051–3069 7. Barczak A, Reyes N, Abastillas M, Piccio A, Susnjak T (2011) A new 2d static hand gesture colour image dataset for asl gestures 8. Birk H, Moeslund TB, Madsen CB (1997) Real-time recognition of hand alphabet gestures using principal component analysis. In: Proceedings of the scandinavian conference on image analysis. vol 1, pp 261–268. Proceedings published by various publishers 9. Camgoz NC, Hadfield S, Koller O, Ney H, Bowden R (2018) Neural sign language translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7784– 7793 10. Dai A, ASLLRP DAI. http://www.bu.edu/asllrp/av/dai-asllvd.html, [Online; accessed 10 Dec 2022] 11. De Luca CJ (1997) The use of surface electromyography in biomechanics. J Appl Biomech 13(2):135–163 12. Derr C, Sahin F (2017) Signer-independent classification of American sign language word signs using surface emg. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 665–670 13. Dreuw P, RWTH-BOSTON-104 Database. https://www-i6.informatik.rwth-aachen.de/aslr/ database-rwth-boston-104.php, [Online; accessed 10 Dec 2022] 14. Dreuw P, RWTH-BOSTON-400. https://www-i6.informatik.rwth-aachen.de/aslr/, [Online; accessed 10 Dec 20221] 15. Dreuw P, RWTH-BOSTON-50 Database. https://www-i6.informatik.rwth-aachen.de/aslr/ database-rwth-boston-50.php, [Online; accessed 10 Dec 2022] 16. Duarte A, Palaskar S, Ventura L, Ghadiyaram D, DeHaan K, Metze F, Torres J, Giro-i Nieto X (2021) How2sign: a large-scale multimodal dataset for continuous American sign language. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2735–2744 17. Ferreira PM, Cardoso JS, Rebelo A (2017) Multimodal learning for sign language recognition. In: Iberian conference on pattern recognition and image analysis. Springer, pp 313–321 18. Gattupalli S, Ghaderi A, Athitsos V (2016) Evaluation of deep learning based pose estimation for sign language recognition. In: Proceedings of the 9th ACM international conference on PErvasive technologies related to assistive environments, pp 1–7 19. Van der Hulst H, Mills A (1996) Issues in sign linguistic: phonetics, phonology and morphosyntax. Lingua 98(1–3):3–17 20. Ikram S, Dhanda N (2021) American sign language recognition using convolutional neural network. In: 2021 IEEE 4th international conference on computing, power and communication technologies (GUCON), pp 1–12 21. International S, Sign language. https://www.ethnologue.com/subgroups/sign-language, [Online; accessed 10 Dec 2022] 22. Islam MM, Siddiqua S, Afnan J (2017) Real time hand gesture recognition using different algorithms based on American sign language. In: 2017 IEEE international conference on imaging, vision and pattern recognition (icIVPR). IEEE, pp 1–6 23. Jangyodsuk P, Conly C, Athitsos V (2014) Sign language recognition using dynamic time warping and hand shape distance based on histogram of oriented gradient features. 
In: Proceedings of the 7th international conference on PErvasive technologies related to assistive environments, 1–6 24. Jiang X, Merhi LK, Xiao ZG, Menon C (2017) Exploration of force myography and surface electromyography in hand gesture classification. Medical Eng Phys 41:63–73
25. Jing L, Vahdani E, Huenerfauth M, Tian Y (2019) Recognizing American sign language manual signs from rgb-d videos. arXiv preprint arXiv:1906.02851 26. Karayılan T, Kılıç Ö (2017) Sign language recognition. In: 2017 international conference on computer science and engineering (UBMK). IEEE, pp 1122–1126 27. Kasukurthi N, Rokad B, Bidani S, Dennisan D et al (2019) American sign language alphabet recognition using deep learning. arXiv preprint arXiv:1905.05487 28. Kim SY, Han HG, Kim JW, Lee S, Kim TW (2017) A hand gesture recognition sensor using reflected impulses. IEEE Sensors J 17(10):2975–2976 29. Lee CK, Ng KK, Chen CH, Lau HC, Chung S, Tsoi T (2021) American sign language recognition and training method with recurrent neural network. Expert Syst Appl 167:114403 30. Li D, Rodriguez C, Yu X, Li H (2020) Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1459–1469 31. Jing L, Vahdani E, Tian Y, Recognizing American sign language manual signs from RGB-D videos. https://longlong-jing.github.io/ASL-100-RGBD/, [Online; accessed 10 Dec 2022] 32. Marcel S, Sebastien Marcel—hand posture and gesture datasets. https://www.idiap.ch/ resource/gestures/, [Online; accessed 10 Dec 2022] 33. Mohammadi M, Chandarana P, Seekings J, Hendrix S, Zand R (2022) Static hand gesture recognition for American sign language using neuromorphic hardware. Neuromorphic Comput Eng 2(4):044005 34. Munib Q, Habeeb M, Takruri B, Al-Malik HA (2007) American sign language (asl) recognition based on hough transform and neural networks. Expert Syst Appl 32(1):24–37 35. NCSLGR: National Center for Sign Language and Gesture Resources (NCSLGR) Corpus. https://www.bu.edu/asllrp/ncslgr-for-download/download-info.html, [Online; accessed 10 Dec 2022] 36. Neidle C, Opoku A, Dimitriadis G, Metaxas D (2018) New shared and interconnected asl resources: Signstream 3 software; dai 2 for web access to linguistically annotated video corpora; and a sign bank. In: 8th workshop on the representation and processing of sign languages: involving the language community, Miyazaki, language resources and evaluation conference 2018 37. Oyedotun OK, Khashman A (2017) Deep learning in vision-based static hand gesture recognition. Neural Comput Appl 28(12):3941–3951 38. Oz C, Leu MC (2007) Linguistic properties based on American sign language isolated word recognition with artificial neural networks using a sensory glove and motion tracker. Neurocomputing 70(16–18):2891–2901 39. Oz C, Leu MC (2011) American sign language word recognition with a sensory glove using artificial neural networks. Eng Appl Artific Intell 24(7):1204–1213 40. Pugeault N, Bowden R (2011) Spelling it out: Real-time asl fingerspelling recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), pp 1114–1119 41. Ragab A, Ahmed M, Chau SC (2013) Sign language recognition using hilbert curve features. In: International conference image analysis and recognition. Springer, pp 143–151 42. Savur C, Sahin F (2015) Real-time American sign language recognition system using surface emg signal. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA). IEEE, pp 497–502 43. Savur C, Sahin F (2016) American sign language recognition system by using surface emg signal. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 002872–002877 44. 
Singh SK, Chaturvedi A (2022) A reliable and efficient machine learning pipeline for American sign language gesture recognition using emg sensors. Multimedia Tools Appl 1–39 45. Sun C, Zhang T, Bao BK, Xu C (2013) Latent support vector machine for sign language recognition with kinect. In: 2013 IEEE international conference on image processing. IEEE, pp 4190–4191
46. Sun C, Zhang T, Bao BK, Xu C, Mei T (2013) Discriminative exemplar coding for sign language recognition with kinect. IEEE Trans Cybernetics 43(5):1418–1428 47. Sun C, Zhang T, Xu C (2015) Latent support vector machine modeling for sign language recognition with kinect. ACM Trans Intell Syst Technol (TIST) 6(2):1–20 48. Tangsuksant W, Adhan S, Pintavirooj C (2014) American sign language recognition by using 3d geometric invariant feature and ann classification. In: The 7th 2014 biomedical engineering international conference. IEEE, pp 1–5 49. Tennant RA, Gluszak M, Brown MG (1998) The American sign language handshape dictionary. Gallaudet University Press 50. Triesch J, Von Der Malsburg C (1996) Robust classification of hand postures against complex backgrounds. In: Proceedings of the second international conference on automatic face and gesture recognition. IEEE, pp 170–175 51. Truong VN, Yang CK, Tran QV (2016) A translator for American sign language to text and speech. In: 2016 IEEE 5th global conference on consumer electronics. IEEE, pp 1–2 52. Tunga A, Nuthalapati SV, Wachs J (2021) Pose-based sign language recognition using gcn and bert. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 31–40 53. Usachokcharoen P, Washizawa Y, Pasupa K (2015) Sign language recognition with microsoft kinect’s depth and colour sensors. In: 2015 IEEE international conference on signal and image processing applications (ICSIPA). IEEE, pp 186–190 54. Wen F, Zhang Z, He T, Lee C (2021) Ai enabled sign language recognition and vr space bidirectional communication using triboelectric smart glove. Nat Commun 12(1):1–13 55. Wilbur R, Kak AC (2006) Purdue rvl-slll American sign language database 56. Wu J, Sun L, Jafari R (2016) A wearable system for recognizing American sign language in real-time using imu and surface emg sensors. IEEE J Biomed Health Inf 20(5):1281–1290 57. Wu J, Tian Z, Sun L, Estevez L, Jafari R (2015) Real-time American sign language recognition using wrist-worn motion and surface emg sensors. In: 2015 IEEE 12th international conference on wearable and implantable body sensor networks (BSN). IEEE, pp 1–6 58. Zamani M, Kanan HR (2014) Saliency based alphabet and numbers of American sign language recognition using linear feature extraction. In: 2014 4th international conference on computer and knowledge engineering (ICCKE). IEEE, pp 398–403
Identification of Promising Biomarkers in Cancer Diagnosis Using a Hybrid Model Combining ReliefF and Grey Wolf Optimization Sayantan Dass , Sujoy Mistry , and Pradyut Sarkar
Abstract Selection of informative genes or biomarkers is pivotal in designing cancer diagnosis and prognosis models. Prior research has primarily relied on wrapper or filter methods. In most cases, however, the filtering strategy's poor performance offsets its computational cost and speed benefits. Similarly, the computational burden of the wrapper procedure does not allow it to perform well either. The limitations of the above methods have become a concern for researchers in gene or biomarker selection. As a result, hybrid approaches have been introduced that combine both methods to reap the benefits while minimizing the downsides. However, the best gene or biomarker selection solutions remain difficult to obtain due to the conflict between computing efficiency and classification correctness. In the current study, we have built a novel hybrid model by integrating ReliefF and Grey Wolf Optimization (GWO) techniques for the prediction of cancer. The proposed technique is evaluated on Colon cancer, DLBCL, and Leukaemia gene expression datasets and achieves an efficiency of 95%, 98.8%, and 100%, respectively. The experimental results show that, in contrast to earlier existing techniques, the proposed strategy attains a minimal set of marker genes with better accuracy. Keywords Grey wolf optimizer · ReliefF · Hybrid gene selection · Microarray data · Classification
S. Dass (B) · S. Mistry · P. Sarkar Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India e-mail: [email protected] S. Mistry e-mail: [email protected] P. Sarkar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_23
1 Introduction

Cancer is still the biggest cause of mortality for both men and women around the world [1]. Early cancer diagnosis is critical for improving survival odds, since patients can receive therapy if they are diagnosed early enough. As a result, extensive research has taken place in order to discover better ways of preventing, diagnosing, and treating this disease so as to reduce mortality [1]. Microarray technology [2] has been regarded as one of the most significant achievements in molecular biology, particularly in terms of finding marker genes related to certain types of cancer. This technology permits researchers to assess the expression levels of thousands of genes at the same time. However, researchers face a major challenge in classifying or identifying cancer using gene expression microarray data, since it has several thousands of genes while only a few samples are available for analysis [3]. Due to this curse-of-dimensionality problem, learning from gene expression data becomes a rigorous task. Furthermore, duplicate and irrelevant genes in microarray data have an adverse impact on the classifier's ability to predict [3]. To overcome the above issue and enhance disease prediction accuracy, gene selection, also known as feature selection [4], may be used to discover a subset of genes that are most informative. Gene selection is mainly categorized into two groups: filter and wrapper [4]. The filter-based model [5] evaluates the importance of genes without employing any learning technique. Different filters are based on various metrics (distance, probability distribution, information theory, etc.); hence, filter-based methods are often faster. There is a stack of filter-based approaches for picking out genes, such as the correlation coefficient [6], mutual information [7], the F-score criterion [8], maximum-relevance minimum-redundancy (mRMR) [10], correlation-based feature selection (CFS) [9], ReliefF [11], and many others. A learning method is used in the wrapper-based gene selection models [5] to quantify the accuracy of the chosen gene set. In this kind of gene selection mechanism, a search strategy is employed to hunt for the final set of genes, where in every epoch of the search a candidate gene subset is induced and the superiority of this subset is computed by a learning algorithm. Finally, the most competent subset discovered is chosen as the final gene set. A number of wrapper-based feature selection methods have been proposed, including sequential backward selection (SBS) [12], sequential forward selection (SFS) [12], sequential backward floating selection (SFBS) [13], and sequential forward floating selection (SFFS) [13]. However, these methods have the disadvantage that they often converge to a locally optimal solution. Researchers have also applied evolutionary computing (EC) mechanisms, including genetic algorithms (GAs) [17], ant colony optimization (ACO) [14], genetic programming (GP) [16], and particle swarm optimization (PSO) [15], to feature selection problems to overcome the disadvantages of traditional wrapper approaches. Regarding this, we have presented a novel hybrid biomarker selection technique that incorporates the ReliefF and GWO algorithms, where GWO searches the top-rated
genes and identifies marker genes that can be used to correctly classify cancer cells. Hybrid techniques, which combine filter and wrapper methods, strive to obtain the benefits of both: the filter method reduces the dimensionality of the gene space as a pre-processing step, while the wrapper approach selects prominent genes from among the remaining genes. Initially, in our work, ReliefF is employed to compute each gene's weight, and the genes are then ranked in decreasing order. The candidate genes are then picked based on their weight, and biomarkers are selected from among the gene subsets using the GWO algorithm. The hybrid ReliefF-GWO algorithm has been tested on three conventional genomic microarray datasets, and the outcomes have been contrasted with several other recently published algorithms. It has been shown that ReliefF-GWO can acquire superior classification accuracy with a minimal selected set of genes or biomarkers; i.e. identifying a small number of essential genes achieves maximal classification efficiency when dealing with gene selection problems. The main contributions of this work are:
• To present a unique two-stage filter-wrapper approach based on ReliefF and GWO, which can improve classification predictions.
• To eliminate genes that are meaningless or duplicated.
• To minimize noise and improve classification accuracy.
• To pick subsets of genetic markers from entire biological datasets.
The rest of this article is organized as follows: Sect. 2 introduces the developed hybrid prediction algorithm. Section 3 reports on the experimental results, and a conclusion, as well as suggestions for further research, is presented in Sect. 4.
2 Proposed Methodology

In this part, the feature selection algorithms ReliefF (a filter) and Grey Wolf Optimization (a wrapper) have been integrated into a single algorithm. First, we choose highly rated genes from the gene expression dataset using the ReliefF method. In the second step, GWO picks the best biomarker genes from the candidate dataset, using the classification accuracy of k-NN as the fitness function. Figure 1 provides an in-depth process flow diagram of the proposed approach, and the subsequent sections provide a concise analysis.
2.1 ReliefF

ReliefF [11], proposed by Kononenko, is a broadly used filter-based gene selection approach that calculates the weights (ω) of genes, estimated based upon how well the genes distinguish between instances of different classes that are close to each other. Initially, it randomly
Fig. 1 Proposed hybrid model (flow diagram: in the filter stage, the genes of the expression dataset are ranked by ReliefF and the top-ranked genes are selected to form a reduced dataset; in the wrapper stage, a grey wolf population is initialized, the fitness of each grey wolf is calculated, the three best wolves are determined and the wolf positions are updated until the termination criterion is satisfied, yielding the optimal subset of genes)
selects an instance F_p, then looks for its m nearest neighbours belonging to the same class (known as the nearest hits, H), and again searches for the m nearest neighbours of different classes (known as the nearest misses, M). Then, it updates the quality estimator ω for gene G based on the values of F, H, and M. For example, the quality estimator ω is decreased when instance F and those in H have the gene G at different values. Similarly, ω is increased when instance F and those in M have the gene G at different values. This process is repeated n times for every gene, and the weight value of each gene is determined using Eq. (1):

\omega(G)_{new} = \omega(G)_{old} + \frac{-\bar{H} + \bar{M}}{n}    (1)

where

\bar{H} = \frac{\sum_{q=1}^{m} \mathrm{diff}(G, F_p, H_q)}{m}    (2)

\bar{M} = \sum_{C \neq \mathrm{class}(x_p)} \frac{P(C)}{1 - P(\mathrm{class}(x_p))} \cdot \frac{\sum_{q=1}^{m} \mathrm{diff}(G, F_p, M_q(C))}{m}    (3)

where G is a gene of the expression dataset; ω(G)_old and ω(G)_new denote the weight coefficient before and after revising, respectively; diff(G, F_p, H_q) is a quantitative description of the contrast between H_q and F_p on gene G; P(class(x_p)) and P(C) are the proportions of the samples in the class of x_p and in the target class C relative to the total samples, respectively; M_q(C) indicates the q-th neighbour sample in a class C different from that of x_p; and diff(G, F_p, M_q(C)) is a quantitative description of the contrast between M_q(C) and F_p on gene G.
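As a concrete illustration of the update in Eqs. (1)–(3), the following sketch computes ReliefF weights for numeric (expression-valued) genes. It is not the authors' implementation; the normalized absolute difference used for diff(·), the number of sampled instances, and the neighbour count are assumptions made only for this example.

```python
# Minimal ReliefF sketch following Eqs. (1)-(3); illustrative, not the authors' code.
import numpy as np

def relieff_weights(X, y, n_iter=100, m_neighbors=10, rng=None):
    """X: (samples x genes) expression matrix, y: 1-D array of class labels."""
    rng = np.random.default_rng(rng)
    n_samples, n_genes = X.shape
    span = X.max(axis=0) - X.min(axis=0)           # per-gene value range used by diff()
    span[span == 0] = 1.0
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n_samples))  # P(C) for each class
    w = np.zeros(n_genes)
    for _ in range(n_iter):
        p = rng.integers(n_samples)                # random instance F_p
        d = np.abs(X - X[p]) / span                # diff(G, F_p, .) for every sample and gene
        dist = d.sum(axis=1)
        dist[p] = np.inf                           # exclude F_p itself
        k_h = max(1, min(m_neighbors, int((y == y[p]).sum()) - 1))
        hits = np.argsort(np.where(y == y[p], dist, np.inf))[:k_h]
        H_bar = d[hits].mean(axis=0)               # Eq. (2)
        M_bar = np.zeros(n_genes)
        for c in classes:                          # Eq. (3): weighted misses per other class
            if c == y[p]:
                continue
            k_m = min(m_neighbors, int((y == c).sum()))
            miss = np.argsort(np.where(y == c, dist, np.inf))[:k_m]
            M_bar += priors[c] / (1 - priors[y[p]]) * d[miss].mean(axis=0)
        w += (-H_bar + M_bar) / n_iter             # Eq. (1)
    return w
```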
2.2 Grey Wolf Optimization

The swarm-based Grey Wolf Optimization (GWO) algorithm [18] is inspired by grey wolves' social and behavioural way of living, especially their hierarchical social structure and hunting methods. Mirjalili devised and mathematically modelled the GWO algorithm in 2014. The workflow of the algorithm is described in the following subsections.
2.2.1 Inspiration
Most grey wolves like to live in groups or packs. On average, there are 5–12 in a pack. Each group of grey wolves has a clear order of who is the most dominant. Alpha is the most influential member of the entire pack and takes the lead in all activities related to survival, including hunting, feeding, and moving. Beta is the second strongest wolf in the hierarchy of grey wolves and will take over as the pack leader if alpha is either incapacitated or dead. Delta and omega have significantly less of an impact than beta and alpha. This form of social intelligence is the primary inspiration for the GWO algorithm.
2.2.2 Mathematical Model
During a hunt, grey wolves encircle their prey. The encircling behaviour of a grey wolf can be described mathematically as:

\vec{S}(i+1) = \vec{S}_p(i) - \vec{A} \cdot \vec{D}    (4)

\vec{D} = |\vec{C} \cdot \vec{S}_p(i) - \vec{S}(i)|    (5)

where \vec{S}_p represents the position vector of the prey, and \vec{S}(i) and \vec{S}(i+1) represent the position vectors of the grey wolf at the i-th and (i+1)-th iterations, respectively. The coefficient vectors \vec{A} and \vec{C} are expressed as:

\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}    (6)

\vec{C} = 2 \cdot \vec{r}_2    (7)

where random values between 0 and 1 are assigned to the \vec{r}_1 and \vec{r}_2 vectors, and the components of \vec{a} decay linearly from 2 to 0 with each iteration. Typically, in most cases, the alpha acts as the hunt's leader. To imitate the hunting behaviour of grey wolves mathematically, it is presumed that the alpha (the best candidate solution), beta, and delta have superior knowledge of the possible location of the prey. All of the other search agents, including the omegas, readjust their locations so that they are aligned with those of the three most efficient search agents. Therefore, the following equations are used to update the wolves' positions:

\vec{D}_\alpha = |\vec{C}_1 \cdot \vec{S}_\alpha - \vec{S}|, \quad \vec{D}_\beta = |\vec{C}_2 \cdot \vec{S}_\beta - \vec{S}|, \quad \vec{D}_\delta = |\vec{C}_3 \cdot \vec{S}_\delta - \vec{S}|    (8)

\vec{S}_1 = \vec{S}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{S}_2 = \vec{S}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{S}_3 = \vec{S}_\delta - \vec{A}_3 \cdot \vec{D}_\delta    (9)

\vec{S}(i+1) = \frac{\vec{S}_1 + \vec{S}_2 + \vec{S}_3}{3}    (10)

Grey wolves attack the prey using the vector \vec{a}, whose components lie in [-a, a] and decrease linearly from 2 to 0 with each iteration:

\vec{a} = 2 - i \cdot \frac{2}{\mathrm{iter}_{\max}}    (11)

where \mathrm{iter}_{\max} is the total number of iterations of the algorithm and i is the current iteration number.
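The position updates above can be combined with the k-NN fitness used in this work into a simple wrapper, sketched below. This is not the authors' code: the 0.5 threshold for turning a continuous wolf position into a binary gene mask, k = 5 for k-NN, and 5-fold cross-validation are assumptions, and with the full 1000 iterations the loop is purely illustrative rather than efficient.

```python
# Sketch of GWO-based wrapper gene selection with a k-NN fitness (illustrative only).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn_fitness(mask, X, y):
    """Cross-validated k-NN accuracy on the genes selected by the binary mask."""
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=5).mean()

def gwo_gene_selection(X, y, n_wolves=30, max_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    S = rng.random((n_wolves, dim))                   # wolf positions in [0, 1]
    fit = np.array([knn_fitness(s > 0.5, X, y) for s in S])
    for i in range(max_iter):
        a = 2 - i * (2 / max_iter)                    # Eq. (11)
        alpha, beta, delta = S[np.argsort(-fit)[:3]]  # current three best wolves
        for w in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2         # Eqs. (6)-(7)
                D = np.abs(C * leader - S[w])         # Eq. (8)
                new_pos += leader - A * D             # Eq. (9)
            S[w] = np.clip(new_pos / 3, 0, 1)         # Eq. (10)
            fit[w] = knn_fitness(S[w] > 0.5, X, y)
    return np.flatnonzero(S[np.argmax(fit)] > 0.5)    # indices of the selected genes
```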
3 Result and Discussion

3.1 Experimental Setup and Parameters

The same computer, with a 2.40 GHz Intel Core i3 processor, 12 GB of RAM, and a 64-bit version of Windows 10 Home, has been used for all the testing. Python 3.9 is used to implement the ReliefF, GWO, and k-NN algorithms. The population size is set to 30, and the maximum number of iterations is set to 1000.
3.2 Datasets

Three publicly available cancer gene expression datasets were used to test our proposed approach for biomarker gene selection: colon cancer, DLBCL, and leukaemia. Table 1 provides a summary of the datasets' characteristics.
Table 1 Detailed description of datasets

Dataset        #Of genes  #Of samples  #Of classes  Description       References
Colon cancer   2000       62           40 / 22      Tumour / Normal   Alon et al. [19]
DLBCL          7129       77           58 / 19      DLBCL / FL        Shipp et al. [20]
Leukaemia      7129       72           47 / 25      ALL / AML         Golub et al. [21]
Fig. 2 Confusion matrix
3.3 Performance Metric

The following metrics are utilized to evaluate our proposed model's efficiency and effectiveness.
3.3.1 Confusion Matrix
In order to evaluate the efficacy of a classification method, one might make use of a table known as a confusion matrix. Confusion matrices are a useful tool for visualizing and summarizing a classification algorithm's performance. The confusion matrix is presented in Fig. 2. From the confusion matrix shown in Fig. 2, we can infer some common metrics.
Table 2 Performance comparison with or without gene selection

Dataset       Gene selection  #Genes  Recall   Precision  F-score  Accuracy (%)
Colon cancer  Without         2000    0.9834   0.8384     0.9046   85
Colon cancer  With            6       1        1          1        100
DLBCL         Without         7129    0.7      0.9334     0.7712   91.75
DLBCL         With            6.8     0.836    1          0.9268   95
Leukaemia     Without         7129    0.6274   0.88       0.713    82
Leukaemia     With            7.4     0.978    1          0.9882   98.8

Bold values represent the best statistically significant results
\mathrm{Recall/Sensitivity} = \frac{TP}{TP + FN}    (12)

\mathrm{Precision} = \frac{TP}{TP + FP}    (13)

\mathrm{F\text{-}Measure} = 2 \times \frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}    (14)

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (15)
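For reference, Eqs. (12)–(15) can be evaluated directly from the confusion-matrix counts; the helper below is an illustrative sketch rather than code from the paper.

```python
# Sketch of the metrics in Eqs. (12)-(15), computed from binary confusion-matrix counts.
def classification_metrics(tp, tn, fp, fn):
    recall = tp / (tp + fn)                                      # Eq. (12), sensitivity
    precision = tp / (tp + fp)                                   # Eq. (13)
    f_measure = 2 * recall * precision / (recall + precision)    # Eq. (14)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                   # Eq. (15)
    return recall, precision, f_measure, accuracy
```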
3.4 Experimental Outcomes

This section presents extensive analyses to assess the proposed gene selection approach. The proposed model integrates ReliefF ranking with the GWO technique for extracting subsets of significant and informative genes from the publicly available biological datasets mentioned in Table 1. We first calculated the classification accuracy of the microarray gene expression datasets considering all genes (i.e. without utilizing any gene selection mechanism) employing a k-NN classifier. Then, our designed framework picks out the biomarkers or subsets of marker genes from these genomic microarray datasets. The same classification accuracy has been computed for the reduced datasets of pertinent genes (i.e. with the gene selection mechanism). A complete performance comparison without and with selection of genes is displayed in Table 2. The Receiver Operating Characteristic (ROC) curve graphically illustrates the association between sensitivity and specificity that governs the binary classifier's capability. The true-positive rate (TPR) and false-positive rate (FPR) are represented by the curve's y-axis and x-axis, respectively. If the ROC curve is nearer to the vertical axis (showing a greater area in ROC space), then the efficiency is better. However, the test performance is less accurate if the ROC curve is nearer to the horizontal axis (showing less area in ROC space). Figure 3 shows the ROC curves for the three datasets.
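A typical way to produce such ROC curves from predicted class scores is sketched below using scikit-learn; this is illustrative only and not the authors' plotting code.

```python
# Illustrative sketch: plotting an ROC curve from predicted positive-class scores.
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

def plot_roc(y_true, y_score):
    fpr, tpr, _ = roc_curve(y_true, y_score)    # FPR on the x-axis, TPR on the y-axis
    plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--")    # chance diagonal
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```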
Fig. 3 Comparison of ROC curves: (A) Colon cancer, (B) DLBCL, (C) Leukaemia without gene selection; (D) Colon cancer, (E) DLBCL, (F) Leukaemia with gene selection

Table 3 Comparing the proposed model to similar methods for each dataset

Publication                    Model                        Colon cancer          DLBCL                 Leukaemia
                                                            #Gene  Accuracy (%)   #Gene  Accuracy (%)   #Gene  Accuracy (%)
Dass et al. [9]                KWs test and CFS             23     90.90          –      –              24     98.61
Su et al. [22]                 K-S test-CFS                 10.7   90.10          –      –              25.2   79.60
Algamal & Lee [23]             SIS-MLE                      5      94.61          –      –              7      95.51
Hashemi et al. [24]            Pareto-based ensembled       80     85             80     93.67          –      –
Baldomero-Naranjo et al. [25]  SVM with ramp loss limiting  17     87.50          15     95             32     94.60
Proposed                       ReliefF with GWO             6      100            6.8    95             7.4    98.80

Bold values represent the best statistically significant results
Table 3 provides a comparison of classification accuracy between existing research and the proposed method for classifying the biological datasets. The average classification accuracy was obtained over a certain number of runs of the proposed hybrid model. It achieves an average of 100% accuracy by picking an average of 6 informative genes from the colon cancer dataset, and 95% and 98.8% by selecting an average of 6.8 and 7.4 informative genes from DLBCL and Leukaemia, respectively. Comparing the results, we can conclude that our proposed model's average accuracy is better than that of the other methods.
4 Conclusion

This study combines the ReliefF and GWO algorithms to create a hybrid filter-wrapper strategy to pick highly informative genes from high-dimensional biological datasets. In the first stage of our suggested model, ReliefF assigns rankings to each of the genes and selects the subset of genes with the highest rankings. In the second stage, GWO chooses the genes that are the most useful and pertinent. Thus, combining ReliefF and GWO yields a successful method of gene selection. By calculating classification efficiency, we evaluated the usefulness of the proposed approach. Comparing the outcomes of our suggested technique to other feature selection techniques currently in use, we see promising results. The future phase will involve research to improve our model by integrating different filtering mechanisms in an ensemble approach to identify biomarkers from various complex disease microarray and RNA sequencing datasets with more than two classes. We will also look at the validation of biological genes as an essential aspect of diagnosing real-world diseases.
References 1. Siegel RL, Miller KD, Jemal A (2019) Cancer statistics, 2019. CA: Cancer J Clinicians 69(1):7– 34 (2019) 2. Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genetics 2(6):418– 427 3. Somorjai RL, Dolenko B, Baumgartner R (2003) Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 19(12):1484–1491 4. Sta´nczyk U (2015) Feature evaluation by filter, wrapper, and embedded approaches. In: Feature selection for data and pattern recognition. Springer, pp 29–44 5. Saeys Y, Inza I (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517 6. Senan EM, Abunadi I, Jadhav ME, Fati SM (2021) Score and correlation coefficient-based feature selection for predicting heart failure diagnosis by using machine learning algorithms. Comput Math Methods Med 7. Vergara JR, Estévez PA (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24(1):175–186 8. Gu Q, Li Z, Han J (2012) Generalized fisher score for feature selection. arXiv preprint arXiv:1202.3725 9. Dass S, Mistry S, Sarkar P, Paik P (2021) An optimize gene selection approach for cancer classification using hybrid feature selection methods. In: International conference on advanced network technologies and intelligent computing. Springer, pp 751–764 10. Rachburee N, Punlumjeak W (2012) A comparison of feature selection approach between greedy, ig-ratio, chi-square, and mrmr in educational mining. In: 2015 7th international conference on information technology and electrical engineering (ICITEE). IEEE, pp 420–424 11. Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH (2018) Relief-based feature selection: introduction and review. J Biomed Inf 85:189–203 12. Rodriguez-Galiano VF, Luque-Espinar JA, Chica-Olmo M, Mendes MP (2018) Feature selection approaches for predictive modelling of groundwater nitrate pollution: an evaluation of filters, embedded and wrapper methods. Sci Total Environ 624:661–672
13. Yulianti Y, Saifudin A (2020) Sequential feature selection in customer churn prediction based on naive bayes. In: IOP conference series: materials science and engineering. vol 879. IOP Publishing, p 012090 14. Tabakhi S, Najafi A, Ranjbar R, Moradi P (2015) Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing 168:1024–1036 15. Ghosh T, Mitra S, Acharyya S (2021) Pathway marker identification using gene expression data analysis: a particle swarm optimisation approach. In: International conference on emerging applications of information technology. Springer, pp 127–136 16. Ahvanooey MT, Li Q, Wu M, Wang S (2019) A survey of genetic programming and its applications. KSII Trans Internet Inf Syst (TIIS) 13(4):1765–1794 17. Jansi Rani M, Devaraj D (2019) Two-stage hybrid gene selection using mutual information and genetic algorithm for cancer data classification. J Med Syst 43(8):1–11 18. Al-Tashi Q, Md Rais H, Abdulkadir SJ, Mirjalili S, Alhussian H (2020) A review of grey wolf optimizer-based feature selection methods for classification. Evolutionary Mach Learning Tech 273–286 19. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750 20. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS et al (2002) Diffuse large b-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8(1):68–74 21. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537 22. Su Q, Wang Y, Jiang X, Chen F, Lu WC (2017) A cancer gene selection algorithm based on the ks test and cfs. BioMed Res Int 23. Algamal ZY, Lee MH (2019) A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. Adv Data Anal Classification 13(3):753–771 24. Hashemi A, Dowlatshahi MB, Nezamabadi-pour H (2021) A pareto-based ensemble of feature selection algorithms. Expert Syst Appl 180:115130 25. Baldomero-Naranjo M, Martinez-Merino LI, Rodriguez-Chia AM (2021) A robust svm-based approach with feature selection and outliers detection for classification problems. Expert Syst Appl 178:115017
Prediction of Ectopic Pregnancy in Women Using Hybrid Machine Learning Techniques Vimala Nagabotu and Anupama Namburu
Abstract Ectopic pregnancy is life-threatening when not diagnosed on time. Early detection of an ectopic pregnancy allows practitioners to counsel and manage cases with greater scrutiny. The current work aims to improve the prediction and safety of ectopic pregnancy. Logistic regression, Naive Bayes, and XGBoost algorithms, among others, were implemented for predicting ectopic pregnancy with the help of factors such as BMI (kg/m2), frozen-thawed embryo transfer (FET) day parameters like endometrial thickness (mm), serum β-hCG levels on the 7th day (mIU/mL), bE2 (pg/mL), infertility duration (years), age (years), and so on. Focusing on the Reproductive and Genetic Center of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine, out of 2582 patients undergoing in vitro fertilization (IVF) and frozen-thawed embryo transfer (FET), 1061 patients were selectively included in the analysis after applying inclusion and exclusion criteria, and 55 of them were diagnosed with ectopic pregnancy. The XGBoost algorithm achieved 97.35% accuracy on the testing data set, higher than the other algorithms. Keywords Ectopic pregnancy · Pregnancy risk factors · Naïve Bayes · XGBoost · Regression
1 Introduction

Broadly, pregnancy is the state of a female (mainly) carrying an unborn life inside the body. An over-the-counter urine test followed by a blood test, ultrasound, and foetal heartbeat detection can provide complete assurance of this state. This state of carrying a developing embryo in Homo sapiens lasts 9 months, beginning with the woman's last menstrual period (LMP). This whole period is broken down into three trimesters of three months each. During the first trimester, which is also the most important trimester of all, any injury to the foetus will almost certainly
V. Nagabotu (B) · A. Namburu School of Computer Science and Engineering, VIT-AP University, Beside AP Secretariat, Near Vijayawada, Andhra Pradesh 522237, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_24
Fig. 1 Difference between normal pregnancy and ectopic pregnancy
result in miscarriage or major impairment. The most common problem associated with early pregnancy is miscarriage in the first trimester Ammon and Wilcox [1, 2]. Abortion occurs in 11–13% of all confirmed pregnancies Almeida and Rossen [3, 4]. Although the aetiology of most abortions is unclear, they are believed to be caused by a complicated interaction of parental age and genetic, hormonal, metabolic, immunological, and environmental variables Agenor and Garrido [5, 6]. Although it is difficult to measure, recent studies have found its probability in recognized foetuses to be just above 14–18% Magnus and Yuksel [7, 8]. One of the most common causes of miscarriage is implantation of the fertilized egg outside the uterine cavity, which is known as an "ectopic pregnancy." The fallopian tube has the highest rate of EPs (83.98%), as there are increasing tubal reconstructive surgeries and an increased incidence of PID. However, implantation can also occur at cervical, cornual, hysterotomy scar, intramural, ovarian, or abdominal sites. It affects a small percentage of all pregnancies, inevitably resulting in foetal death or illness during the first trimester Fernandez and Zhang [9, 10]. The most significant ectopic pregnancy risk is defective fallopian tube function, which prevents the fertilized egg from adequately implanting in the uterine cavity. The most common risk factors include pelvic inflammatory disease (PID), a history of ectopic pregnancy, tubal surgery, and smoking Chukus [11]. As illustrated in Fig. 1, there are different types of ectopic pregnancies: heterotopic ectopic pregnancy (1–3%), ovarian pregnancy ( 0.5. This model has a testing accuracy of 69.78%.
Fig. 3 LR model—ROC curve

Table 2 Representation of various models' accuracy for ectopic pregnancy

S.No.  Algorithm      Training accuracy  Testing accuracy
1      Decision tree  80.11              76.7
2      Random forest  100                95.9
3      SVM            98.15              95.36
4      Naïve Bayes    89.19              89.69
5      XGBoost        100                97.35
For the training data set, the DT model classified the 705 normal pregnancies into 642 correctly predicted normal pregnancies and 63 wrongly predicted ectopic pregnancies, while of the 71 ectopic pregnancies, 20 were classified as ectopic pregnancies and 51 were wrongly classified as normal pregnancies. Table 3 depicts the corresponding confusion matrix. For the test data set, of the 285 normal pregnancies, 228 were classified as normal and 57 were incorrectly classified as ectopic pregnancies, while of the 21 ectopic pregnancy cases, 7 were classified as ectopic pregnancy and 14 were wrongly classified as normal pregnancy. Table 3 depicts the confusion matrix for the same. With these classification results, the DT model achieved an accuracy of 80.11% in training and 76.7% in testing, as shown in Table 2, and Fig. 7 represents the ROC curve of the decision tree. For the training data set, the RF model classified all 700 normal pregnancies as normal, with none wrongly classified as ectopic, and all 73 ectopic pregnancies as ectopic, with none wrongly classified as normal. The confusion matrix for the same is shown in Table 3. For the test data set, of the 285 normal pregnancies, 271 were classified as normal and 14 were incorrectly classified as ectopic pregnancy, while all 21 ectopic pregnancy cases were classified as ectopic pregnancy and none as normal pregnancy. Table 3 depicts the confusion matrix for the same. With these classification results, the RF model achieved a training model
Table 3 Confusion matrix of the models

Model                     Metrics  Normal  EP   Sum
DT (train data)           Normal   601     23   685
                          EP       115     17   71
                          Sum      673     83   756
DT (test data)            Normal   216     18   285
                          EP       51      21   21
                          Sum      271     35   306
RF (train data)           Normal   700     0    700
                          EP       0       73   73
                          Sum      700     73   773
RF (test data)            Normal   271     14   285
                          EP       0       21   21
                          Sum      271     35   306
SVM (train data)          Normal   672     13   685
                          EP       1       70   71
                          Sum      673     83   756
SVM (test data)           Normal   292     9    301
                          EP       6       16   22
                          Sum      298     25   323
Naïve Bayes (train data)  Normal   660     45   705
                          EP       40      36   76
                          Sum      700     81   781
Naïve Bayes (test data)   Normal   225     15   240
                          EP       14      14   28
                          Sum      239     29   268
XGBoost (train data)      Normal   768     0    768
                          EP       0       88   88
                          Sum      768     88   856
XGBoost (test data)       Normal   200     2    202
                          EP       3       18   21
                          Sum      203     20   223
Fig. 4 Error plot representation of random forest model
Fig. 5 Representation of minimum loss in the XGBoost model based on the number of iterations
Fig. 6 Feature relevance based on the XGBoost mechanism
accuracy of 100% and a testing accuracy of 95.9%, as shown in Table 2, and Fig. 4 represents the error plot of the random forest. For the training data set, the SVM model classified the 685 normal pregnancies into 672 correctly predicted normal pregnancies and 13 wrongly predicted ectopic pregnancies, while of the 71 ectopic pregnancies, 70 were classified as ectopic pregnancies and 1 was wrongly classified as a normal pregnancy for the training data
Fig. 7 Decision tree model—ROC curve
set. The confusion matrix for the same is shown in Table 3. For the test data set, of the 301 normal pregnancies, 292 were classified as normal and 9 were incorrectly classified as ectopic pregnancy, while of the 22 ectopic pregnancy cases, 16 were classified as ectopic pregnancy and 6 were wrongly classified as normal pregnancy. Table 3 depicts the confusion matrix for the same. With these classification results, the SVM model achieved an accuracy of 98.15% in training and 95.36% in testing, as shown in Table 2. For the training data set, the Naive Bayes model classified the 705 normal pregnancies into 660 correctly predicted normal pregnancies and 45 incorrectly predicted ectopic pregnancies, while of the 76 ectopic pregnancy cases, 36 were classified as ectopic pregnancies and 40 were incorrectly classified as normal pregnancies. Table 3 shows the confusion matrix for the same. For the test data set, of the 240 normal pregnancies, 225 were classified as normal and 15 were incorrectly classified as ectopic pregnancy, while of the 28 ectopic pregnancy cases, 14 were classified as ectopic pregnancies and 14 were incorrectly classified as normal pregnancies. Table 3 depicts the confusion matrix for the same. With these classification results, the Naive Bayes model achieved an accuracy of 89.19% in training and 89.69% in testing, as shown in Table 2. For the training data set, the XGBoost model classified all 768 normal pregnancies as normal, with none wrongly classified as ectopic, and all 88 ectopic pregnancies as ectopic, with none wrongly classified as normal. The confusion matrix for the same is shown in Table 3. For the test data set, of the 202 normal pregnancies, 200 were classified as normal and 2 were wrongly classified as ectopic pregnancies, while of the 21 ectopic pregnancy cases, 18 were classified as ectopic pregnancies and 3 were wrongly classified as normal pregnancies. Table 3 depicts the confusion matrix for the same. With these classification results, the XGBoost model achieved a training accuracy of 100% and a testing accuracy of 97.35%, as shown in Table 2; Fig. 5 shows the minimum loss of the XGBoost model as a function of the number of iterations, and Fig. 6 indicates the feature relevance obtained from the XGBoost mechanism.
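A sketch of how such a comparison is commonly set up with scikit-learn and XGBoost is given below. It is not the authors' code: the 75/25 split, the default hyperparameters, and the assumption that the label vector is encoded as 0 (normal) / 1 (ectopic) are placeholders for illustration.

```python
# Illustrative sketch: fitting the compared classifiers and reporting test accuracy
# and the confusion matrix for each (not the authors' implementation).
import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_models(X, y):
    # y is assumed to be encoded as 0 = normal pregnancy, 1 = ectopic pregnancy.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    models = {
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "Random forest": RandomForestClassifier(random_state=0),
        "SVM": SVC(),
        "Naive Bayes": GaussianNB(),
        "XGBoost": xgb.XGBClassifier(eval_metric="logloss"),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        print(name, accuracy_score(y_te, pred))
        print(confusion_matrix(y_te, pred))   # rows: actual normal/EP, cols: predicted
```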
4 Conclusion

In this paper, we mainly focused on comparing the Naïve Bayes, decision tree, random forest, SVM, logistic regression, and XGBoost models for finding out whether a pregnancy falls in the ectopic or normal category. Unexpected ectopic pregnancies have become much more common, and when cases are discovered late, this might lead to maternal mortality and morbidity. Due to the high-risk condition that affects the majority of fertile wombed patients, there is no clear consensus on how to manage ectopic pregnancy. On the other hand, timely detection as well as treatment helps a lot with the mother's survival. We obtained a testing accuracy of 76.7% for the decision tree, and 95.9%, 95.36%, 89.69%, and 97.35% for random forest, SVM, Naive Bayes, and XGBoost, respectively, on a data set of 2582 patients at the Reproductive and Genetic Center of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine who were undergoing in vitro fertilization and FET. Therefore, we conclude that, of all the models, XGBoost gives the best accuracy.
5 Conflict of Interest The authors have no conflicts of interest in any matter related to the paper.
6 Data Availability Data are available in a public, open access repository. Extra data can be accessed via the Dryad data repository at https://datadryad.org/ with the doi:10.5061/dryad.8931zcrnj.
References 1. Ammon Avalos L, Galindo C, Li D-K (2012) A systematic review to calculate background miscarriage rates using life table analysis. Birth Defects Res Part A: Clinical Molecular Teratology 94(6):417–423 2. Wilcox AJ, Weinberg CR, O’Connor JF, Baird DD, Schlatterer JP, Canfield RE, Armstrong EG, Nisula BC (1988) Incidence of early loss of pregnancy. New England J Med 319(4):189–194 3. Almeida ND, Basso O, Abrahamowicz M, Gagnon R, Tamblyn R (2016) Risk of miscarriage in women receiving antidepressants in early pregnancy, correcting for induced abortions. Epidemiology 27(4):538–546 4. Rossen LM, Ahrens KA, Branum AM (2018) Trends in risk of pregnancy loss among us women, 1990–2011. Paediatric Perinatal Epidemiol 32(1):19–29 5. Agenor A, Bhattacharya S (2015) Infertility and miscarriage: common pathways in manifestation and management. Women’s Health 11(4):527–541
6. Garrido-Gimenez C, Alijotas-Reig J (2015) Recurrent miscarriage: causes, evaluation and management. Postgraduate Med J 91(1073):151–162 7. Magnus MC, Wilcox AJ, Morken NH, Weinberg CR, Håberg SE (2019) Role of maternal age and pregnancy history in risk of miscarriage: prospective register based study. BMJ 364 8. Yüksel D (2021) Rare types of ectopic pregnancies. Curr Obstetrics Gynecol Rep 10(1):1–6 9. Fernández ADR, Fernández DR, Sánchez MTP (2019) A decision support system for predicting the treatment of ectopic pregnancies. Int J Med Inf 129:198–204 10. Zhang D, Shi W, Li C, Yuan JJ, Xia W, Xue RH, Sun J, Zhang J (2016) Risk factors for recurrent ectopic pregnancy: a case-control study. BJOG: Int J Obstetrics Gynaecol 123:82–89 11. Chukus A, Tirada N, Restrepo R, Reddy NI (2015) Uncommon implantation sites of ectopic pregnancy: thinking beyond the complex adnexal mass. Radiographics 35(3):946–959 12. Harzif AK, Hyaswicaksono P, Kurniawan RH, Wiweko B (2021) Heterotopic pregnancy: diagnosis and pitfall in ultrasonography. Gynecol Minimally Invasive Therapy 10(1):53 13. Garg P et al (2021) Ovarian ectopic pregnancy-a case report of two cases. Sch J Med Case Rep 6:667–669 14. Fowler ML, Wang D, Chia V, Handal-Orefice R, Latortue-Albino P, Mulekar S, White K, Perkins R (2021) Management of cervical ectopic pregnancies: a scoping review. Obstetrics Gynecol 138(1):33–41 15. Scibetta EW, Han CS (2019) Ultrasound in early pregnancy: viability, unknown locations, and ectopic pregnancies. Obstet Gynecol Clinics 46(4):783–795 16. Yuan X, Saravelos SH, Wang Q, Xu Y, Li T-C, Zhou C (2016) Endometrial thickness as a predictor of pregnancy outcomes in 10787 fresh ivf-icsi cycles. Reprod Biomed Online 33(2):197–205 17. Rombauts L, McMaster R, Motteram C, Fernando S (2015) Risk of ectopic pregnancy is linked to endometrial thickness in a retrospective cohort study of 8120 assisted reproduction technology cycles. Hum Reprod 30(12):2846–2852 18. Liu X, Qu P, Bai H, Shi W, Shi J (2019) Endometrial thickness as a predictor of ectopic pregnancy in 1125 in vitro fertilization-embryo transfer cycles: a matched case-control study. Arch Gynecol Obstet 300(6):1797–1803 19. Liu H, Zhang J, Wang B, Kuang Y (2020) Effect of endometrial thickness on ectopic pregnancy in frozen embryo transfer cycles: an analysis including 17,244 pregnancy cycles. Fertil Steril 113(1):131–139 20. Gao G, Cui X, Li S, Ding P, Zhang S, Zhang Y (2020) Endometrial thickness and ivf cycle outcomes: a meta-analysis. Reprod Biomed Online 40(1):124–133 21. Niu L (2020) A review of the application of logistic regression in educational research: common issues, implications, and suggestions. Educ Rev 72(1):41–67 22. Ilanjselvi M, Priya KS (2021) Prospective study on ectopic pregnancy in a tertiary care hospital. Int J Reprod Contracept Obstet Gynecol 10(5):1890 23. Huang C, Xiang Z, Zhang Y, Tan DS, Yip CK, Liu Z, Li Y, Yu S, Diao L, Wong LY et al (2021) Using deep learning in a monocentric study to characterize maternal immune environment for predicting pregnancy outcomes in the recurrent reproductive failure patients. Front Immunol 12:642167 24. Ramos-Medina R, García-Segovia A, Gil J, Carbone J, Aguaron de la Cruz A, Seyfferth A, Alonso B, Alonso J, León JA, Alecsandru D et al (2014) Experience in ivi g therapy for selected women with recurrent reproductive failure and nk cell expansion. Am J Reprod Immunol 71(5):458–466 25. 
Dhillon R, McLernon D, Smith P, Fishel S, Dowell K, Deeks J, Bhattacharya S, Coomarasamy A (2016) Predicting the chance of live birth for women undergoing ivf: a novel pretreatment counselling tool. Hum Reprod 31(1):84–92 26. Milewski R, Kuczy´nska A, Stankiewicz B, Kuczy´nski W (2017) How much information about embryo implantation potential is included in morphokinetic data? a prediction model based on artificial neural networks and principal component analysis. Adv Med Sci 62(1):202–206
342
V. Nagabotu and A. Namburu
27. Vaegter KK, Lakic TG, Olovsson M, Berglund L, Brodin T, Holte J (2017) Which factors are most predictive for live birth after in vitro fertilization and intracytoplasmic sperm injection (ivf/icsi) treatments? analysis of 100 prospectively recorded variables in 8400 ivf/icsi singleembryo transfers. Fertil Steril 107(3):641–648 28. Hafiz P, Nematollahi M, Boostani R, Jahromi BN (2017) Predicting implantation outcome of in vitro fertilization and intracytoplasmic sperm injection using data mining techniques. Int J Fertil Steril 11(3):184 29. Hassan MR, Al-Insaif S, Hossain MI, Kamruzzaman J (2020) A machine learning approach for prediction of pregnancy outcome following ivf treatment. Neural Comput Apol 32(7):2283– 2297 30. Kafaee Ghaeini M, Amin-Naseri MR, Aghahoseini M (2018) Prediction of clinical pregnancy occurrence after icsi using decision tree and support vector machine methods. J Obstet Gynecol Cancer Res (JOGCR) 3(4):149–155 31. Blank C, Wildeboer RR, DeCroo I, Tilleman K, Weyers B, De Sutter P, Mischi M, Schoot BC (2019) Prediction of implantation after blastocyst transfer in in vitro fertilization: a machinelearning perspective. Fertil Steril 111(2):318–326 32. Vogiatzi P, Pouliakis A, Siristatidis C (2019) An artificial neural network for the prediction of assisted reproduction outcome. J Assisted Reprod Genetics 36(7):1441–1448 33. Qiu J, Li P, Dong M, Xin X, Tan J (2019) Personalized prediction of live birth prior to the first in vitro fertilization treatment: a machine learning method. J Transl Med 17(1):1–8 34. Barnett-Itzhaki Z, Elbaz M, Butterman R, Amar D, Amitay M, Racowsky C, Orvieto R, Hauser R, Baccarelli AA, Machtinger R (2020) Machine learning versus classic statistics for the prediction of ivf outcomes. J Assisted Reprod Genetics 37(10):2405–2412 35. Bruno V, D’Orazio M, Ticconi C, Abundo P, Riccio S, Martinelli E, Rosato N, Piccione E, Zupi E, Pietropolli A (2020) Machine learning (ml) based-method applied in recurrent pregnancy loss (rpl) patients diagnostic work-up: a potential innovation in common clinical practice. Sci Rep 10(1):1–12 36. Yuan L, Yu L, Sun Z, Song J, Xiao J, Jiang H, Sa Y (2020) Association between 7-day serum β-hcg levels after frozen-thawed embryo transfer and pregnancy outcomes: a single-centre retrospective study from china. BMJ Open 10(10):035332 37. Breiman L (2001) Random forests. Mach Learn 45(1):5–32 38. Vaulet T, Al-Memar M, Fourie H, Bobdiwala S, Saso S, Pipi M, Stalder C, Bennett P, Timmerman D, Bourne T et al (2022) Gradient boosted trees with individual explanations: an alternative to logistic regression for viability prediction in the first trimester of pregnancy. Comput Methods Programs Biomed 213:106520
Redundancy Reduction and Adaptive Bit Length Encoding-Based Purely Lossless ECG Compression Butta Singh , Neetika Soni, and Indu Saini
Abstract In the proposed work, a redundancy reduction-based two-dimensional (2D) electrocardiogram (ECG) data compression technique is presented wherein the inter-beat and intra-beat ECG samples are exploited to reduce their amplitudes. The resultant samples show a significant reduction in amplitude, out of which a few variations with relatively high magnitudes remain. Implementation of fixed-length coding schemes degrades the compression efficiency, whereas the existing variable-length encoding schemes fail to assign variable code lengths to the sample amplitudes. In the proposed work, a new Adaptive Bit Length Encoding technique is designed that adaptively selects the number of binary bits required to encode the elements in a block, thus reducing the data size efficiently. The proposed approach ensures lossless ECG compression with guaranteed signal reconstruction. The experiments were performed on all 48 records of the standard MIT-BIH arrhythmia database of 5 min duration. The 100% reconstruction of the compressed signal resulted in zero Percentage Residual Difference (PRD) and Wavelet Energy-based Diagnostic Distortion (WEDD), while the Compression Ratio (CR) comes out to be 3.14 when measured on all records of the arrhythmia database, which demonstrates the proficiency of the proposed technique as compared to other state-of-the-art techniques. Keywords Tele-healthcare services · 2D ECG compression · Lossless compression · Compression ratio
B. Singh (B) · N. Soni Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus, Jalandhar, India e-mail: [email protected] I. Saini Department of Electronics and Communication Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_25
1 Introduction The upsurge in the communication infrastructure and development in Internet of Things (IoT) has significantly helped the medical industry to promote e-healthcare services. It has rapidly changed the dynamics of conventional medical management systems by providing quality health care to the home-bound or remote patients by the globally available experienced medical practitioners without their physical interface. Since, patients do not come in direct contact with medical practitioners, these services are persuasive in the treatment of contagious diseases (e.g., COVID-19) also. Patient’s medical biography (physiological signals, medical images and metadata) is sent to the doctor and appropriate action can be taken without any delay. However, the revolution in e-healthcare services also demands the increase in bandwidth and storage capacity. Electrocardiogram (ECG), which represents the electrical activity of the heart, is predominant in diagnosis of cardiovascular (CVD) as well as autonomic nervous system diseases [1–3]. But long-term recordings of ECG signals generate a hefty amount of data that poses a huge threat and unaffordable burden on transmission and storage devices. It demands implication of some efficient compression techniques that can reduce the data size vis a vis guarantees to preserve the signal morphology after reconstruction. The compression techniques can be lossless or lossy [4, 5]. Lossless compression techniques explore the inter-beat and inter-sample relationships of ECG signals to reduce the data size. These techniques are reversible and signals can be reconstructed perfectly but at low compression ratio (CR). Lossy techniques are irreversible. They produce high CRs but at the expense of reconstruction error. Though both the approaches are frequently used in compression but in case of sensitive applications such as biomedical signals, distortion due to compression can mislead the diagnosis, hence in such cases lossless techniques are preferred. The ECG compression techniques are widely classified into two categories; Time domain and Transform domain [5]. Time domain techniques exploits all those redundancies in the signal that have statistical dependence between samples [6–8]. Various approaches that fall in category of time domain includes statistical encoding, e.g., runlength encoding [9], Huffman encoding [10–12], Golomb encoding [13], redundancy reduction [14], adaptive sampling [15] and parameter extraction methods [16]. In transform domain [17–19], the coefficients of the original signal associated with some predefined basis are evaluated and modified such that the signal can be represented by few parameters only. Tremendous research has been reported in literature that perform lossy ECG compression in both spatial and transform domain, but limited techniques performed lossless ECG compression. Takahiro [20] performed online lossless ECG compression using anti-dictionary codes. The average CR obtained when measured on 10 randomly selected records from standard MITBIH arrhythmia database of lengths 0.1 and 1.3 MB was 0.27 and 0.25, respectively. Chang and Lin developed lossless ECG compression based on delta coding and optimally selected Huffman encoding [10]. The delta encoding reduced the magnitude
of samples while entropy-based Huffman encoding compressed the reduced magnitudes. The CR achieved was 3.10 but the amount of side information generated was not discussed in the results. VLSI-based lossless compressors for wireless healthcare monitoring applications have been designed [21, 22]. Chua and Fang performed priority data selector, prediction and parameter estimation and golumb-rice coding to compress the signal [21]. A two-stage entropy encoder based on two-Huffman coding tables was employed in [22]. Average CR of 2.43 was obtained from records of MIT-BIH arrhythmia database. Further, ASCII character-based ECG compression approach is proposed and tested on all 12 leads of PTB diagnostic ECG database [23]. The technique attained CR of 7.18 at minimal PRD% of 0.023. A joint QRS detection and prediction-based ECG data compression were proposed to achieve CR of 2.28 when measured on all records of the MIT-BIH arrhythmia database [24]. Most of the existing lossless ECG compression techniques are implemented on 1-dimension (1D) signals only. ECG signal is a quasi-periodic signal with strong correlations between adjacent beats and adjacent samples. In the proposed technique, these properties of ECG signal are explored to convert 1D ECG into 2D ECG [25] and further to remove the redundancies among them. The proposed technique promises the complete reconstruction with zero PRD and comparable CR.
2 Proposed Method The proposed technique performs redundancy reduction-based 2D lossless ECG compression in the spatial domain. In this approach, the pre-processed 1D ECG is converted into 2D and then multiple computations are performed to reduce the magnitudes of 2D ECG samples. The reduced magnitudes are encoded with the newly developed Adaptive Bit Length Encoding (ABLE) technique. ABLE performs purely lossless compression by adaptively selecting the number of bits required to encode the elements of ECG samples rather than using the fixed bit length to encode the complete sequence. Figure 1 shows the basic steps involved in the process of ABLE-based lossless ECG compression and decompression. These steps are explained as.
2.1 Pre-processing of the Input ECG Signal The input signal ECGi is initially normalized by shifting the signal to zero DC level. This is done by subtracting the signal mean value, mean(ECGi), from the signal ECGi:

    Normalized ECG (ECG) = ECGi − mean(ECGi)    (1)
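As a minimal illustration of this step, the zero-DC normalization of Eq. (1) can be written in a few lines of Python; NumPy is assumed here, and the sketch is not the authors' implementation.

```python
import numpy as np

def normalize(ecg_i: np.ndarray) -> np.ndarray:
    """Shift the raw ECG to zero DC level by removing its mean value, as in Eq. (1)."""
    return ecg_i - np.mean(ecg_i)
```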
Fig. 1 Redundancy reduction and ABLE-based ECG a Compression b Decompression process
2.2 Conversion of 1D ECG Signal to 2D ECG Image The 1D ECG signal is converted into 2D ECG based on the cut and align method [25]. The R-peaks are detected using Pan-Tompkin algorithm [26] and R-R intervals are computed. Considering R-peaks as centers, the RR interval is divided in the ratio of 1:2, and ECG beats are stacked vertically with aligned R-peaks. Since ECG signals are quasi-periodic and have different beat lengths which are normalized with zero-padding [27]. Figure 2 displays the 2D ECG(I m) formed from record 100 from MIT-BIH arrhythmia database of 1 min.
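A rough Python sketch of this cut-and-align step is given below. It assumes the R-peak locations are already available (e.g., from a Pan-Tompkins detector), keeps one third of each RR interval before the peak and two thirds after it as one reading of the 1:2 split, and zero-pads the beats to a common length; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def cut_and_align(ecg: np.ndarray, r_peaks: np.ndarray) -> np.ndarray:
    """Stack one beat per R-peak into a 2D array with the R-peaks in a common column."""
    pre, post = [], []
    for k in range(1, len(r_peaks) - 1):
        pre.append((r_peaks[k] - r_peaks[k - 1]) // 3)        # samples kept before the peak
        post.append(2 * (r_peaks[k + 1] - r_peaks[k]) // 3)   # samples kept after the peak
    left = max(pre)
    im = np.zeros((len(pre), left + max(post)))               # zero-padded beat matrix
    for i, k in enumerate(range(1, len(r_peaks) - 1)):
        seg = ecg[r_peaks[k] - pre[i]: r_peaks[k] + post[i]]
        im[i, left - pre[i]: left - pre[i] + len(seg)] = seg  # the R-peak lands in column `left`
    return im
```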
2.3 Redundancy Reduction Process The strong correlation in ECG beats is exploited to remove the inter-beat and intersample redundancy by reducing their amplitudes in three steps:
Fig. 2 2D ECG image (I m) formed from record 100 of MIT-BIH arrhythmia database of 1 min duration
(i) Calculate the mean of Im. Figure 3 shows the resultant average beat (ECGav).

    ECGav = mean(Im)    (2)

(ii) Subtract each ECG beat of Im from ECGav and store the result in ECGD. Let (r, c) be the number of rows and columns of Im:

    for i = 1 to r
        ECGD(i, 1:c) = ECGav − ECG(i, 1:c)    (3)
    end
Fig. 3 Impact of redundancy reduction on amplitude of an ECG beat a Raw ECG b Normalized ECG c ECGD d ECGR
(iii) To remove inter-sample redundancy, apply differential coding on ECGD by subtracting the beat samples from the adjacent samples in the same row. The residual ECG samples (ECGR) are obtained as:

    k = 1
    for j = 1 to c
        ECGR(k, j) = ECGD(k, j + 1) − ECGD(k, j)    (4)
        k = k + 1
    end

The process of redundancy reduction is demonstrated in Fig. 3. The first ECG beat of Im, i.e., Im1, is considered for the experimentation. It is observed from Fig. 3c that ECGD has a reduced amplitude range as compared to the amplitude of the original ECG beat, and it is further decreased in ECGR (Fig. 3d).
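The three steps above map directly onto array operations; the following NumPy sketch (not the authors' code) computes the average beat, the beat-wise differences, and the sample-wise differences in one pass.

```python
import numpy as np

def redundancy_reduction(im: np.ndarray):
    """Apply steps (i)-(iii) to the 2D beat matrix Im and return ECGav, ECGD and ECGR."""
    ecg_av = im.mean(axis=0)          # (i)  average beat, Eq. (2)
    ecg_d = ecg_av - im               # (ii) subtract every beat from the average, Eq. (3)
    ecg_r = np.diff(ecg_d, axis=1)    # (iii) differences of adjacent samples in a row, Eq. (4)
    return ecg_av, ecg_d, ecg_r
```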
2.4 Encoding of the Residual Samples The final step after removing the redundant amplitudes is to encode the residual signal. The 2D Im with reduced amplitudes is again converted into 1D for encoding. The zero-padding is removed prior to conversion. The resultant amplitudes obtained in ECGR have both positive and negative terms; therefore, it is required to separate out their signs prior to encoding. Generation of Sign Matrix The signs are separated with the following criteria:

    len = length of ECG1D
    for g = 1 to len
        if ECG1D(g) ≥ 0
            sign_mat(g) = 0
        else
            sign_mat(g) = 1

This sign matrix is part of the side information and is used during signal reconstruction.
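Because ABLE encodes only magnitudes, the sign separation can be expressed compactly; this is a small illustrative sketch rather than the authors' implementation.

```python
import numpy as np

def split_signs(ecg_r_1d: np.ndarray):
    """Separate the sign matrix from the absolute residual amplitudes before encoding."""
    sign_mat = (ecg_r_1d < 0).astype(np.uint8)   # 0 for non-negative samples, 1 for negative
    magnitudes = np.abs(ecg_r_1d).astype(int)    # only absolute values are passed to ABLE
    return sign_mat, magnitudes
```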
Adaptive Bit Length Encoding (ABLE) It is noticed in Fig. 3d that the residual beat has low amplitude throughout except a spike in the QRS complex region. The number of bits required to encode this part of beat is more as compared to the bits required by the rest of the beat and encoding the whole beat samples with fixed code-length increases the data size. In this work, a novel method of encoding is employed that adaptively selects the number of binary bits required to encode the block of ECG samples which ultimately reduces the data size and improves the CR. The complete process of encoding the compressed ECG samples using ABLE technique is discussed in algorithm 1. The ABLE-based encoding technique is further explained with the help of an example. Consider the absolute values of few rows and columns of the residual 2D ECG as
ECGR =
    1   3   0  12   4
    0   1   0  10   4
    1   0   0  17   4
    1   1   0  14   4
    0   2   0  11   4

1. Convert ECGR into 1D:
   ECG1D = [1 0 1 1 0  3 1 0 1 2  0 0 0 0 0  12 10 17 14 11  4 4 4 4 4]
2. Select block length (fr) = number of rows in ECGR, i.e., fr = 5
3. Compute the maximum amplitude in ECG1D: max_val = 17
4. Calculate the number of bits required to represent max_val in binary:
   nb = ⌊log2(max_val)⌋ + 1 = ⌊log2(17)⌋ + 1 = 5
   Verify whether max_val ≥ 2^nb − 3 (here 17 < 29); therefore nb = 5
   flag1 = 2^nb − 1 = 31; flag2 = 2^nb − 2 = 30
5. Divide ECG1D into blocks, each of size fr = 5:
   [1 0 1 1 0], [3 1 0 1 2], [0 0 0 0 0], [12 10 17 14 11], [4 4 4 4 4]
6. Number of blocks (b) = length of ECG1D / fr = 25/5 = 5
For adaptive encoding:
i. Consider block b1 = [1 0 1 1 0]. Since the elements in the block are different, according to Algorithm 1 this is case 3.
   max_b1 = 1
   nb_b1 = ⌊log2(max_b1)⌋ + 1 = ⌊log2(1)⌋ + 1 = 1
   Append header = nb_b1 in front of b1 to generate new_seq1 as
   new_seq1 = [nb_b1 b1] = [1 1 0 1 1 0]
   The header is coded with nb bits while the block values are encoded with nb_b1 bits each; therefore, the total bits required to convert new_seq1 into binary are
   length(bin_seq1) = nb + (fr × nb_b1) = 5 + (5 × 1) = 10 bits
ii. Block b2 = [3 1 0 1 2] (case 3)
   max_b2 = 3
   nb_b2 = ⌊log2(max_b2)⌋ + 1 = ⌊log2(3)⌋ + 1 = 2
   Append header = nb_b2 in front of b2 to generate new_seq2 as
   new_seq2 = [nb_b2 b2] = [2 3 1 0 1 2]
   Bits required to convert new_seq2 into binary:
   length(bin_seq2) = nb + (fr × nb_b2) = 5 + (5 × 2) = 15 bits
iii. Block b3 = [0 0 0 0 0]
   This is case 1 of Algorithm 1, so new_seq3 = header = 2^nb − 1 (flag1)
   Bits required to convert new_seq3 into binary:
   length(bin_seq3) = nb bits = 5 bits
iv. Block b4 = [12 10 17 14 11] (case 3)
   max_b4 = 17
   nb_b4 = ⌊log2(max_b4)⌋ + 1 = ⌊log2(17)⌋ + 1 = 5
   Append header = nb_b4 in front of b4 to generate new_seq4 as
   new_seq4 = [nb_b4 b4] = [5 12 10 17 14 11]
   Bits required to convert new_seq4 into binary:
   length(bin_seq4) = nb + (fr × nb_b4) = 5 + (5 × 5) = 30 bits
v. Block b5 = [4 4 4 4 4]
   Since all the values are the same, this is case 2 according to Algorithm 1, with X = 4
   Append header = flag2 in front of X to generate new_seq5 as
   new_seq5 = [flag2 X] = [30 4]
   Bits required to convert new_seq5 into binary (nb bits each to encode flag2 and X):
   length(bin_seq5) = 2 × nb = 10 bits
Therefore, the total bits required to encode ECG1D using the ABLE technique are
length(bin_seq1) + length(bin_seq2) + length(bin_seq3) + length(bin_seq4) + length(bin_seq5) = 10 + 15 + 5 + 30 + 10 = 70 bits
Fixed-length coding: nb × length(ECG1D) = 5 × 25 = 125 bits
It is evident from this example that the data size reduces significantly using the ABLE technique. During the compression process, all the computations are performed on integer values; hence it is possible to reconstruct the complete signal without any loss. The side information that is also transmitted along with bin_seq consists of the locations of the R-peaks (loc_Rpeaks), ECGav, sign_mat, fr and nb. The reverse operation to obtain the decompressed and reconstructed ECG signal is explained in Algorithm 2.
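The three cases of the block encoder can be summarised in the short Python sketch below, which reproduces the bit counts of the worked example; the function name and the (value, bit-width) output representation are illustrative assumptions, not the authors' implementation.

```python
import math

def able_encode(ecg1d, fr, nb):
    """Encode a non-negative integer sequence block by block with the three ABLE cases.

    Returns (value, bit_width) pairs; the bitstream is each value written with its width.
    """
    flag1, flag2 = 2**nb - 1, 2**nb - 2
    out = []
    for start in range(0, len(ecg1d), fr):
        block = ecg1d[start:start + fr]
        if all(v == 0 for v in block):              # case 1: a block of zeros
            out.append((flag1, nb))
        elif len(set(block)) == 1:                  # case 2: one repeated value
            out.append((flag2, nb))
            out.append((block[0], nb))
        else:                                       # case 3: per-block bit length
            nb_b = int(math.log2(max(block))) + 1   # bits needed for this block
            out.append((nb_b, nb))                  # header coded with nb bits
            out.extend((v, nb_b) for v in block)
    return out

# Worked example from the text: 25 samples, fr = 5, nb = 5 -> 70 bits versus 5*25 = 125 bits
ecg1d = [1, 0, 1, 1, 0, 3, 1, 0, 1, 2, 0, 0, 0, 0, 0, 12, 10, 17, 14, 11, 4, 4, 4, 4, 4]
print(sum(width for _, width in able_encode(ecg1d, fr=5, nb=5)))   # 70
```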
3 Result and Discussion The performance of the proposed compression algorithm is evaluated on the standard MIT-BIH arrhythmia database which is publicly available online [28]. The database consists of two-channel ambulatory 48 half-hour recordings from 47 subjects. The signals are recorded at a sampling frequency of 360 Hz per channel with 11-bit resolution over a 10 mV range for 30 min. The efficacy of the proposed compression algorithm is evaluated in terms of its compression efficiency and the signal reconstruction efficiency, where compression efficiency is determined by the CR while the signal reconstruction efficiency is computed by the distortion parameters such as PRD and WEDD. CR is defined as the ratio of the number of bits required to encode the
original signal to the bits used to encode the compressed signal. PRD is the ratio of the root mean square difference between the samples of the original and the reconstructed signal to the square of the original signal. It distributes the error equally over the whole signal and is thus unable to identify the actual point of distortion. Wavelet Energy-based Diagnostic Distortion (WEDD) computes the local distortion in the signal and hence provides a clinically acceptable diagnostic measure [29].

Algorithm 2. Decompression Process
Input: received binary sequence (bin_seq), side information (location of R-peaks (loc_Rpeaks), sign matrix (sign_mat), number of binary bits used for conversion (nb), length of the block (fr))
Output: Decompressed ECG signal (dec_ECG)
k = 1
1. Compute the first header from the first nb bits received: rev_hd = binary2decimal(bin_seq(k : k+nb−1))
   Case 1: if rev_hd = 2^nb − 1, i.e., the fr elements of block (bi) are zero, then dec_ECG = fr number of 0's
   Case 2: if rev_hd = 2^nb − 2, all the elements of the block have the same value X; dec_ECG = fr number of X's, where X is generated from the next nb bits of bin_seq
   Otherwise
   Case 3: all the elements of the block are different and encoded with nb_bi bits, where nb_bi is computed from the next nb bits of bin_seq; dec_ECG = binary2decimal(bin_seq(k : k+rev_hd−1))
   Based on the above cases, the 1D residual beat is recovered (r_ECG1D).
2. Apply signs according to sign_mat to the recovered decimal sequence (r_ECG1D).
3. Using the R-peak locations (loc_Rpeaks), convert 1D (r_ECG1D) into 2D (r_ECGR).
4. Follow the reverse procedure to regenerate r_ECGD(m+1) = r_ECGD(m) + r_ECGR(m).
5. Calculate the final amplitude as rev_amp = ECGav + r_ECGD.
6. Obtain the decompressed ECG by converting rev_amp into 1D (rECG).
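A matching decoder sketch for the block-level part of Algorithm 2 is shown below; it assumes the bitstream is available as a string of '0'/'1' characters and mirrors the encoder sketch given earlier (illustrative only, not the authors' code).

```python
def able_decode(bits: str, fr: int, nb: int, n_samples: int):
    """Expand an ABLE bitstream back into the absolute residual samples."""
    flag1, flag2 = 2**nb - 1, 2**nb - 2
    pos, out = 0, []

    def read(width):
        nonlocal pos
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    while len(out) < n_samples:
        header = read(nb)
        if header == flag1:                  # case 1: a block of zeros
            out.extend([0] * fr)
        elif header == flag2:                # case 2: fr copies of the next nb-bit value
            out.extend([read(nb)] * fr)
        else:                                # case 3: header is the per-sample bit width
            out.extend(read(header) for _ in range(fr))
    return out

# Example: one case-2 block (five copies of 4) packed as flag2 followed by the value
print(able_decode(format(30, '05b') + format(4, '05b'), fr=5, nb=5, n_samples=5))  # [4, 4, 4, 4, 4]
```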
Quality Score (QS) is the rational metric that considers both CR and distortion to measure the performance and is measured as the ratio of CR and PRD. Since the signal is reconstructed completely without any loss, therefore both statistical and clinically acceptable distortions are zero and the QS comes out to be infinite. The CR obtained on all the 48 records of the MIT-BIH arrhythmia database of 2 min duration
are shown in Table 1. The average CR obtained with the proposed technique is 3.14, which is comparable with the existing state-of-the-art techniques.

Table 1 Analysis on 48 records of MIT-BIH arrhythmia database of 2 min duration (record number and CR)

Record   CR      Record   CR      Record   CR
100      3.16    118      2.99    214      2.83
101      3.07    119      2.96    215      2.78
102      3.03    121      3.43    217      3.44
103      2.98    122      3.65    219      3.52
104      2.65    123      3.74    220      3.24
105      3.45    124      3.22    221      3.10
106      3.44    200      3.33    222      3.23
107      3.23    201      3.21    223      2.93
108      2.89    202      3.10    228      2.87
109      3.02    203      3.12    230      3.56
111      3.13    205      2.54    231      3.32
112      3.15    207      2.85    232      2.91
113      2.78    208      3.45    233      2.98
114      2.98    209      3.5     234      3.42
115      3.11    210      2.91    Avg      3.14
116      3.09    212      3.47
117      2.77    213      2.96

Average (Avg) is the average CR of all the records
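For reference, the two headline metrics reported above can be computed as follows; this uses the commonly used root-sum-of-squares form of PRD (which is zero here because reconstruction is exact) and is only an illustrative sketch.

```python
import numpy as np

def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """CR: bits needed to encode the original signal over bits of the compressed stream."""
    return original_bits / compressed_bits

def prd(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Percentage Residual Difference between the original and reconstructed signals."""
    return 100.0 * np.sqrt(np.sum((original - reconstructed) ** 2) / np.sum(original ** 2))
```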
3.1 Comparison with Existing Approaches The performance of the proposed lossless ECG compression technique is compared with the existing state of the art techniques. In all these techniques, the compression is performed in time domain to achieve purely lossless compression with zero distortion. It is clearly displayed in Table 2 that CR achieved by the proposed technique is higher than the existing techniques.
Table 2 The comparison of the proposed compression technique with the existing techniques

Sr. No   Technique used                                                Database used                       CR
1        Anti-dictionary codes for finite alphabets [20]               10 selected records from Mita-db    0.27
2        Delta coding + optimal selective Huffman coding [10]          record 100 of Mita-db               3.10
3        Integrated VLSI-based biosensor [21]                          48 records of Mita-db               2.38
4        Adaptive Predictor + two-stage entropy encoder-based          48 records of Mita-db               2.43
         Huffman coding [22]
5        Adaptive linear data prediction [24]                          48 records of Mita-db               2.28
6        Proposed technique (Redundancy reduction + ABLE)              48 records of Mita-db               3.14
4 Conclusion The redundancy reduction-based purely lossless 2D ECG compression technique in time domain is proposed. The compressed data is encoded with the newly developed ABLE technique which further performs lossless compression. The technique assures the complete reconstruction of the ECG signal without any distortion in its diagnostic features. Experimental results showed that the proposed approach achieved an average CR of 3.14 from all 48 records of MIT-BIH arrhythmia database with maximum and minimum CR obtained are 3.74 and 2.54 on ECG records 123 and 205, respectively, which are quite better than the existing lossless techniques. In the present work, the column length of residual matrix is considered as a block size ( fr) to implement ABLE technique. The future work plan is to optimize the selection of block size ( fr) to further improve the CR.
References 1. Berkaya SK, Uysal AK, Gunal ES, Ergin S, Gunal S, Gulmezoglu MB (2018) A survey on ECG analysis. Biomed Signal Process Control 43:216–235 2. Calandra-Buonaura G, Provini F, Guaraldi P, Plazzi G, Cortelli P (2016) Cardiovascular autonomic dysfunctions and sleep disorders. Sleep Med Rev 26:43–56 3. Floras JS (2014) Sleep apnea and cardiovascular risk. J Cardiol 63(1):3–8 4. Tiwari A, Falk TH (2019) Lossless electrocardiogram signal compression: a review of existing methods. Biomed Signal Process Control 51:338–346 5. Manikandan MS, Dandapat S (2014) Wavelet-based electrocardiogram signal compression methods and their performances: a prospective review. Biomed Signal Process Control 14(1):73–107 6. Kumar V, Saxena SC, Giri VK (2006) Direct data compression of ECG signal for telemedicine. Int J Syst Sci 37(1):45–63
7. Pandey A, Singh B, Saini BS (2021) Electrocardiogram data compression techniques in 1D/ 2D domain. Biomed Eng Appl Basis Commun 33(2):2150011 8. Cox JR, Nolle FM, Fozzard HA, Oliver GC (1968) AZTEC, a preprocessing program for real-time ECG rhythm analysis. IEEE Trans Biomed Eng BME-15(2):128–129 9. Akhter S, Haque MA (2010) ECG compression using run length encoding. In: 2010 18th European signal processing conference. IEEE, pp 1645–1649 10. Chang GC, Lin YD (2010) An efficient lossless ECG compression method using delta coding and optimal selective Huffman coding. In: 6th World congress of biomechanics (WCB 2010), Singapore. IFMBE Proceedings, vol 31. Springer, Berlin, Heidelberg, pp 1327–1330 11. Hameed ME, Ibrahim MM, Manap NA, Mohammed AA (2019) A lossless compression and encryption mechanism for remote monitoring of ECG data using Huffman coding and CBCAES. Futur Gener Comput Syst 111:829–840 12. Hameed ME, Ibrahim MM, Manap NA, Mohammed AA (2020) An enhanced lossless compression with cryptography hybrid mechanism for ECG biomedical signal monitoring. Int J Electr Comput Eng 10(3):3235–3243 13. Sayood K (2018) Lossless image compression. In: Introduction to data compression. Elsevier Inc., pp 187–220 14. Kortman CM (1967) Redundancy reduction—a practical method. Proc IEEE 55(3):253–263 15. Peric Z, Denic D, Nikolic J, Jocic A, Jovanovic A (2013) DPCM quantizer adaptation method for efficient ECG signal compression. J Commun Technol Electron 58(12):1241–1250 16. Ibaida A, Al-Shammary D, Khalil I (2014) Cloud enabled fractal based ECG compression in wireless body sensor networks. Futur Gener Comput Syst 35:91–101 17. Lu Z, Kim DY, Pearlman WA (2000) Wavelet compression of ECG signals by the set partitioning in hierarchical trees algorithm. IEEE Trans Biomed Eng 47(7):849–856 18. Abo-Zahhad MM, Abdel-Hamid TK, Mohamed AM (2014) Compression of ECG signals based on DWT and exploiting the correlation between ECG signal samples. Int J Commun Netw Syst Sci (IJCNS) 7(1):53–70 19. Benzid R, Messaoudi A, Boussaad A (2008) Constrained ECG compression algorithm using the block-based discrete cosine transform. Digit Signal Process 18(1):56–64 20. Takahiro OTA, Hiroyoshi M (2010) On-line electrocardiogram lossless compression using antidictionary codes for a finite alphabet. IEICE Trans Inf Syst E93-D(12):3384–3391 21. Chua E, Fang W-C (2011) Mixed bio-signal lossless data compressor for portable brain-heart monitoring systems. IEEE Trans Consum Electron 57(1):267–273 22. Chen S-L, Wang J-G (2013) VLSI implementation of low-power costefficient lossless ECG encoder design for wireless healthcare monitoring application. Electron Lett 49(2):91–93 23. Mukhopadhyay SK, Mitra S, Mitra M (2011) A lossless ECG data compression technique using ASCII character encoding. Comput Electr Eng 37(4):486–497 24. Deepu CJ, Lian Y (2015) A joint QRS detection and data compression scheme for wearable sensors. IEEE Trans Biomed Eng 62(1):165–175 25. Lee H, Buckley KM (1999) ECG data compression using cut and align beats approach and 2-D transforms. IEEE Trans Biomed Eng 46(5):556–564 26. Pan J, Tompkins WJ (1985) A real-time QRS detection algorithm. IEEE Trans Biomed Eng BME-32(3):230–236 27. Pandey A, Saini BS, Singh B, Sood N (2016) A 2D electrocardiogram data compression method using a sample entropy-based complexity sorting approach. Comput Electr Eng 56:30–45 28. Retrieved from https://www.physionet.org/cgi-bin/atm/ATM 29. 
Manikandan MS, Dandapat S (2007) Wavelet energy based diagnostic distortion measure for ECG. Biomed Signal Process Control 2(2):80–96
Antecedents, Barriers, and Challenges of Artificial Intelligence Adoption for Supply Chains: A Tactical Review Kalya Lakshmi Sainath
and Lakshmi Devasena C
Abstract Industries are moving toward digitizing their process to meet customeroriented factors like identifying preferences, customizing products, producing on-time, on-time delivery, satisfaction, feedback mechanisms, etc. Espousal of technology by one company in an industry pushes its competitors to acclimatize the same and endure in the markets. Of all the technological advancements, artificial intelligence is one of the dynamic forces to adopt and has a noteworthy application across industries. Conversely, though there is potential to apply it to supply chain practices, many firms need to instrument and use AI. The objective of the study is to advocate how artificial intelligence is changing supply chains and the detailed analysis of its Antecedents (A), Barriers (B), and Challenges (C) that need to be considered in the adoption of AI. Fewer applications on the supply chain were reconnoitered to make firms apprehend the outcomes of post-implementation of AI and provide a roadmap to the industries. Keywords AI adoption in supply chain · ABC of AI adoption · Antecedents of AI adoption · ABC of AI adoption in supply chains
1 Introduction Supply chains are progressively spending to enrich their performance to manage unexpected disruptive events to continue the flow of critical elements [1]. Artificial intelligence is a most encouraging area of technology that can help us make data-driven decisions, which helps supply chain managers make more proactive, more intelligent decisions. AI relies on using tools, methods, and algorithms to allow cognitive thinking like humans to complete complicated tasks. As the cost K. L. Sainath (B) · C. Lakshmi Devasena IBS Hyderabad, IFHE University, Hyderabad, India e-mail: [email protected] C. Lakshmi Devasena e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_26
of computing hardware and specialized AI software continues to drop, the field is becoming more accessible and adaptable. The interplay between several industries, most notably the marketing, logistics [2], and manufacturing spheres, makes supply chain management a demanding profession. The supply chain is an intricate and comprehensive notion that spans the complete manufacturing and distribution routes to fulfill consumer demand and increase responsiveness [3]. The success of the supply chain reflects in the overall business performance, defined as the success of the business. Moreover, consistent changes in business practices in the volatile and competitivedriven measures are taking place [4]. AI builds intelligent systems capable of learning, adapting, and performing tasks. Firms are concerned with storing, analyzing, and evaluating data as the best output of information, which are more intelligent than information systems. Technological advancements are pushing to adopt different tools like AI, blockchain, and other data-driven tools to mitigate those challenges. AIenhanced machines can do more, including pattern recognition, data collection, and synthesis, assisting with creating various conclusions, and predictive modeling [5, 6]. AI can improve operational efficiencies through digital operations; it helps bring accurate forecasting, optimize techniques, lower the manufacturer cost with higher quality, and identify demographics to provide customers with the best experience [7]. Artificial intelligence aims to “construct rational beings that can see and behave so that some objective function is optimal”. “Advancing the scientific knowledge of the processes underlying cognition and intelligent behavior and their embodiment in computers”, says the association for the development of artificial intelligence. One of the most promising digital supply chain transformation areas is using AI in supply chain management, which might lead to more efficient inventory management and other benefits [8]. AI applications have improved supply chain efficiency in warehousing operations, demand planning, supplier relationship management, and transport optimization, indicating that AI has become one of the most significant contributors to digitalization. It has a positive impact on cost reduction, revenue growth, and market growth [9–11].
2 Review Literature 2.1 Supply Chain Management Supply chain management (SCM) can be described as a network of facilities that instigate raw materials, transmute them into intermediate goods, and finally produce finished goods. It consists of the acquisition, production, and distribution. The main aim of supply chain management is to maximize supply chain performance at the lowest possible cost. In other words, it tries to connect all partners or organizations throughout the supply chain, so they may work together to maximize supply chain productivity. In a competitive environment, companies are adopting supply chain
methods. The resource dependence theory is vital for optimizing supply network resources for efficiency and productivity, and the network environment helps build knowledge of all parties’ operational requirements to sustain connections [12]. SCM can be described as managing upstream and downstream supplier and customer interactions to generate higher customer value at lower supply chain costs. The supply chain aims to meet consumer demand, increase responsiveness, and develop a network among varied stakeholders [13].
2.2 Artificial Intelligence Humans may now use AI to think strategically and create comprehensive solutions. Moreover, its success will depend on their ability to steer clear of pitfalls and directly influence evolving operations and organizational variables. The enthusiasm produced by the promise of big data analytics based on artificial intelligence among enterprises in emerging countries regarding its use and potential advantages is due to issues including lack of top management engagement and neglecting consumers [14–16].
3 ABC of Artificial Intelligence 3.1 Antecedents (A) There are two types of forerunners: technical and informational. From a business’s point of view, technical viability is based on several elements, all of which may be considered antecedents. The degree to which a new artificial intelligence technology is ready for effective adoption and customized to the organization’s unique needs is a function of its maturity, security, and compatibility. System compatibility and integration is another area that ensures that technology is compatible with the systems that exist at present. Cybersecurity is the practice of keeping computer networks safe from intrusion. Supply chain informational antecedents encompass all the characteristics that facilitate the timely availability of accurate and trustworthy data at every stage of the supply chain. Quality data implies that data needs influence efficient target-oriented accessibility that may aid in effective automation. Data exchange reveals the degree to which businesses or supply chain partners can and are willing to share information using available technologies [17–21]. Table 1 lists the antecedents of AI adoption which is describing the different factors that are driving for the AI adoption and these are categorized as people, process and technology which are the key areas to act as antecedents.
Table 1 Antecedents of AI adoption

Category     Factor                               Sources
People       Culture                              [9]
             Inter-organizational collaboration   [22]
             Trust                                [22]
Process      Business alignment                   [23–26]
             Information sharing                  [23–26]
             Social influence                     [27]
Technology   Collaborative planning system        [23–26]
             Connectivity                         [28]
3.2 Barriers (B) The adoption of AI is hindered by several factors, including technology, economics, and costs; maintenance; the need for assistance; a dearth of valuable data; and a need for more broadly applicable data infrastructures. Increased reliance on nonhumans, employment insecurity, misunderstanding of possible advantages, lack of confidence, and difficulties engaging stakeholders are all costs associated with social barriers. In the context of management, lack of understanding of the strategic importance of industry 4.0, financial and human resources, companies focus on operational expenses, the threat of data and the workforce requires education, employee readiness, and understanding of technology and human beings [4, 29–32]. Table 2 lists the barriers to AI adoption in which cost is the primary factor that firms think on the development of any technology and before making any decision of technology adoption it needs proper infrastructure support and the inefficient of knowledge may leads to unsuccessful implementation technology but the trust among the partners or people in a firm may create a risk which is a barrier to the adoption of AI.
3.3 Challenges (C) Adopting AI has many challenges; however, the key challenges are technological, data, ethical, organizational, and managerial related aspects. Only some studies have identified that the benefits and challenges related to data are significantly balanced. The different sets of data an organization uses are of high dimensionality, and several data challenges are surrounded by data integrity. Studies suggest that the use of more sophisticated technology where human and AI systems are enhanced, and aligned information flow tells the key challenges that lack strategy toward the implications of AI, which could impact the company goals from a workforce perspective. However, there still needs to be a more efficient understanding of AI outcomes in organizational and management contexts. There is a technological barrier to entry when artificial intelligence systems struggle to comprehend human context and experience and the
requisite infrastructure to do so. Concerns that ethical problems are not addressed within how legal laws are created and amplified by the evolution and progress of AI technologies; thus, appropriate policies, regulations, and legal frameworks must be developed to forestall the abuse of the technology [18, 23, 26, 33–36].

Table 2 Barriers to AI adoption

Category         Factor
Cost             AI requires human input to develop
                 Data labeling may be a very costly endeavor
Infrastructure   Infrastructure support is necessary for large-scale deployment
                 Large training datasets are available
                 AI's inability to interpret unstructured data
Knowledge        Insufficient knowledge of AI's potential capabilities
Safety           Potential safety risks that might endanger people
Trust            Untrustworthy of technology
Jobs             Insecurity of jobs
                 Unrealistic expectations of technology
4 Application of AI in Supply Chains 4.1 AI Helps in Delivery Due to the increase in customer-driven businesses, the change in delivery mechanisms like as two hours delivery, etc., are making it convenient for the customer. Based on the previous mechanisms, AI monitors and suggests the buying patterns of customers whether to pick the material or not; in the same context, the evolution of drone deliveries and tech-based applications drastically changed the delivery methods to compete in the market [23, 26, 37, 38].
4.2 Smart Manufacturing with AI The application of AI in production helps reduce cycle times, improve efficiency, prevent defects, and automate risky activities; it helps reduce inventory costs with accurate demand planning and forecasting price optimizations. The real-time indicators of manufacturers are optimizing the traditional patterns where there is a platform
to adapt to the changes in demand. AI has brought changes in the plants around the global supply chains, which made all the entities interconnected and collaborative. The elimination of manual activities and the prediction of business trends, optimizing the warehouse and logistic costs, and fulfilling orders and delivering goods without storage of the material [2, 3, 27, 39].
4.3 AI in Inventory Control AI techniques have a new approach to inventory control and planning errors with the significant and influential complexity of capturing information in capturing inventory patterns in the supply chain. The dynamic stock storage methods can estimate the desirable inventory level at each point or rack by the managers without causing profound fluctuations in demand, which may lead to the bullwhip effect [40, 41].
4.4 AI in Transportation Network Design Problems like vehicle routing, scheduling, freight consolidation, intermodal connectivity, road network design, traffic consignment, space utilization, etc., have emerged, and the nature of these problems has changed with the implementation of algorithms and heuristics. [12, 40, 42, 43].
4.5 AI in Procurement Expert systems assist the purchasing managers in evaluating the decision. It suggests whether they must make or buy, evaluate suppliers, automate searching prospective suppliers, online catalogs, and supplier management. It includes identifying attributes, screening possible suppliers, and creating and expediting the purchase orders has become simple with AI [33].
4.6 AI Helps to Forecast Demand and Optimization It helps identify patterns and trends, which informs production plans. It offers precise and reliable demand forecasts, allowing firms to improve sourcing-related items like buying and order processing, lowering costs [44]. Figure 1 shows the AI tools and their applications in the supply chain.
Fig. 1 AI tools and their applications in supply chain
Organizational structure, employee development, and management of transitions are all crucial ancillary tasks that businesses must do. Companies must spend money on change management and capacity development to guarantee that innovative solutions will be adopted. Workers need to accept new methods of operation, which calls for a concerted push to inform them of the need for the changes and incentivize the right kinds of actions [29]. Table 3 highlights the potential advantages that the adoption of AI that are surrounded by cost reductions in the different processes across the domains, improving the responsiveness of the supply chains like reducing lead time, manufacturing waste and helps to enhance the flexibility in the process. Third, quality parameter which is more on the reduction of supply chain risk that is associated during the uncertainties and helping to make continuous improvements in the existing process.
Table 3 Advantages of implementing digital tools like AI

Components        Benefits                                                            Source
Cost Reductions   Reduces the necessity for stocking up on massive quantities at      [30]
                  once; lowers transportation expenses; reduces the cost of tooling;
                  lowers assembly costs; removes the redesign penalty; a commercial
                  lot's size is reduced; simplifies manufacturing procedures
Responsiveness    Reduces lead time; allows for on-demand manufacturing; enhances     [30]
                  process flexibility; reduces manufacturing waste
Quality           Improves quality; incorporates customer feedback; manages demand    [30]
                  uncertainty; reduces supply chain risk, which improves performance
5 Conclusion AI is a practical decision-oriented tool that associates firms with their suppliers, customers, and supply chain partners with information exchange among their business entities. There was minor use of AI during an emergency due to the cost, difficulty to adopt, and use. Research on artificial intelligence in supply chain management still needs to be completed. This study affords insight into the field of AI adoption and the Antecedents, Barriers, and Challenges associated with it by revealing the potential it is associated with. The significance of artificial intelligence (AI) and the progression of the organization in a modest environment should be brought to managers’ attention. AI and the organization’s growth can help to cultivate any “Y” factor allied with increased productivity, which can be accomplished by augmenting and automating business processes in a firm. This can also lead to developing robust relationships with supply chain entities and business partners.
References 1. Milaˇci´c VR, Miler A (1986) Artificial intelligence—morphological approach as a new technological forecasting technique. Int J Prod Res 24(6):1409–1425. https://doi.org/10.1080/002 07548608919812 2. Min H (2010) Artificial intelligence in supply chain management: theory and applications. Int J Log Res Appl 13(1):13–39. https://doi.org/10.1080/13675560902736537
3. Belhadi A, Kamble S, Wamba SF, Queiroz MM (2022) Building supply-chain resilience: an artificial intelligence-based technique and decision-making framework. Int J Prod Res 60(14):4487–4507. https://doi.org/10.1080/00207543.2021.1950935 4. Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28(6):976–990. https://doi.org/10.1016/j.imavis.2009.11.014 5. Klumpp M, Ruiner C (2018) Regulation for artificial intelligence and robotics in transportation, logistics, and supply chain management: background and developments. Netw Ind Q 20(2):3–7 6. Srinivasan R, Swink M (2018) An investigation of visibility and flexibility as complements to supply chain analytics: an organizational information processing theory perspective. Prod Oper Manag 27(10):1849–1867 7. Okponyia KO, Oke SA (2021) Process optimisation regarding overall equipment effectiveness of tyre manufacturing using response surface methodology and grey relational analysis. Eng Access 7(2):109–125 8. Damerji H, Salimi A (2021) Mediating effect of use perceptions on technology readiness and adoption of artificial intelligence in accounting. Acc Educ 30(2):107–130. https://doi.org/10. 1080/09639284.2021.1872035 9. Doetzer M (2020) The role of national culture on supply chain visibility: lessons from Germany, Japan, and the USA. Int J Prod Econ 230:107829 10. Bhalerao K, Kumar A, Kumar A, Pujari P (2022) A study of barriers and benefits of artificial intelligence adoption in small and medium enterprises. Acad Mark Stud J 26(1):1–6 11. Seyedghorban Z, Tahernejad H, Meriton R, Graham G (2020) Supply chain digitalization: past, present and future. Prod Plann Control 31(2–3):96–114. https://doi.org/10.1080/09537287. 2019.1631461 12. Guo Z-H, Wu J, Lu H-Y, Wang J-Z (2011) A case study on a hybrid wind speed forecasting method using BP neural network. Knowl-Based Syst 24(7):1048–1056. https://doi.org/10. 1016/j.knosys.2011.04.019 13. Garg D, Agarwal A, Shukla RK (2011) Understanding of supply chain: a literature review. Int J Eng Sci Technol (IJEST) 3(3):2059–2072 14. Collins C, Dennehy D, Conboy K, Mikalef P (2021) Artificial intelligence in information systems research: a systematic literature review and research agenda. Int J Inf Manag 60:102383. https://doi.org/10.1016/j.ijinfomgt.2021.102383 15. Kalyanakrishnan S, Panicker RA, Natarajan S, Rao, S (2018) Opportunities and challenges for artificial intelligence in India. In: AIES 2018: Proceedings of the 2018 AAAI/ACM conference on AI, ethics, and society, pp 164–170. https://doi.org/10.1145/3278721.3278738 16. Succeeding in the AI Supply chain revolution (2021) In: McKinsey & Company. Retrieved from https://www.mckinsey.com/industries/metals-and-mining/our-insights/succee ding-in-the-ai-supply-chain-revolution/. Accessed on 14 Jul 2022 17. Foster MN, Rhoden SLNH (2020) The integration of automation and artificial intelligence into the logistics sector: a Caribbean perspective. Worldwide Hospitality Tourism Themes 12(1):56–68. https://doi.org/10.1108/WHATT-10-2019-0070 18. Grover P, Kar AK, Dwivedi YK (2022) Understanding artificial intelligence adoption in operations management: insights from the review of academic literature and social media discussions. Ann Oper Res 308(1):177–213 19. Hofmann E, Sternberg H, Chen H, Pflaum A, Prockl G (2019) Supply chain management and Industry 4.0: conducting research in the digital age. Int J Phys Distrib Logistics Manag 49(10):945–955 (2019). Emerald Group Holdings Ltd. https://doi.org/10.1108/IJPDLM-112019-399 20. 
Nozari H, Szmelter-Jarosz A, Ghahremani-Nahr J (2022) Analysis of the challenges of Artificial Intelligence of Things (AIoT) for the smart supply chain (case study: FMCG industries). Sensors 22(8):2931. https://doi.org/10.3390/s22082931 21. Olan F, Liu S, Suklan J, Jayawickrama U, Arakpogun EO (2021) The role of Artificial Intelligence networks in sustainable supply chain finance for food and drink industry. Int J Prod Res 60(4):1–16. https://doi.org/10.1080/00207543.2021.1915510
22. Soltani ZK (2021) The applications of artificial intelligence in logistics and supply chain. Turk J Comput Math Educ 12(13):4488–4499 23. Dubey R, Gunasekaran A, Childe SJ, Bryde DJ, Giannakis M, Foropon C, Roubaud D, Hazen BT (2020) Big data analytics and artificial intelligence pathway to operational performance under the effects of entrepreneurial orientation and environmental dynamism: a study of manufacturing organisations. Int J Prod Econ 226:107599. https://doi.org/10.1016/j.ijpe.2019. 107599 24. Dubey R, Altay N, Gunasekaran A, Blome C, Papadopoulos T, Childe SJ (2018) Supply chain agility, adaptability and alignment: empirical evidence from the Indian auto components industry. Int J Oper Prod Manag 38(1):129–148 25. Dubey R, Gunasekaran A, Childe SJ, Papadopoulos T, Blome C, Luo Z Antecedents of resilient supply chain: an empirical study. IEEE Trans Eng Manag 66(1):8–19. https://doi.org/10.1109/ TEM.2017.2723042 26. Dubey R, Gunasekaran A, Childe SJ, Papadopoulos T, Luo Z, Roubaud D (2020) Upstream supply chain visibility and complexity effect on focal company’s sustainable performance: Indian manufacturers’ perspective. Ann Oper Res 290:343–367 27. Saßmannshausen T, Burggräf P, Wagner J, Hassenzahl M, Heupel T, Steinberg F (2021) Trust in artificial intelligence within production management–an exploration of antecedents. Ergonomics 64(10):1333–1350 28. Wamba SF, Lefebvre LA, Bendavid Y, Lefebvre É (2008) Exploring the impact of RFID technology and the EPC network on mobile B2B eCommerce: a case study in the retail industry. Int J Prod Econ 112(2):614–629 29. Wong L-W, Tan GW-H, Ooi K-B, Lin B, Dwivedi YK (2022) Artificial intelligence-driven risk management for enhancing supply chain agility: a deep-learning-based dual-stage PLS-SEMANN analysis. Int J Prod Res, pp 1–21. https://doi.org/10.1080/00207543.2022.2063089 30. World Trade Organization (2013) Global value chains in a changing world. In: Elms DK, Low P (eds) World trade organization. Retrieved from https://www.wto.org/english/res_e/booksp_ e/aid4tradeglobalvalue13_e.pdf. Accessed on 14 Dec 2019 31. Nishant R, Kennedy M, Corbett J (2020) Artificial intelligence for sustainability: challenges, opportunities, and a research agenda. Int J Inf Manag 53:102104. https://doi.org/10.1016/j.iji nfomgt.2020.102104 32. Sanders NR, Boone T, Ganeshan R, Wood JD (2019) Sustainable supply chains in the age of AI and digitization: research challenges and opportunities. J Bus Logistics 40(3):229–240 (2019). Wiley-Blackwell. https://doi.org/10.1111/jbl.12224 33. Cui R, Li M, Zhang S (2020) AI and procurement. Manuf Serv Oper Manag, pp 1–36. In: SSRN. Retrieved from https://ssrn.com/abstract=3570967 (or) http://dx.doi.org/10.2139/ssrn. 3570967 34. Wamba SF, Queiroz MM, Guthrie C, Braganza A (2022) Industry experiences of artificial intelligence (AI): benefits and challenges in operations and supply chain management. Prod Plann Control 33(16):1493–1497. https://doi.org/10.1080/09537287.2021.1882695 35. Nitsche B, Straube F, Wirth M (2021) Application areas and antecedents of automation in logistics and supply chain management: a conceptual framework. Supply Chain Forum Int J 22(3):223–239. https://doi.org/10.1080/16258312.2021.1934106 36. Nuseir MT, Basheer MF, Aljumah A (2020) Antecedents of entrepreneurial intentions in smart city of Neom Saudi Arabia: does the entrepreneurial education on artificial intelligence matter? Cogent Bus Manag 7(1825041):1–16. https://doi.org/10.1080/23311975.2020.1825041 37. 
Dash R, McMurtrey M, Rebman C, Kar UK (2019) Application of artificial intelligence in automation of supply chain management. J Strateg Innov Sustain 14(3):43–53 38. Hasan R, Kamal MM, Daowd A, Eldabi T, Koliousis I, Papadopoulos T (2022) Critical analysis of the impact of big data analytics on supply chain operations. Prod Plann Control, pp 1–25. https://doi.org/10.1080/09537287.2022.2047237 39. Kar AK, Kushwaha AK (2021) Facilitators and barriers of artificial intelligence adoption in business—insights from opinions using big data analytics. Inf Syst Front. https://doi.org/10. 1007/s10796-021-10219-4
40. Zhang D, Pee LG, Cui L (2021) Artificial intelligence in E-commerce fulfillment: a case study of resource orchestration at Alibaba’s smart warehouse. Int J Inf Manag 57:102304. https:// doi.org/10.1016/j.ijinfomgt.2020.102304 41. Praveen U, Farnaz G, Hatim G (2019) Inventory management and cost reduction of supply chain processes using AI based time-series forecasting and ANN modeling. Procedia Manuf 38:256–263. https://doi.org/10.1016/j.promfg.2020.01.034 42. Attaran M (2020) Digital technology enablers and their implications for the supply chain management. Supply Chain Forum Int J 21(3):158–172. Taylor and Francis Ltd. https://doi. org/10.1080/16258312.2020.1751568 43. Moradpour S, Long S (2017) K-mean clustering in transportation: a work zone simulator case study. In: 38th International annual conference of the American society for engineering management (ASEM 2017), pp 138–142 44. Wu Q, Xu G, Chen L, Luo A, Zhang S (2017) Human action recognition based on kinematic similarity in real time. PLoS ONE 12(10(e0185719)):1–15
Statistical Influence of Parameters on the Performance of SDN Deepjyot Kaur Ryait and Manmohan Sharma
Abstract Today, the most promising technology is Software Defined Networking (SDN), which dissociates the data plane from the control plane. The complete control logic is moved to an entity that provides a global view of the network, known as the SDN controller. When a centralized controller manages all responsibilities of the network, it becomes very difficult to accomplish those responsibilities simultaneously. Monitoring the network traffic in real time is a very imperative characteristic of networking. For this purpose, a queuing technique that effectively uses the global view of the network to control the congestion of the network is proposed. Moreover, it provides the facility to determine the packet delivery ratio along with the various queue objects, and to determine the significance of queue size in controlling the packet drop rate in the network. These parameters enhance the quality of service (QoS) of a network. This paper provides a purview of the queuing technique for managing and monitoring load balancing in SDN controllers based on the probability of various parameters which help to evaluate the performance of controllers. Furthermore, the variation of the Packet Successfully Delivered in the network can be statistically analysed through analytical tools. Keywords Load balance · Packet delivery ratio · Correlation matrix · Multiple regression model
1 Introduction Network communication has a great influence on the life of a human being due to the evolution of the Internet. An efficient communication system is required to accomplish the needs of a user. But still, at present technology is undergoing several D. K. Ryait (B) · M. Sharma School of Computer Applications, Lovely Professional University, Phagwara, India e-mail: [email protected] M. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_27
changes; recently introduced a technology that attracts more attention in academic and industrial areas is the Software Defined Network. It aims to create an environment of network with efficiency, scalability, adaptability, flexibility, reliability, etc. Every commercial enterprise is willing to earn maximum profit in the network field by offering various communication facilities. Any failure that occurs during the communication of the network then results in a heavy loss of revenue. So, it is essential to increase the availability of networks and minimize revenue loss [1]. While an innovation in the networking field is negligible in the past decade. The wider adoption of the traditional network is more complex and harder to manage. Because it is a toughest job to configure the network devices due to its predefined policies. To overcome this situation, by restructuring the traditional network infrastructure. Thus, Software Defined Networks (SDN) are invented; it is an important technology that offers a lot of potential uses in a field by increasing the efficiency of the network [2]. Software Defined Networking helps to reduce the complexity of network management. Companies including Google and Amazon have become a hub of this revolutionary concept. It provides several facilities like programmability, adjustability, and dynamically reconfiguration of the networking elements. Currently, the Software Defined Networking paradigm is supported by large industries such as Microsoft, Google, Cisco, Facebook, HP, IBM, and Samsung. Software Defined Network (SDN) is defined “by decoupling/disassociating the control plane from the data plane in a network”. It is an emerging architecture of the network that offers programmability, easily manageable and adaptable, dynamic configuration of network elements, control, and optimization of network resources in a cost-effective way. It happened due to the separation of network control and forwarding plane. SDN enables networks to be directly connected to applications via application programming interfaces (APIs). SDN is an architecture designed to make a network more flexible and easier to manage due to centralizing management of control which abstracts the control plane from the data forwarding function in networking devices. Thus, to create a flexible, dynamic network architecture that can be changed as needed. The framework of Software Defined Networks offers an abstraction network, which becomes too easy to achieve network reachability. The various characteristics of Software Defined Networks are listed below: 1. Reduce complexity of the network by decoupling the control plane from the data plane. 2. Network Intelligence is logically centralized in SDN controllers to maintain a global view of the network. 3. Easily deploy applications and services through APIs. 4. Network control is directly programmable due to the separation of functional components. 5. Substantially reduce manual configuration in the network. 6. A readily programmable network eliminates manual configuration. 7. SDN is an agile model which abstracts the control from the forwarding function to adjust dynamic changes. 8. Due to centralized control, forwarding elements can be configured at scale.
9. The controller interacts with network elements through APIs.
10. Network reachability is easier to achieve in SDN than in traditional networks.
11. Vendor neutrality is achieved in SDN because instructions are provided by the SDN controller.
12. Open standards are supported in SDN, allowing communication between devices from multiple vendors while maintaining a common software environment.
1.1 Traditional Network versus Software Defined Network

Software Defined Networking has become a popular way for organizations to deploy applications in a network at a faster rate while decreasing the overall cost of deployment, and it offers the option to upscale the network infrastructure with minimal disturbance. In a traditional network, by contrast, there is a strong interdependency between the two functional components and no programmability, so it cannot cope with modern enterprise demands. Reconfiguring traditional network devices can introduce faults, imbalance, or unintended configuration changes, which increases the complexity of the network [2, 3]. Adding new functionality to traditional network devices is therefore a tedious and complex task, and changing the topology of the network is uneconomical because of this lack of programmability. Moreover, any modification of the control plane requires all network devices to be flashed with new firmware or to have their hardware upgraded. The main difference between the two is that a Software Defined Network is software based rather than hardware based: a traditional network relies mainly on physical infrastructure such as switches and routers to establish communication, whereas SDN allows the user to control resources through the control plane rather than by interacting with the physical infrastructure. The rigidity of traditional networks stems from the vertical integration of the two functional components, which are bonded together inside the forwarding elements [2]. Link failures further increase the inflexibility of such a network because operators lose visibility over it: switches behave like black boxes implemented by multiple vendors, which prevents network operators from modifying their implementation to satisfy customer requirements. Traditional networks therefore make data transmission difficult to handle. The differences between the two networks are summarized in Table 1.
Table 1 Traditional networks versus Software Defined Networks

| Traditional networks | Software Defined Networks (SDN) |
|---|---|
| Hardware-based network | Software-based network |
| A strong bond exists between both functional components | In SDN, a strong bond does not exist between both functional components |
| Traditional networks maintain a routing table in every switch | SDN maintains a flow table for every switch |
| Lack of programmability in the network | Increased programmability of the network |
| Does not provide centralized control of the network | The controller provides centralized control of the entire network |
| Rigid adaptability in the network because the functional components are tightly coupled | More adaptability is offered in the network due to centralized control |
| Does not provide network virtualization | Network virtualization is provided in SDN |
| Does not provide vendor neutrality; works in a vendor-specific environment | Vendor neutrality is provided |
| Little flexibility compared with SDN networks | More flexible than traditional networks due to programmability |
| The maintenance cost is higher | The maintenance cost is lower |
| The operational and capital cost is higher | The operational and capital cost is lower |
1.2 Architecture of Software Defined Network

The Software Defined Network provides a novel networking paradigm that decouples the data plane from the control plane. The paradigm comprises three distinct planes: the data plane, the control plane, and the application plane, arranged in a bottom-up fashion as shown in Fig. 1. All forwarding devices, such as switches and routers, reside in the data plane and act as simple forwarding elements. Their behavior is controlled by the control plane, usually called the controller in SDN, which is responsible for maintaining and updating the topology information of the network. Through programmability, and thanks to its centralized view of the network, the controller manages network traffic and reduces network complexity; the SDN controller therefore acts as the "brain of the network". The various network services and applications are implemented in the application plane and run on top of the controller. Communication between these planes takes place through APIs, namely the northbound and southbound APIs [1–6]. The SDN architecture rests on the four pillars shown in Fig. 2. This openness to network innovation positions SDN as the future of networking: new features can be added to the network dynamically in the form of applications, and the control plane becomes an independent entity. The controller then makes all required changes in the network according to the user's requirements; this is possible because, unlike conventional networks, SDN segregates the control plane from the data plane.
Fig. 1 Architecture of Software Defined Network
Fig. 2 Four pillars of SDN for working
Several advantages provided by the Software Defined Network are:
• SDN provides high flexibility and fast deployment of applications in the network through application programming interfaces (APIs).
• The network infrastructure can be customized via programmability.
• Through programming, the configuration of network elements is performed automatically, which eliminates manual configuration.
• SDN provides the user with a holistic view of the network and of the risk of a failure or fault occurring in it, so the developer can easily isolate the affected network element.
• The SDN controller can change the network as required because it has full control over the network traffic.
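As a concrete illustration of the programmability described above, the short Python sketch below pushes a flow rule to an SDN controller through its northbound REST API. It is only a minimal sketch: the controller URL, the /flows endpoint, and the JSON fields are hypothetical placeholders, not the API of any specific controller such as ONOS, Ryu, or OpenDaylight.

```python
import json
import requests  # any HTTP client would do

CONTROLLER = "http://192.0.2.10:8181"   # hypothetical controller address

def install_flow(switch_id: str, dst_ip: str, out_port: int) -> bool:
    """Install a simple forwarding rule via a (hypothetical) northbound API."""
    rule = {
        "switch": switch_id,            # forwarding element in the data plane
        "match": {"ipv4_dst": dst_ip},  # traffic to match
        "actions": [{"output": out_port}],
        "priority": 100,
    }
    # The application plane never touches the switch directly; it only talks
    # to the controller, which programs the data plane over the southbound API.
    resp = requests.post(f"{CONTROLLER}/flows", data=json.dumps(rule),
                         headers={"Content-Type": "application/json"})
    return resp.status_code in (200, 201)

if __name__ == "__main__":
    print(install_flow("of:0000000000000001", "10.0.0.2", 2))
```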
2 Related Work

Earlier, Simple Network Management Protocol (SNMP) tools were used only for monitoring traditional networks, but they have many limitations, such as being unable to collect flow statistics and to measure network metrics such as loss and delay [7]. Mahjoubi et al. [8] note that a single SDN controller suffers from serious problems such as scalability, availability, and a single point of failure; distributed controllers overcome these issues but still have to deal with fault tolerance and load balancing among controllers. Mamushiane et al. [9] observe that decoupling the control plane from the data plane in SDN poses several challenges regarding scalability, fault tolerance, load balancing, and network performance, which can be addressed by deploying multiple controllers; integrating load balancing and fault tolerance into a single solution is left as future work. Xu et al. [10] propose several strategies to address the load balancing, performance, and robustness of SDN controllers when multiple controllers are deployed using a queueing system, leaving the impact of controller deployment cost, and how to minimize it, as future work. Mondal et al. [11] identify the need to analyze their model further so as to ensure quality of service with minimum packet drop and reduced waiting time when TCAM memory is used in the system. Faraj et al. [12] argue that a load balancing strategy is required when congestion and overloading occur in the network; in future work, queue length can be used for load balancing across multiple controllers rather than a single controller to reduce congestion. Mondal et al. [13] propose a Markov chain-based analytical model that analyzes the performance of packet flow through OpenFlow; a large number of packets are dropped either on a table-miss entry or when no output action is specified due to high delay, and the model is to be extended with a queueing scheme that reduces the delay of the packet flow. Hamdan et al. [14] survey the load balancing techniques used in SDN to improve network performance; the key motives of load balancing are minimum response time, proper utilization of resources, maximum system throughput, and avoidance of bottlenecks, and several open issues arise during load balancing, such as controller failure, switch migration, managing controller load, resource allocation, synchronization, and controller placement, which provide further research directions in SDN. Rowshanrad et al. [7] show that a controller can communicate with forwarding elements via OpenFlow, which provides flow statistics of the network to the controller for monitoring; in the future, this flow statistics information can be combined with queueing techniques to optimize network performance.
Huang and Youn [15] compare the architecture of a Software Defined Network with that of a traditional network; SDN provides several advantages over a traditional network, such as programmability, agility, flexibility, and centralized control. They use a proactive approach based on flow-table utilization rather than a reactive one, so that the probability of matching flow entries is maximized; improving the performance factors with an analytical model is left as future work.
3 Proposed Model

3.1 Necessity of Load Balancing in Multiple Controllers

In a distributed environment, multiple controllers are deployed in a Software Defined Network. When more than one controller is used, several challenges are encountered, such as imbalance between controllers, inappropriate utilization of resources, a high rate of controller failure, increased latency, and decreased network performance. This happens because load balancing techniques are insufficiently implemented in the network; as a consequence, some controllers are overloaded while others are underloaded. When a controller has to handle more network traffic than its capacity or threshold value, the rate of packet loss increases and the performance of the network decreases. When some controllers are overloaded while others are underloaded or idle, the overall utilization of resources is hampered, which also affects the performance of the network [3, 5, 6, 16, 17]. It is therefore essential to develop an adaptive load balancing strategy that improves the performance of the network. To resolve these issues, an algorithm is proposed that adaptively manages the load of the controllers in the SDN network. For this purpose, the queueing theory technique is integrated with a continuous-time Markov chain to manage load balancing between the SDN controllers, which reduces the packet loss ratio of the network. It is necessary to distribute the workload properly among controllers because network traffic fluctuates dynamically. The queueing technique is used for load balancing of SDN controllers because of its three main characteristics: arrival rate, service facility, and actual waiting time in the line [16]. These characteristics are easily incorporated by the SDN controller, as shown in Fig. 3. The arrival rate of packets in the system is denoted by λ and follows a Poisson distribution, while the service rate of packets is denoted by μ and follows an exponential distribution. The utilization of the service facility provided by the system is denoted by ρ (rho), also known as the utilization factor of the system. Network traffic behaves like a stochastic process whose behavior changes randomly over time. The Markov process is a simple stochastic process in which the distribution of the future state depends only on the present state of the process, not on how that state was reached.
Fig. 3 Using queuing concept in SDN controller
Thus, the stochastic process must possess the "memoryless" property [16, 17]. This characteristic says that the present state $Y_t$ at time instant $t$ determines the probability of the future state $Y_{t+1}$ at time instant $t+1$, as defined in Eq. (1):

$$P(Y_{t+1} \mid Y_t, Y_{t-1}, \ldots, Y_2, Y_1, Y_0) = P(Y_{t+1} \mid Y_t) \qquad (1)$$
That means $Y_{t+1}$ depends upon $Y_t$, but does not depend upon $Y_{t-1}, Y_{t-2}, \ldots, Y_1, Y_0$. For this purpose, a model is designed to deal with multiple controllers in SDN, as shown in Fig. 4.
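The queueing quantities introduced above (arrival rate λ, service rate μ, utilization ρ) are exactly the signals the proposed scheme uses to judge whether a controller is overloaded. The following Python sketch computes the standard M/M/1 steady-state metrics from measured λ and μ; it is illustrative only, and the example rates are hypothetical rather than taken from the paper's experiments.

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state metrics of an M/M/1 queue (Poisson arrivals, exponential service)."""
    if lam >= mu:
        raise ValueError("Queue is unstable: arrival rate must be below service rate")
    rho = lam / mu                 # utilization factor of the controller
    l_q = rho**2 / (1 - rho)       # mean number of requests waiting in the queue
    w_q = l_q / lam                # mean waiting time in the queue (Little's law)
    w = w_q + 1.0 / mu             # mean time in the system (waiting + service)
    return {"rho": rho, "Lq": l_q, "Wq": w_q, "W": w}

# Hypothetical example: 800 packet-in requests/s arriving at a controller
# that can serve 1000 requests/s.
print(mm1_metrics(lam=800.0, mu=1000.0))
# rho = 0.8 -> this controller is close to its threshold and is a candidate
# for shedding load to an underloaded peer.
```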
3.2 Proposed Algorithm for Load Balancing in SDN Controllers

/* Algorithm for load balancing in SDN controllers through the queueing technique */
Initial Requirements:
  lc → load of the controller;
  th → threshold value of the controller;
  lq → length of the queue;
  sq → specified queue size;
begin
  if (lc < th and lq < sq) then {
    /* the current controller handles the workload and monitors the queue
       variables by using procedure current_controller */
    exec proc current_controller;
  } else {
    /* select a controller whose load is less than its threshold value and
       whose queue length is less than the specified queue size by using
       procedure control_transfer */
    exec proc control_transfer;
  }
  end if;
end;

Fig. 4 Design of model for multiple controllers
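For readers who prefer an executable form, here is a small Python rendering of the same decision logic. It is a sketch under simplifying assumptions: the Controller class, its load and queue metrics, and the selection policy are hypothetical stand-ins for whatever monitoring and switch-migration mechanism a real deployment would provide.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Controller:
    name: str
    load: float        # lc: current load of the controller (e.g., requests/s)
    threshold: float   # th: threshold value of the controller
    queue_len: int     # lq: current length of its request queue
    queue_size: int    # sq: specified (maximum) queue size

    def has_capacity(self) -> bool:
        # Mirrors the condition (lc < th and lq < sq) of the algorithm.
        return self.load < self.threshold and self.queue_len < self.queue_size

def balance(current: Controller, peers: List[Controller]) -> Optional[Controller]:
    """Return the controller that should handle the workload next."""
    if current.has_capacity():
        return current                      # exec proc current_controller
    # Otherwise pick an underloaded peer (exec proc control_transfer);
    # the least-loaded candidate is chosen here as one reasonable policy.
    candidates = [c for c in peers if c.has_capacity()]
    if not candidates:
        return None                         # every controller is saturated
    return min(candidates, key=lambda c: c.load / c.threshold)

# Hypothetical example run
c1 = Controller("C1", load=950, threshold=900, queue_len=80, queue_size=100)
c2 = Controller("C2", load=400, threshold=900, queue_len=10, queue_size=100)
print(balance(c1, [c2]).name)   # -> "C2": the workload is transferred
```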
4 Simulation and Evaluation of Results

During simulation, all required manipulations were performed on both queue models, with infinite and finite capacity; the simulation results are represented as graphs in Fig. 5a to g, respectively. Figure 6 highlights the average throughput, average delay, and packet delivery ratio in both models. The average delay of the unlimited (infinite) queue model is much higher than that of the limited (finite) queue model: when the queue size increases or is set to infinite, the packet delivery ratio increases, but at the cost of an increased average delay that is not acceptable for any communication. If the size of the queue is set to finite, the average delay of the network decreases, as shown in Fig. 6.
Fig. 5 a: Instantaneous delay in queue model with infinite capacity, b: instantaneous delay in queue model with finite capacity, c: comparison between both queue models with respect to queue size, d: comparison between both queue models with respect to packets arrived, e: comparison between both queue models with respect to packets departed, f: comparison between both queue models with respect to packets dropped, g: comparison between both queue models with respect to queue parameters. *Note: the Y axis represents queue parameters such as qsizeB, qsizeP, arrivedP, departedP, droppedP, arrivedB, departedB, and droppedB, where *B is a number in bytes and *P is a number in packets
According to these parameters, the finite-capacity queue model is preferred over the infinite-capacity queue model. Several different types of queue objects are available, such as DropTail, Stochastic Fair Queue (SFQ), and so on. These queue objects show varying behavior in terms of packet drop rate, probability of packet delivery ratio, and packet drop ratio, as shown in Fig. 7a to c, respectively. The DropTail queue object shows a packet delivery ratio of 99.22%, which is higher than that of the other objects, and a packet drop ratio of approximately 0.78%, which is lower than that of the other queue objects. Thus, the finite-capacity queue model with the DropTail queue object is selected after analyzing the above simulation results. The influence of the parameters is now analyzed using statistical tools. Correlation is used to express the association between variables, and the degree of association is measured by a correlation coefficient, whose value varies from +1 to −1. If the value of one variable increases and the value of another variable also increases, a positive correlation exists; if the value of one variable increases while the value of the other decreases, a negative correlation exists; the absence of correlation between variables is represented by zero. This analytical process is based on a matrix of correlations between the variables, from which valuable insight can be obtained. The correlation matrix between the various parameters is shown in Fig. 8. The No. of Packet Received parameter has a correlation of approximately 0.389 with Queue Size, 0.964 with Total No. of Packets, and 0.078 with Packet Size, and similarly for the other variables. Moreover, a multiple regression model is fitted on these parameters to study how the response variable is influenced by two or more explanatory variables. In general, the multiple regression model is defined as:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \qquad (2)$$
Fig. 6 Comparison between both queue models with respect to parameters (average throughput, packet delivery ratio, and average delay for the infinite- and finite-capacity models)
Fig. 7 a: Number of packets dropped for various queue objects, b: probability of packet delivery ratio for various queue objects, c: probability of packet drop ratio for various queue objects (DropTail, SFQ, DRR, RED)
Correlation between parameters:

|                        | No. of Packet Received | Queue Size | Total No. of Packets | Packet Size |
|------------------------|------------------------|------------|----------------------|-------------|
| No. of Packet Received | 1 | 0.38931286 | 0.964086941 | 0.078563975 |
| Queue Size             |   | 1 | 0.3623451 | 0.59669339 |
| Total No. of Packets   |   |   | 1 | 0.035202443 |
| Packet Size            |   |   |   | 1 |

Fig. 8 Correlation matrix between various parameters
where $y$ is the response variable, $X_1, X_2, \ldots, X_k$ are the explanatory variables, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \ldots, \beta_k$ are the coefficients of the variables. In Fig. 9, the response variable $y$ is Packet Successfully Delivered, the explanatory variables $X_1, X_2, \ldots, X_k$ are Queue Size, Total No. of Packets sent, and Packet Size, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \ldots, \beta_k$ are the coefficients of the variables. The equation for packets successfully delivered is then derived from Eq. (2) as shown below:

$$\text{Packet Successfully Delivered} = (-442.855) + (14.928 \times \text{Queue Size}) + (1.127 \times \text{Total No. of Packets}) + (0.001 \times \text{Packet Size})$$
It was observed in Fig. 9 that the P-value of the Packet Size parameter was greater than 0.05, signifying that Packet Size is not a statistically significant variable in the overall regression model. Another regression was therefore performed (Fig. 10) to derive, using Eq. (2), an equation for packets successfully delivered with only two explanatory variables (Queue Size and Total No. of Packets sent), as shown below:
$$\text{Packet Successfully Delivered} = (-432.250) + (16.1596 \times \text{Queue Size}) + (1.1207 \times \text{Total No. of Packets})$$

It was observed that this multiple regression model has a higher adjusted R-square value of 0.930 (versus 0.929 in the previous case; Fig. 9), with all explanatory variables statistically significant. The first multiple regression model (Fig. 9) shows that 92.95% of the variation in Packet Successfully Delivered in the network can be statistically explained by Queue Size, Total No. of Packets, and Packet Size, while the second model (Fig. 10) shows that 93.015% of the variation can be statistically explained by Queue Size and Total No. of Packets alone. Therefore, both Queue Size and Total No. of Packets are individually useful in predicting Packet Successfully Delivered in the network.
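Both the correlation matrix of Fig. 8 and the regression models of Figs. 9 and 10 can be reproduced with pandas and statsmodels. The sketch below is illustrative only: the CSV file of per-run simulation statistics and its column names are assumptions standing in for however the simulation trace is exported.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sdn_queue_runs.csv")   # hypothetical per-run statistics export

cols = ["Packets Successfully Delivered", "Queue Size",
        "Total No. of Packets", "Packet Size"]
print(df[cols].corr().round(3))          # Pearson correlations, as in Fig. 8

# Three-variable model (Fig. 9): response vs all explanatory variables.
y = df["Packets Successfully Delivered"]
X3 = sm.add_constant(df[["Queue Size", "Total No. of Packets", "Packet Size"]])
m3 = sm.OLS(y, X3).fit()
print(m3.pvalues)        # Packet Size is dropped if its p-value exceeds 0.05

# Two-variable model (Fig. 10): refit with the significant variables only.
X2 = sm.add_constant(df[["Queue Size", "Total No. of Packets"]])
m2 = sm.OLS(y, X2).fit()
print(m3.rsquared_adj, m2.rsquared_adj)  # compare adjusted R-square values
```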
5 Conclusion

The Software Defined Network provides a novel networking paradigm that enhances innovation in networking compared with the traditional network. The main contribution of this work is an adaptive algorithm for load balancing across multiple controllers that uses the queueing technique together with a Markov chain to evaluate the probability distribution over controllers, which helps to manage the load among controllers in a convenient way. Based on this probability, it reduces the packet loss ratio, overheads, and migration cost of the network while managing load balancing; as a consequence, a cascaded failure of controllers caused by controller imbalance can be avoided. Moreover, a multiple regression model is applied to study how the Packet Successfully Delivered variable is influenced by the other parameters, as shown in Figs. 9 and 10. This implies that, after evaluating the probability distribution of the controllers, multiple controllers provide a ubiquitous and robust network that extends the scalability, reliability, and high availability of network services.
Fig. 9 Result of multiple regression model of three variables
Fig. 10 Result of multiple regression model of two variables
References

1. Malik A, Aziz B, Al-Haj A, Adda M (2019) Software-defined networks: a walkthrough guide from occurrence to data plane fault tolerance. PeerJ Preprints, pp 1–26
2. Kreutz D, Ramos FMV, Verissimo PE, Rothenberg CE, Azodolmolky S, Uhlig S (2015) Software-defined networking: a comprehensive survey. Proc IEEE 103(1):14–76
3. Rehman AU, Aguiar RL, Barraca JP (2019) Fault-tolerance in the scope of software-defined networking (SDN). IEEE Access 7:124474–124490
4. Yu Y, Li X, Leng X, Song L, Bu K, Yang J, Chen Y, Zhang L, Cheng K, Xiao X (2018) Fault management in software-defined networking: a survey. IEEE Commun Surv Tutorials 21(1):349–392
5. Karakus M, Durresi A (2017) A survey: control plane scalability issues and approaches in software-defined networking (SDN). Comput Netw (Elsevier) 112:279–293
6. Aly WHF (2019) Controller adaptive load balancing for SDN networks. In: 2019 Eleventh international conference on ubiquitous and future networks (ICUFN). IEEE, pp 514–519
7. Rowshanrad S, Namvarasl S, Keshtgari M (2017) A queue monitoring system in OpenFlow software defined networks. J Telecommunications and Information Technology, pp 39–43
8. Mahjoubi A, Zeynalpour O, Eslami B, Yazdani N (2019) LBFT: load balancing and fault tolerance in distributed controllers. In: 2019 International symposium on networks, computers and communications (ISNCC). IEEE, pp 1–6
9. Mamushiane L, Mwangama J, Lysko AA (2018) Given a SDN topology, how many controllers are needed and where should they go? In: 2018 IEEE Conference on network function virtualization and software defined networks (NFV-SDN). IEEE, pp 1–6
10. Xu J, Wang L, Song C, Xu Z (2019) Minimizing multi-controller deployment cost in software-defined networking. In: 2019 IEEE Symposium on computers and communications (ISCC). IEEE, pp 1–6
11. Mondal A, Misra S, Maity I (2019) Buffer size evaluation of OpenFlow systems in software-defined networks. IEEE Syst J 13(2):1359–1366
12. Faraj MK, Al-Saadi A, Albahadili RJ (2020) Load balancing using queue length in SDN based switches. J Xi'an Univ Archit Technol XII(IV):2603–2611
13. Mondal A, Misra S, Maity I (2020) AMOPE: performance analysis of OpenFlow systems in software-defined networks. IEEE Syst J 14(1):124–131
14. Hamdan M, Hassan E, Abdelaziz A, Elhigazi A, Mohammed B, Khan S, Vasilakos AV, Marsono MN (2021) A comprehensive survey of load balancing techniques in software-defined network. J Netw Comput Appl 174(102856):1–30
15. Huang G, Youn HY (2020) Proactive eviction of flow entry for SDN based on hidden Markov model. Front Comp Sci 14(4):1–10
16. Kleinrock L (1975) Queueing systems—Volume 1: Theory. Wiley-Interscience
17. Nencioni G, Helvik BE, Gonzalez AJ, Heegaard PE, Kamisinski A (2016) Availability modelling of software-defined backbone networks. In: 2016 46th Annual IEEE/IFIP international conference on dependable systems and networks workshop (DSN-W). IEEE, pp 105–112
Investigations on Channel Characteristics and Range Prediction of 5G mmWave (39 GHz) Wireless Communication System I. Johnsi Stella and B. Victoria Jancee
Abstract Millimeter-wave (mmWave) and microwave frequency bands, covering frequencies in the range of 24 to 86 GHz, are likely to be used by next-generation wireless communication networks. The choices made for practically every aspect of wireless communications are greatly influenced by engineers’ ability to build, deploy, and compare potential wireless systems. In this article, channel characteristics at 39 GHz are investigated and this can be useful in designing 5G radio systems for the indoor scenario. The NYUSIM Millimeter-Wave Channel Simulator (version 3.0), which was created from numerous years of field measurements, is used in this work to conduct a detailed analysis on the channel characteristics under various environmental conditions. From the data obtained from simulator output data files, indoor coverage is predicted for the assumed non-line-of-sight and line-of-sight environments. Keywords mmWave · Coverage · Non-line-of-sight · Line-of-sight · NYUSIM · Channel simulator
1 Introduction

Millimeter-wave (mmWave) wireless communication in the 30–300 GHz band holds the key to achieving higher data rates in future 5G networks, which have the potential to enable cutting-edge applications such as wireless HD, 3D telepresence, virtual reality, and V2X (vehicle-to-everything) communication [1]. The practical implementation of mmWave technology, however, confronts a number of obstacles, including larger path losses, more sophisticated hardware, and severe signal blocking. This has inspired researchers to look into workable approaches that deal with these difficulties without materially impacting the system's overall performance.
In order to do this, understanding channel behavior is crucial for creating high-performance wireless communication systems, and it has accordingly been the subject of many studies in this field. Service operators around the world have paid billions of dollars for spectrum to serve their customers; auction pricing for spectrum highlights its value in the market and the scarcity of this precious resource. Opening up new spectrum could enable service operators to accommodate more users while also delivering a higher-performance mobile broadband data experience. Compared with spectrum below 6 GHz, mmWave spectrum is plentiful and lightly licensed, meaning it is more accessible to service operators around the world. The cost of mmWave equipment has decreased substantially thanks to improvements in silicon production, making it affordable for consumer electronics. The challenges impacting mmWave adoption now lie primarily in the unanswered technical questions regarding this largely uninvestigated spectrum. Through these varying groups and motivations, a set of frequencies is beginning to emerge as the candidates for 5G: 28, 39, and 72 GHz. These three frequency bands have emerged for a number of reasons. First, they have substantially lower rates of oxygen absorption than 60 GHz, where absorption causes a loss of about 20 dB/km, which makes them more suitable for long-distance communications. These frequencies can be utilized for non-line-of-sight communications and perform well in multipath settings. Using highly directional antennas in combination with beamforming and beam tracking, mmWave can provide a reliable and very secure link. Extensive research is being carried out to investigate the channel properties and potential performance at 28, 39, and 72 GHz. This article examines 39 GHz channel characteristics, which can be helpful when designing 5G radio systems for indoor applications. This study makes extensive use of the New York University Simulator (NYUSIM) Millimeter-Wave Channel Simulator (version 3.0), which was created as a result of years of field observations, to analyze the channel characteristics under various environmental situations. From the data obtained from the simulator output data files, indoor coverage is predicted for the assumed non-line-of-sight (NLOS) and line-of-sight (LOS) environments. The statistics presented here can be used in realistic system-level simulation and air-interface design of next-generation mmWave wireless communication systems.
The remainder of the article is structured as follows: a literature review of several research publications pertinent to indoor coverage prediction is presented in Sect. 2. Section 3 presents the basic terminologies used in wireless communication systems and discusses the NYUSIM simulator used in this research work. Section 4 discusses the methodology used and the simulation study conducted for prediction of indoor coverage of 5G mmWave wireless communication at 39 GHz. In Sect. 5, conclusions about the research work are presented and the future scope of this work is discussed.
2 Related Works

In order to enable large data rates in next-generation 5G communication systems, millimeter-wave (mmWave) communication, which makes use of the spectrum in the 30 to 300 GHz region, has demonstrated tremendous promise [2]. However, mmWave communication confronts a number of difficulties when compared with typical systems using carrier frequencies below 6 GHz, including higher path losses, severe signal blocking, and increased device complexity [3]. This has sparked a variety of academic and commercial research interests for indoor and outdoor mmWave operations. A thorough analysis of the channel models that are applied in the design of 5G radio systems is given by Sun et al. in [4]. Two well-known mmWave channel models—the 3rd Generation Partnership Project model, adopted by the International Telecommunication Union, and the NYUSIM model, developed from years of field measurements in New York City—are presented, along with the key distinctions between mmWave and microwave channel models. The use of the channel models in examples for a variety of applications follows, showing the broad use of the models and their parameter values. These findings demonstrate that channel performance measures, including spectrum efficiency, coverage, and hardware/signal processing needs, are highly dependent on the selection of channel models. An indoor 3D spatial statistical channel model for millimeter-wave and sub-THz frequencies was reported by Ju et al. [5, 6] based on extensive radio propagation measurements at 28 and 140 GHz carried out in an indoor office setting. Around 15,000 measured power delay profiles were used to develop omnidirectional and directional path loss models as well as channel statistics such as the number of time clusters, cluster delays, and cluster powers. The resulting channel statistics demonstrate that for both LOS and NLOS environments at 28 and 140 GHz, the number of temporal clusters follows a Poisson distribution and the number of sub-paths inside each cluster follows a composite exponential distribution. A related indoor channel simulator was created, capable of simulating 3D omnidirectional, directional, and multiple input multiple output (MIMO) channels for any mmWave and sub-THz carrier frequency up to 150 GHz, signal bandwidth, and antenna beamwidth. Future air-interface, beamforming, and transceiver designs for 6G and beyond will be guided by the statistical channel model and simulator that have been provided. Realistic channel impulse responses, PDPs, and power angular spectra can be produced via indoor channel simulation software that is based on the well-known outdoor channel simulator NYUSIM. A thorough overview of the investigations conducted to describe mmWave channels covering frequencies from 28 to 100 GHz in indoor environments has been presented by Al-Saman et al. [7]. For mmWave in various indoor contexts, a study of measurement methods, well-known path loss models, and analyses of path loss and delay spread are described. Potential future trends of mmWave indoor propagation research and measurements have also been examined, including important indoor environments, the roles of artificial intelligence, channel characterization for indoor devices, reconfigurable intelligent surfaces, and mmWave for 6G systems.
3 Simulation of Propagation Channel of 5G Wireless Communication System

3.1 Basic Terminologies

3D AoA power spectrum: AoA (angle of arrival) shows the angles of the multipath components at which power arrives at the receiver (RX).

3D AoD power spectrum: AoD (angle of departure) shows the angles of the multipath components at which the signal power departs from the transmitter (TX).

Directional PDP: a sample directional power delay profile (PDP) with the highest power when directional antenna gain patterns are used at the TX and/or RX. Since directional antennas or antenna arrays will be used at the TX and/or RX in a realistic mmWave communication system to provide gains that compensate for the higher free-space path loss at mmWave frequencies, this plot is produced by allowing users to apply arbitrary directional antenna patterns (gains and HPBWs).

Small-scale PDP: a sequence of PDPs over each receive antenna element; the type of antenna array, the number of antenna elements, and the spacing between the antenna elements are specified by the user on the GUI.

Line-of-sight (LOS): LOS propagation is a characteristic of electromagnetic radiation in which two stations can transmit and receive data signals only when they are in direct view of each other with no obstacles in between. Satellite and microwave transmission are two common examples of LOS communication.

Non-line-of-sight (NLOS): in the context of radio communications, a radio channel or connection is said to be non-line-of-sight when there is no visual line of sight between the transmitting and receiving antennas.
3.2 Simulation of Propagation Channel Before adopting new technologies, performance evaluation of communications systems and network deployment simulations using computer-aided design tools like channel simulators are crucial. The development and implementation of channel models are becoming more and more important for wireless communication system design. Previous researchers have developed and used a number of channel simulators, including SIRCIM, SMRCIM, and BERSIM [8]. NYUSIM [9], an open-source channel simulator, is utilized in this study. NYUSIM was created using data from several wideband millimeter-wave (mmWave) propagation channel tests conducted in the real world at frequencies ranging from 500 MHz to
150 GHz. NYUSIM is usable for a wide range of carrier frequencies from 500 MHz to 150 GHz and RF bandwidths from 0 to 800 MHz, and it delivers an accurate portrayal of the actual channel impulse responses in both time and space. The simulator’s source code was created in MATLAB. Because of its platform-independent graphical user interface (GUI), NYUSIM can be used on computers running either Windows or Macintosh operating systems without the need for MATLAB to be loaded. NYUSIM 3.0 is capable of creating mmWave multiple input multiple output (MIMO) channels for both outdoor and indoor scenarios. This newly implemented indoor channel model extension shares the graphical user interface (GUI) with the outdoor channel simulation and can generate realistic wideband channel impulse response (CIR), PDP, and angular power spectrum (APS) from 500 MHz to 150 GHz. The new NYUSIM 3.0 and NYUSIM 3.1 can support various applications and designs of advanced hybrid beamforming algorithms, signal strength prediction, and channel state estimation, for next-generation cellular and WIFI wireless systems. The Monte Carlo simulations run by the NYUSIM 3.0 simulator produce samples of CIRs at particular transmitter–receiver (T-R) separation distances. The user specifies the T-R separation range, and a uniform distance is chosen from that range to represent the actual T-R separation. For simulation, there are two running modes: drop-based mode and spatial consistency mode. There are 49 input parameters to the channel simulator, which are grouped into four main categories: channel parameters, antenna properties, spatial consistency parameters, and human blockage parameters [8]. The panel, channel parameters, contains 21 fundamental input parameters about the propagation channel; the panel, antenna properties, comprises 12 input parameters related to the TX and RX antenna arrays; the panel, spatial consistency, contains 10 input parameters related to the spatial consistency implementation, and the panel, human blockage, includes 6 input parameters related to the human blockage shadowing loss due to a person near the mobile phone [8]. The GUI of NYUSIM provides options for the users to select output file type and folder for saving the output figure files and output data files which are generated after the simulation is completed.
4 Range Prediction for 5G Wireless Communication System 4.1 Range Prediction Methodology The prediction of the coverage of an indoor wireless system is crucial for the deployment of large wireless networks. Due to the increasing path loss within the first meter of propagation distance, the cell size shrinks and the ultra-dense network is
required to provide sufficient link margin. NYUSIM can be used to predict the typical coverage of a 5G mmWave indoor wireless communication system. To predict the coverage of a 5G mmWave communication system in a given indoor scenario, the input parameters are set in the GUI of the NYUSIM simulator. The average received power level as a function of T-R separation distance is computed and plotted using the power delay profile data generated by the simulator. The designer can then determine the range of the communication system from the plot of T-R distance versus average received power level, for the given receiver sensitivity. The step-by-step procedure for indoor coverage prediction using the data obtained from NYUSIM is illustrated in Fig. 1. As an example, to predict the coverage of a 5G mmWave communication system operating at 39 GHz, the input parameters on the NYUSIM GUI are set to the following values [9]:
• Frequency: 39 GHz
• RF bandwidth: 800 MHz
• Scenario: InH
• Distance range option: Indoor (5–50 m)
Fig. 1 Methodology for indoor coverage prediction using NYUSIM
• Environment: LOS/NLOS
• Lower bound of T-R separation distance: 5 m
• Upper bound of T-R separation distance: 50 m
• TX power: 10 dBm
• Base station height: 2.5 m
• Polarization: Co-Pol
• Number of RX locations: 50 in LOS and 50 in NLOS
• TX array type: URA
• RX array type: URA
• Number of TX antenna elements Nt: 16
• Number of RX antenna elements Nr: 4
• TX antenna spacing: 0.5 wavelength
• RX antenna spacing: 0.5 wavelength
• Number of TX antenna elements per row Wt: 4
• Number of RX antenna elements per row Wr: 1
• TX antenna azimuth HPBW: 10°
• TX antenna elevation HPBW: 10°
• RX antenna azimuth HPBW: 30°
• RX antenna elevation HPBW: 30°
4.2 Analysis of Simulator Output For each simulation run, five figures are generated and stored that are based on the particular results of the simulation that is being run, and an additional figure of path loss scatter plot is generated and stored after N (N ≥ 1) continuous simulation runs with the same input parameters are complete [8]. The last figure generated for N (N ≥ 1) continuous simulation runs with the same input parameters, along with the five figures generated from the initial simulation run, are displayed on the screen for visual purposes regardless of the number of simulation runs (RX locations). The details of figures displayed are provided here. 3D AoA power spectrum is shown in Fig. 2. Three-dimensional (3D) AoD power spectrum is illustrated in Fig. 3. In Fig. 4, an example of an omnidirectional PDP is shown. The PDP plot shows some fundamental data, including the frequency, surroundings, T-R separation distance, RMS delay spread, omnidirectional received power, omnidirectional path loss, and PLE. Figure 5 shows a sample directional PDP with the strongest power, where directional antenna gain patterns are used at the TX and/or RX. This figure is generated by allowing users to implement arbitrary directional antenna patterns (gains and HPBWs), since directional antennas/antenna arrays will be utilized at the TX and/or the RX in a realistic mmWave communication system to provide gains to compensate for the higher free space path loss at mmWave frequencies. A series of PDPs over each receive antenna element is displayed as shown in Fig. 6, where the antenna array type, number of antenna elements, and antenna element spacing are specified on the GUI by the user.
Fig. 2 3D AOA power spectrum
Fig. 3 3D AOD power spectrum
As shown in Fig. 7, after N (N ≥ 1) continuous simulation runs with identical input parameters, a path loss scatter plot called "PathLossPlot" is produced. In addition to the fitted path loss exponent (PLE) and the standard deviation of the shadow fading, this figure displays the omnidirectional and directional path loss values over the whole distance range, obtained from the N (N ≥ 1) continuous simulation runs.
Fig. 4 Omnidirectional power delay profile
Fig. 5 Directional power delay profile with strongest power
In the legend of the "PathLossPlot" figure, "omni" implies omnidirectional, "dir" represents directional, and "dirbest" denotes the direction with the strongest received power; "n" denotes the PLE.
4.3 Inference from Simulator Output Data Files

For each simulation run, five sets of .txt files and five corresponding .mat files are generated, namely "AODLobePowerSpectrumn_Lobex.txt", "AODLobePowerSpectrumn.mat", "AOALobePowerSpectrumn_Lobex.txt", "AOALobePowerSpectrumn.mat", "OmniPDPn.txt", "OmniPDPn.mat", "DirectionalPDPn.txt", "DirectionalPDPn.mat", "SmallScalePDPn.txt", and "SmallScalePDPn.mat", where n denotes the nth RX location (run) and x represents the xth spatial lobe [8]. After N (N ≥ 1) continuous simulation runs with the same input parameters are complete, another three .txt files and three corresponding .mat files are produced.
Fig. 6 Power delay profiles over different receive antenna elements

Fig. 7 Scatter plot showing the omnidirectional and directional path loss values generated from NYUSIM with 50 simulation runs for the 39 GHz indoor LOS scenario
These are "BasicParameters.txt", "BasicParameters.mat", "OmniPDPInfo.txt", "OmniPDPInfo.mat", "DirPDPInfo.txt", and "DirPDPInfo.mat" [8]. The text file "BasicParameters.txt" and the .mat file "BasicParameters.mat" contain all the input parameter values as shown on the GUI when running the simulation. The text file "OmniPDPInfo.txt" and the .mat file "OmniPDPInfo.mat" contain five columns, where each column represents a key parameter for each of the N omnidirectional PDPs from the N continuous simulation runs. The parameters are T-R separation distance (m), received power (dBm), path loss (dB), RMS delay spread (ns), and Ricean K-factor (dB) [8]. The text file "DirPDPInfo.txt" and the
.mat file "DirPDPInfo.mat" contain 11 columns, where each column represents a key parameter for each of the directional PDPs from the N continuous simulation runs. The parameters are simulation run number, T-R separation distance (m), time delay (ns), received power (dBm), phase (rad), azimuth AoD (degree), elevation AoD (degree), azimuth AoA (degree), elevation AoA (degree), path loss (dB), and RMS delay spread (ns). The received power and the average power level of the received signals as a function of distance are analyzed for the channel conditions considered. Received power data as a function of distance are imported from the "OmniPDPInfo.mat" and "DirPDPInfo.mat" files, and the average received power is computed. Figure 8 shows the scatter plot of the received power and the average power level of the received signals at distances from 5 to 50 m for indoor omnidirectional/directional channels in the LOS and NLOS environments. Figure 9 shows the plot of the average power level of the received signals at distances from 5 to 50 m for indoor omnidirectional/directional channels in the LOS and NLOS environments. Antenna arrays providing narrow beams with HPBWs of 10° and 30° are used at the transmitter and receiver, respectively, which provide good directional gains and eliminate interference from communication links of other hotspots or mobile devices. The TX and RX beams are pointed in the boresight direction in the LOS environment, while the beams are pointed in the strongest reflection direction in the NLOS environment. Coverage depends on the receiver sensitivity and can be determined from the graph of T-R distance versus average received power level. This is illustrated with an example. Assuming that the receiver has a sensitivity of −85 dBm, Fig. 10 suggests that the receiver maintains a sufficient power level even beyond 50 m in the omni-LOS environment. In the dir-LOS environment, the received power drops below the receiver sensitivity (−85 dBm) at a separation distance of 15.5 m. In the omni-NLOS and dir-NLOS environments, however, the received power is below the receiver sensitivity (−85 dBm) at every distance from the transmitter, and the receiver will not receive the signal.
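The averaging and range-lookup step described above is straightforward to script once the simulator's output files are exported. The sketch below assumes the "OmniPDPInfo.mat" layout described in this section (columns: T-R distance, received power, path loss, RMS delay spread, K-factor); the variable name inside the .mat file and the 5 m distance bins are assumptions, not something specified by the NYUSIM manual excerpted here.

```python
import numpy as np
from scipy.io import loadmat

SENSITIVITY_DBM = -85.0          # assumed receiver sensitivity from the example

data = loadmat("OmniPDPInfo.mat")
# Assumed: the 5-column array is stored under the key "OmniPDPInfo".
omni = np.asarray(data["OmniPDPInfo"], dtype=float)
dist_m, rx_dbm = omni[:, 0], omni[:, 1]

# Average received power in 5 m distance bins between 5 and 50 m.
edges = np.arange(5, 55, 5)
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (dist_m >= lo) & (dist_m < hi)
    if mask.any():
        avg = rx_dbm[mask].mean()
        ok = "covered" if avg >= SENSITIVITY_DBM else "below sensitivity"
        print(f"{lo:2d}-{hi:2d} m: {avg:6.1f} dBm ({ok})")
```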
5 Conclusion

This article presented a detailed account of wireless channel behavior at 39 GHz in LOS and NLOS scenarios in an indoor environment. The indoor channel simulator NYUSIM 3.0 was used to analyze the AoA power spectrum, AoD power spectrum, directional power delay profile, omnidirectional power delay profile, and small-scale power delay profile. The secondary channel statistics provided by the simulator in the output data files were used to obtain the average power as a function of T-R distance. These results can help the network designer successfully design and deploy wireless networks at mmWave frequencies. A methodology is illustrated for indoor coverage prediction and the channel behavior is investigated, which can support the analysis and design of 5G indoor wireless systems. The results presented here can be used in realistic system-level simulations and air-interface design of mmWave communication systems.
Fig. 8 T-R distance versus received power in indoor wireless channel (LOS/NLOS)
Fig. 9 T-R distance versus average received power in indoor wireless channel (LOS/NLOS)
Fig. 10 Range prediction from T-R distance versus average received power plot
The NYUSIM simulator allows the user to alter about 49 input parameters which are grouped under four categories viz., “channel parameters, antenna parameters, spatial consistency parameters, and human blockage parameters”. Channel characteristics can be investigated for any given frequency and channel conditions using the methodology presented in this article.
References

1. Srivastava S et al (2019) Quasi-static and time-selective channel estimation for block-sparse millimeter wave hybrid MIMO systems: Sparse Bayesian Learning (SBL) based approaches. IEEE Trans Sig Process 67(5):1251–1266
2. Rappaport TS, Heath RW Jr, Daniels RC, Murdock JN (2015) Millimeter wave wireless communications. Prentice Hall
3. Rappaport TS, Sun S, Mayzus R, Zhao H, Azar Y, Wang K, Wong GN, Schulz JK, Samimi M, Gutierrez F (2013) Millimeter wave mobile communications for 5G cellular: it will work! IEEE Access 1:335–349
4. Sun S, Rappaport TS, Shafi M, Tang P, Zhang J, Smith PJ (2018) Propagation models and performance evaluation for 5G millimeter-wave bands. IEEE Trans Veh Technol 67(9):8422–8439
5. Ju S, Xing Y, Kanhere O, Rappaport TS (2021) Millimeter wave and sub-terahertz spatial statistical channel model for an indoor office building. IEEE J Sel Areas Commun 39(6):1561–1575
6. Ju S, Xing Y, Kanhere O, Rappaport TS (2020) 3-D statistical indoor channel model for millimeter-wave and sub-terahertz bands. In: GLOBECOM 2020 - 2020 IEEE Global communications conference. IEEE, pp 1–7
7. Al-Saman A, Cheffena M, Elijah O, Al-Gumaei YA, Rahim SKA, Al-Hadhrami T (2021) Survey of millimeter-wave propagation measurements and models in indoor environments. Electronics 10(14):1653. https://doi.org/10.3390/electronics10141653
8. Sun S, MacCartney GR, Rappaport TS (2017) A novel millimeter-wave channel simulator and applications for 5G wireless communications. In: 2017 IEEE International conference on communications (ICC). IEEE, pp 1–7
9. Ju S, Sun S, Rappaport TS (2021) NYUSIM user manual version 3.0. New York
Cervical Spine Fracture Detection Flemin Thomas and P. Savaridassan
Abstract For the imaging diagnosis of adult spine fractures, computed tomography (CT) has nearly completely replaced radiography (X-rays). Finding any vertebral fracture as soon as feasible is essential to preventing neurologic deterioration and paralysis after trauma. To detect and localize fractures of the seven cervical spine vertebrae, a machine learning model is used to match the radiologists' performance. Spinal fractures most typically occur in the cervical spine. The incidence of spinal fractures in the elderly has grown, and in this population, concurrent degenerative disease and osteoporosis may make fractures harder to identify on imaging. Radiologists may benefit from exploratory data analysis to prioritize their worklists and spot cervical spine fractures on CT imaging. Keywords Cervical · Spine · CT · EDA · Fractures
1 Introduction

Numerous studies examining the ability of artificial intelligence (AI) to identify fractures have been carried out. AI has been used to identify fractures of the thoracic and lumbar spine on dual X-ray absorptiometry, as well as of the hip, humerus, distal radius, wrist, hand, and ankle on radiographs. In addition, fractures of the thoracic and lumbar vertebral bodies and the calcaneus have been detected on CT scans using AI. Over 1 million patients with blunt trauma and probable cervical spine damage are examined each year in the USA, while over 3 million patients in North America receive annual evaluations for cervical spine injuries. High morbidity and mortality rates have been linked to cervical spine injuries, and inadequate immobilization due to a
delayed diagnosis of an unstable fracture can have disastrous effects on neurologic function. The evaluation of trauma patients must therefore start with the cervical spine being cleared through imaging, and multidetector CT has become the imaging method of choice for assessing cervical spine trauma and the associated mortality and morbidity in cervical spine injury patients. The existence of fractures is determined by retrospective clinical diagnosis.
2 Literature Survey For most of us, “what other people think” has always been a crucial piece of knowledge when making decisions. Now, among other things, it is possible to learn about the perspectives and experiences of the enormous group of individuals who are neither our close friends nor well-known professionals in the field of criticism—that is, people we have never heard of. Conversely, an increasing number of people are sharing their ideas online with complete strangers. This area of interest is driven by the interest that individual users have in online reviews of goods and services as well as the potential power that these reviews may have. Additionally, there are numerous obstacles in the way of this process that must be overcome in order to produce the desired results. In this study, we examined the core approach that normally takes place in this procedure as well as the steps that must be taken to overcome any challenges.
2.1 CT Cervical Spine Fracture Detection Using a Convolutional Neural Network

Small et al. [1] presented a detailed study on multidetector CT, the standard-of-care imaging method for assessing cervical spine trauma. Their goal was to assess how well a convolutional neural network identified fractures of the cervical spine on CT. They evaluated C-spine, a convolutional neural network created by Aidoc and approved by the FDA to identify fractures of the cervical spine on CT. In all, 665 exams were included in the analysis. Retrospective CT visualization of a fracture using all CT, MR, and convolutional neural network output data was used to determine the ground truth. Convolutional neural network and radiologist evaluations were compared with the ground truth to determine the diagnostic accuracy and agreement, and the OE coefficients, sensitivity, specificity, and positive and negative predictive values were calculated with 95% CIs. The accuracy of the convolutional neural network in detecting cervical spine fractures was 92% (95% CI, 90–94%), with a sensitivity and specificity of 76% (95%
CI, 68–83%) and 97% (95% CI, 95–98%), respectively. With 93% (95% CI, 88– 97%) sensitivity and 96% (95% CI, 94–98%) specificity, the radiologist’s accuracy was 95% (95% CI, 94–97%). The lower cervical spine fractures that are frequently hidden by CT beam attenuation as well as fractured anterior osteophytes, transverse processes, and spinous processes were among the fractures that the convolutional neural network and radiologists both missed. The convolutional neural network has potential for helping radiologists to prioritize their work lists and find cervical spine fractures on CT scans. It is crucial to comprehend the convolutional neural network’s advantages and disadvantages before successfully implementing it in therapeutic settings. Convolutional neural network diagnostic utility will be improved with further sensitivity enhancements.
2.2 Deep Sequential Learning for Cervical Spine Fracture Detection on Computed Tomography Imaging

Salehinejad et al. [2] note that medical emergencies such as cervical spine fractures can result in lifelong paralysis or even death, so computed tomography (CT) diagnosis of patients with suspected fractures must be accurate for effective patient care. They propose a deep convolutional neural network (DCNN) with a bidirectional long short-term memory (BLSTM) layer to automatically identify fractures of the cervical spine in axial CT images. An annotated dataset of about 3666 CT images (729 positive and 2937 negative instances) was used to train and evaluate the algorithm. On the balanced (104 positive and 104 negative instances) and unbalanced (104 positive and 419 negative cases) test datasets, the classification accuracy was 70.92% and 79.18%, respectively, according to the validation findings. Automatically detecting fractures in cervical spine CT scans is exceedingly difficult; the proposed model, based on a ResNet-50 backbone with a BLSTM layer, demonstrates the capability of deep neural networks to take on this problem, and the authors are in the process of making their sizable labeled dataset available for analysis.
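To make the CNN-plus-recurrent idea concrete, the sketch below wires a ResNet-50 feature extractor to a bidirectional LSTM that aggregates features across consecutive axial slices before a per-study fracture classifier. It is a hedged illustration of the general architecture described above, written in PyTorch; the layer sizes, slice count, and classifier head are assumptions, not the configuration used by Salehinejad et al.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SliceSequenceClassifier(nn.Module):
    """ResNet-50 per-slice features -> BLSTM over the slice sequence -> fracture logit."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)          # 2D CNN applied slice by slice
        backbone.fc = nn.Identity()                # keep the 2048-d pooled features
        self.backbone = backbone
        self.blstm = nn.LSTM(input_size=2048, hidden_size=hidden,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)       # binary fracture / no-fracture

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_slices, 3, H, W) -- a stack of axial CT slices
        b, s, c, h, w = x.shape
        feats = self.backbone(x.reshape(b * s, c, h, w)).reshape(b, s, -1)
        seq, _ = self.blstm(feats)                 # context across neighboring slices
        return self.head(seq[:, -1])               # one logit per study

# Hypothetical smoke test: 2 studies of 16 slices at 224x224.
model = SliceSequenceClassifier()
print(model(torch.randn(2, 16, 3, 224, 224)).shape)   # torch.Size([2, 1])
```

A real pipeline would window the CT volume, normalize Hounsfield units, and train with a class-weighted loss given the heavy negative-to-positive imbalance noted above.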
3 Materials 3.1 Exploratory Data Analysis Exploratory data analysis is a crucial procedure that entails performing early investigations on data in order to find patterns, identify anomalies, test hypotheses, and validate assumptions with the aid of summary statistics and graphical representations. In the process of exploratory data analysis (EDA), a dataset is summarized and visually explored. Unsupervised learning, a suite of techniques for identifying
intriguing subgroups and patterns in our data, is a crucial component of EDA. EDA can be used to produce hypotheses as opposed to statistical hypothesis testing, which is used to disprove hypotheses (which can then be confirmed or rejected by new studies). Finding outliers and inaccurate observations with the help of EDA can result in a dataset that is cleaner and more usable. In EDA, we pose questions about our data and then endeavor to provide succinct statistical and visual responses. Some inquiries will turn out to be crucial, while others will not.
3.2 Handling Missing Values When no information is given for one or more elements, or for the entire unit, missing values can happen. Missing data are a major issue in real-world situations. In Pandas, missing data can also refer to Not Available (NA) values. In the Pandas DataFrame, the following handy functions can be used to find, replace, and remove null values (a brief usage sketch follows the list):
1. isnull()
2. notnull()
3. dropna()
4. fillna()
5. replace()
6. interpolate()
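A brief sketch of how these helpers could be applied to the training metadata (the train.csv file described in Sect. 4) follows; the fill strategies shown are illustrative choices, not steps prescribed by the study.

```python
import pandas as pd

df = pd.read_csv("train.csv")                         # training metadata described in Sect. 4
print(df.isnull().sum())                              # isnull(): count missing entries per column
complete = df.dropna()                                # dropna(): drop rows containing NA values
filled = df.fillna(0)                                 # fillna(): replace NA values with a constant
smoothed = df.select_dtypes("number").interpolate()   # interpolate(): fill gaps in numeric columns
print(len(df), len(complete))
```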
3.3 Data Visualization It is significantly easier to understand trends or patterns in the data when it has been analyzed in the form of graphs or maps; this practice is known as data visualization. There are many different kinds of visualization (a small plotting sketch follows the list):
1. Univariate analysis: There is only one variable in this type of data. Univariate data analysis is the simplest sort of analysis because there is only one variable that varies. The analysis does not deal with causes or correlations; rather, its main objectives are to interpret the data and find any patterns in it.
2. Bivariate analysis: There are two different variables in this kind of data. The analysis of this kind of data focuses on linkages and causes, and it seeks to understand the causal connection between the two variables.
3. Multivariate analysis: Data with three or more variables are referred to as multivariate data.
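As a small illustration, the sketch below draws one univariate view and one bivariate view from the label columns of train.csv described in Sect. 4; the specific plots are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("train.csv")                          # label columns described in Sect. 4
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
df["patient_overall"].value_counts().plot(kind="bar", ax=axes[0])   # univariate: one variable
axes[0].set_title("Patients with / without fracture")
pd.crosstab(df["C1"], df["C2"]).plot(kind="bar", ax=axes[1])        # bivariate: two variables
axes[1].set_title("C1 vs. C2 fracture co-occurrence")
plt.tight_layout()
plt.show()
```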
4 Dataset Description The challenge planning task force gathered imaging data, including about 3000 CT tests, from twelve sites across six continents to produce the ground truth dataset. To identify the presence, vertebral level, and location of any cervical spine fractures, professionals in spine radiology from the ASNR and ASSR gave expert image-level annotations for these studies. A subset of the imaging datasets was automatically segmented using a 3D U-Net model, and the segmentations were adjusted and approved by radiologists. The given segmentation labels take values of 1 to 7 for C1 to C7 (the seven cervical vertebrae), 8 to 19 for T1 to T12 (the twelve thoracic vertebrae of the upper and middle back), and 0 for everything else. All scans show C1–C7 labels, but not all thoracic labels, because the focus is on the cervical spine. The files adopted for model training and testing are detailed as follows.
train.csv: Metadata for the training set.
1. StudyInstanceUID: the study ID. Every patient scan has a different study ID.
2. patient_overall: one of the target columns, the patient-level outcome, i.e., whether any vertebrae are fractured.
3. C[1–7]: the other target columns, indicating whether the given vertebra is fractured. See Fig. 1 for the location of each vertebra in the spine.
test.csv: Metadata for the test set prediction structure. Only the first few rows of the test set can be downloaded.
1. row_id: the row ID.
2. StudyInstanceUID: the study ID.
3. prediction_type: which of the eight target columns requires a forecast in this row.
Each image is stored in a DICOM file. The DICOM image files use a bone kernel and axial orientation, with a slice thickness of less than 1 mm. A few of the DICOM files are JPEG compressed; to read the pixel array in these files, extra tools such as GDCM and pylibjpeg may be needed.
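A minimal sketch of reading one such file with pydicom is shown below; the file path is a placeholder, and pylibjpeg or GDCM would only be needed for the JPEG-compressed files mentioned above.

```python
import pydicom

# Placeholder path; real files live under the study's StudyInstanceUID folder
ds = pydicom.dcmread("train_images/1.2.826.0.1.3680043.780/100.dcm")
print(ds.StudyInstanceUID, ds.SliceThickness)
pixels = ds.pixel_array       # needs pylibjpeg or GDCM for JPEG-compressed files
print(pixels.shape, pixels.dtype)
```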
5 Methodology The motivation to use AI for this task is that a quick diagnosis can reduce the chance of neurologic deterioration and paralysis after trauma. The dataset we are using is made up of roughly 2000 CT studies, from twelve locations and across six continents. Spine radiology specialists have provided annotations to indicate the presence, vertebral level, and location of any cervical spine fractures. We must forecast the likelihood of fracture for each of the seven cervical vertebrae identified by the letters C1, C2, C3, C4, C5, C6, and C7 as shown in Fig. 1, as well
as the likelihood that the cervical spine will sustain any fractures overall. Notably, fractures of the thoracic spine, ribs, and clavicles are not taken into consideration. The metric is a weighted multi-label logarithmic loss averaged across all patients, as shown in Fig. 2:

$$L_{ij} = -w_j \left[\, y_{ij} \log(p_{ij}) + (1 - y_{ij}) \log(1 - p_{ij}) \,\right] \qquad (1)$$
where the weights are given by
Fig. 1 Figure shows the different bones and sections on the Human Vertebral Column. The area we are focusing on is the cervical spine from C[1] to C[7].
Fig. 2 Figure represents the weighted multi-label loss graph
$$w_j = \begin{cases} 1, & \text{if vertebrae negative} \\ 2, & \text{if vertebrae positive} \\ 7, & \text{if patient negative} \\ 4, & \text{if patient positive} \end{cases} \qquad (2)$$
Positive instances are given greater consideration, as is the overall likelihood of any fracture. The pydicom library is used to open and explore the training set. The training set consists of .dcm files, which use the Digital Imaging and Communications in Medicine (DICOM) format. DICOM serves as the industry standard for the storage of medical images and associated metadata. Although it has undergone several revisions, it was first published in 1983. The Neuroimaging Informatics Technology Initiative (NIfTI) format is used by .nii files; NIfTI is less complicated and easier to support than DICOM. In order to diagnose and localize fractures to the seven vertebrae that make up the cervical spine, a machine learning model would be created that matches the radiologists' performance. Segmentations of the CT scan give us the location of the vertebrae, which is very helpful because we know that a fracture can occur only in these regions. Bounding box measurements are useful in telling exactly where a fracture has occurred. An object localization algorithm can be trained to provide bounding boxes for the whole dataset.
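To make Eqs. (1) and (2) concrete, the sketch below evaluates the weighted log loss for a single patient in NumPy; the labels and probabilities are made-up values, and normalizing by the sum of the weights is an assumed convention rather than one stated in the text.

```python
import numpy as np

def weighted_log_loss(y_true, y_pred, weights, eps=1e-7):
    """Weighted binary log loss over the eight targets, normalized by the weight sum."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    losses = -weights * (y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return losses.sum() / weights.sum()

# Targets in the order C1..C7 followed by patient_overall (illustrative values only)
y_true = np.array([0, 0, 0, 1, 1, 0, 0, 1], dtype=float)
y_pred = np.array([0.05, 0.10, 0.10, 0.80, 0.60, 0.20, 0.10, 0.90])

weights = np.where(y_true == 1, 2.0, 1.0)          # vertebra weights from Eq. (2)
weights[-1] = 4.0 if y_true[-1] == 1 else 7.0      # patient-level weight from Eq. (2)
print(weighted_log_loss(y_true, y_pred, weights))
```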
6 Result A total of 2019 cervical spine examinations were identified. The overall target is fairly evenly distributed (52/48 split). As shown in Fig. 3, 1058 patients do not have a fracture, whereas 961 patients have at least one fracture. The C7 vertebra has the
highest percentage of fractures (19%) and the C3 vertebra has the lowest percentage (4%) as shown in Fig. 4. Many patients have multiple fractures, shown in Fig. 5. If a patient experiences multiple fractures, they are more likely to happen in vertebrae close to one another, such as C4 and C5, rather than C1 and C7. Bounding box measurement shows the area of the spine which has been ruptured as shown in Fig. 6.
Fig. 3 Figure shows the number of patients having a fracture
Fig. 4 Figure shows the number of patients having fractures in each vertebra
Fig. 5 Number of fractures by patient Fig. 6 Figure represents the bounding box for a single image slice
7 Conclusion The goal of this study is to determine whether cervical spine fractures may be detected and located more easily with the use of artificial intelligence. Using graphical representations and statistical summaries, exploratory data analysis has been used to study the data using visual approaches; used to find trends, patterns, or to test hypotheses. The convolutional neural network has potential for helping radiologists to prioritize their work lists and find cervical spine fractures on CT scans. It is crucial to comprehend the convolutional neural network’s advantages and disadvantages before successfully implementing it in therapeutic settings. Convolutional neural network diagnostic utility will be improved with further sensitivity enhancements.
8 Scope of Work To stop neurologic degeneration and paralysis following trauma, this procedure will aid in quickly identifying and localizing spinal fractures. The project will be developed further by training an EfficientNetV2-based classifier on the images for further predictions. The project may also be extended to identify both acute and chronic fractures, which is a further aspect of its scope.
References 1. Small JE, Osler P, Paul AB, Kunst M (2021) Ct cervical spine fracture detection using a convolutional neural network. Am J Neuroradiol 42(7):1341–1347 2. Salehinejad H, Ho E, Lin HM, Crivellaro P, Samorodova O, Arciniegas MT, Merali Z, Suthiphosuwan S, Bharatha A, Yeom K, Mamdani M (2021) Deep sequential learning for cervical spine fracture detection on computed tomography imaging. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 1911–1914, April. IEEE. 3. https://www.ncbi.nlm.nih.gov/pmc/articles 4. https://www.kaggle.com/competitions/rsna-2022-cervical-spine-fracture-detection/overview 5. RSNA Fracture Detection: DICOM & Images Explore by Andrada Olteanu 6. Fork of great EDA with fix for slice count dist by Tomasz Bartczak 7. What are .DCM and .NII files and how to read them by Robert Kwiatkowski
Unmanned Ground Vehicle for Survey of Endangered Species Kesia Mary Joies , Rahul Sunil , Jisha Jose , and Vishnu P. Kumar
Abstract Wildlife monitoring and surveying of endangered species have become important for conserving their habitat and preventing its extinction from the planet. A non-invasive method using unmanned ground vehicles (UGV) with advanced cameras and minimal motion is a solution. Regular surveying of endangered species is important for articulating conservation laws and ensuring that they are effective in helping the ecosystem stay in balance. Our developed solution will not depend on any external communication agents like Internet or GSM, therefore enabling it to be deployed anywhere and controlled within a range of 6 km, and travel across different terrains. This will enable a search of the species across different areas, and detected images will be transferred via XBee modules. Our project works toward replacing invasive attached devices in favor of non-invasive remote detection with smart cameras, advanced LiDAR sensors and machine learning. The advancement in sensor technology and the rise of unmanned systems have paved the way for automated systems. We have identified several high-profile species of wildlife for which smart camera detection could be enhanced if sensors and smart sensors were mounted on an unmanned ground vehicle (UGV). Keywords Unmanned ground vehicle · XBee · OpenMV camera
K. M. Joies (B) · R. Sunil · J. Jose · V. P. Kumar Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram 695015, Kerala, India e-mail: [email protected] R. Sunil e-mail: [email protected] J. Jose e-mail: [email protected] V. P. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_30
1 Introduction An endangered species is a group of organisms that have a risk of becoming extinct. More than 90% of all species that have ever lived on earth have become extinct. When a species becomes endangered, it is a sign that an ecosystem is out of balance and the consequences can be critical [1]. Regularly surveying species and wild places is the only way we can tell if conservation work is needed or working. The conventional way of surveying is a tedious task that requires hundreds of hours of manual labor, and yet the results could be inaccurate. Our project works toward replacing invasive attached devices in favor of noninvasive remote detection with smart cameras, advanced LiDAR sensors and machine learning. The advancement in sensor technology and the rise of unmanned systems have paved the way for automated systems. We have identified several high-profile species of wildlife for which smart camera detection could be enhanced if sensors and smart sensors were mounted on an unmanned ground vehicle (UGV). UGVs could also be valuable for LiDAR detection and counting of critical insect pollinators and pests on intensive agricultural operations. UGVs have already been proven to be valuable tools for precision agricultural operations including applying pesticides for berry and vegetable crops. The initial goal is to develop the unmanned vehicle prototype for detecting and surveying them. The UGV will likely remain stationary at a survey point for 24 h as the smart camera searches for recognizable moving objects. After completing 24 h at one site, we will send commands to the UGV to move a short distance and begin another 24-h search.
2 Existing Works In recent years, technology has taken a huge leap, and unmanned vehicles are used in various applications, especially for detection, surveying, and monitoring scenarios. Unmanned vehicles are of various types depending on where they are deployed. They can move on the ground (unmanned ground vehicle, UGV), in the air (unmanned aerial system, UAS, commonly known as a 'drone'), at the sea surface (unmanned surface vehicle, USV), or in the water column (unmanned underwater vehicle, UUV). We reviewed various research papers to examine existing solutions. The paper [2] that focuses on detecting and monitoring marine fauna uses an unmanned surface vehicle (USV) or unmanned underwater vehicle (UUV) depending on the animal to be detected. This showed that the selection of the vehicle depends on the location of use, the target species, and the behavior of the marine fauna. For detecting and monitoring livestock while limiting the involvement of farmers, an autonomous unmanned ground vehicle-based paper [3] proposes a solution using restricted supervised learning and image fusion. This work exploits multiple cameras on the UGV to automate the counting process. Another paper [4] explains a method for autonomous flocking of a UAV and a UGV, implemented on a smaller scale using a Husky robot, which helps to detect objects in simulation. Unmanned aircraft systems are also used for wildlife research in [5]. For an efficient animal detection system, we came across a paper [6] that proposes the model 'IFFUZ', which recognizes animals using thermal images. It is robust to several challenging image conditions such as low contrast/illumination, haze/blur, occlusion, and camouflage. Existing approaches pose certain limitations. The work on autonomous flocking of UAV and UGV [4] shows a limitation in detecting objects in a timely manner; the Husky robot was unable to react accordingly. The animal detection method proposed in [6] seemed efficient but showed low accuracy in detecting animals from RGB images during the day. A technique for monitoring irrigation was discussed in [7], but it has limited scalability.
3 Proposed Solution An endangered species is a group of organisms that have a risk of becoming extinct. More than 90% of all species that have ever lived on earth have become extinct. When a species becomes endangered, it is a sign that an ecosystem is out of balance and the consequences can be critical. Regularly surveying species and wild places is the only way we can tell if conservation work is needed or working. The conventional way of surveying is a tedious task that requires hundreds of hours of manual labor, and yet the results could be inaccurate. The proposed methodology is deploying a non-invasive unmanned vehicle with minimal motion to monitor and survey the movements of an endangered species in its own habitat. We are developing the software as well as assembling the hardware components needed for our project to ensure the most compact and efficient system needed.
3.1 Hardware Architecture The hardware components needed for the project are assembled on a PCB and integrated onto a chassis and connected using an ARM Processor. The entire system is powered using a 10000 mAh LiPo battery. The architecture (Fig. 1) of the tank chassis is built with an advanced camera (OpenMV H7 Plus), XBee module, GPS module and motor encoders fitted on the ARM Processor board. The components and its usage are described in Table 1.
Fig. 1 Hardware architecture
Table 1 Hardware components of UGV

Chassis: A shock absorption metal robot tank car chassis with tracks is used as the chassis for the UGV, providing better grip and stable movement over uneven terrains.
Advanced camera: An OpenMV Cam H7 Plus for capturing the image, which is used as input for the surveying and monitoring software as well as for navigation.
GPS module: Attached for better navigation from long distances by receiving the location of the UGV.
ARM Processor: Part of the RISC architecture, excelling at power efficiency and heat dissipation.
Motor encoder: Mounted on a shaft under the chassis, tracking the speed and rotation of the shaft for precision movements.
XBee Pro S2C: Used for communicating between the UGV and the controller.
3.2 XBee Pro S2C The communication channel between the UGV and the controller is established using XBee [8], therefore enabling the controller to be as far as 6 km from the unmanned vehicle. The communication channel is used for sending the captured images as well as navigation of the UGV. It works on ZigBee mesh communication protocols [9]. It offers point to multipoint device connectivity with ease and is a cost-effective wireless communication. It is widely used in embedded solutions and is a modem used for communication and high connectivity. It offers an outdoor range up to 2 miles (3.2 km) line of sight and an indoor range up to 300 feet (90 m).
3.3 Software Methodology 3.3.1
UGV Software
A MicroPython script will be running on the UGV for interfacing the hardware, capturing the images and for communicating with the controller unit. Interfacing the Hardware: The MicroPython script running on the ARM Processor will have direct access to the hardware components including OpenMV camera, XBee module, motors (via a driver unit) and the GPS module. Capturing the Images: The MicroPython script is responsible for capturing the images in regular intervals as well as storing it in the correct format. Communicating with Controller Unit: The data from the UGV is sent to the controller as packets since the maximum transfer size of XBee module is 256 bytes. The MicroPython script is responsible for splitting the original data to be sent into packets and sending it reliably.
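A rough sketch of the packet-splitting step is given below; the 4-byte header layout and file name are assumptions, not the project's actual code, and on the UGV the same idea would run under MicroPython.

```python
import struct

MAX_FRAME = 256          # XBee payload limit noted above
HEADER = 4               # assumed header: 2-byte packet index + 2-byte packet count
CHUNK = MAX_FRAME - HEADER

def make_packets(data: bytes):
    """Split a payload into numbered packets that fit inside one XBee frame."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return [struct.pack(">HH", idx, len(chunks)) + c for idx, c in enumerate(chunks)]

with open("capture.jpg", "rb") as f:      # placeholder name for a captured image
    packets = make_packets(f.read())
print(len(packets), "packets ready to send over the XBee link")
```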
3.3.2
Controller Software
The Python script running on the controller unit is responsible for interfacing with the XBee module as well as connecting with EdgeImpulse for processing the captured images. The script running on the controller is also responsible for sending navigation commands to the UGV.
3.3.3
Image Transfer Using XBee
Since the maximum transfer size of the XBee module is 256 bytes, the captured images are compressed and split into packets before sending them to the controller, as shown in Fig. 2. Compression: The images are compressed using the Pillow module, which significantly reduces the size of the image without compromising its quality.
Fig. 2 Image transfer using XBee modules
Asynchronous Data Transfer: To ensure least latency, asynchronous data transfer is used. Reliability of the Transfer: MD5 Hash of the original image is generated and is sent as the last packet from the UGV to the controller for verifying that the entire image is transferred without packet loss. Moreover, a packet numbering method is also used to identify the dropped packets for resending. Reconstruction of the Image: The received packets are reconstructed in the controller, and the hash of the reconstructed image is cross-verified with the hash of the original image received in the last to ensure the reliability of the system.
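The sketch below illustrates the compression and integrity check described here; the quality setting and file name are assumptions, and the reassembled bytes are simulated rather than read from real packets.

```python
import hashlib
from io import BytesIO
from PIL import Image

# UGV side: compress the capture with Pillow and compute its MD5 digest
img = Image.open("capture.jpg")               # placeholder capture from the OpenMV camera
buf = BytesIO()
img.save(buf, format="JPEG", quality=60)      # assumed quality setting
payload = buf.getvalue()
sent_digest = hashlib.md5(payload).hexdigest()

# Controller side: after reassembling the packets, verify the digest
received = payload                            # stands in for the reassembled packet bodies
ok = hashlib.md5(received).hexdigest() == sent_digest
print("image verified" if ok else "packet loss detected, request resend")
```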
3.4 Software Tools For integrating and scripting the modules for its connectivity with the UGV, various IDE and software tools are used. Digi XCTU: It is a freely available multi-application platform providing tools to test and configure the XBee modules. OpenMV IDE: It provides a development environment for interfacing the OpenMV module camera. It provides an easy platform for testing and viewing the camera, along with a powerful debug terminal, histogram viewer and text editor.
4 Results The OpenMV camera is fixed on top of the board for capturing the image. Initially, the image captured was found to be difficult in transferring to the XBee module at the controller side, as there is a limit of just 256 bytes that can be transferred via XBee modules. Using data packet transmission, images are divided into several small packets and compressed reducing the bytes to be transferred. This allowed the images to be sent and received on the other side. This is done using the ZigBee communication protocol, supported by XBee hardware. To ensure that the packets do not get dropped on transfer and enhance the security, the hash of the original image transferred from the XBee module connected on the board is sent along with the packets and thereby confirmed on the other controller end. The total count of packets transferred is also cross-verified to ensure that image is received correctly. Tests were performed (Fig. 3) on sending different images to the controller end of different image sizes and showed that the image was transferred with no packets being lost on transfer and security instantiated.
Fig. 3 Testing image transfer using XBee modules
5 Conclusion In this paper, a non-invasive method is used for surveying the endangered species and tracking them. The unmanned ground vehicle with the mounted OpenMV camera captures the information of the species and sends them to the controller using the XBee modules attached to the chassis and communicating with the other module present on the controller from a kilometer away.
6 Future Scope 6.1 Insect Pollinators Further development of this system can also be used for monitoring insect pollinators, which helps in precision agriculture [10]. Using unmanned vehicles poses a costeffective solution. This will help in getting a better knowledge of the pollinators and their interaction with the plants [11]. It will help in the research and study of the plant–insect interactions. Acknowledgements We would like to thank Microsoft and Field Data Technologies (FDT) and Dr. Tessy Mathew (Professor and Head of Department of Computer Science and Engineering, MBCET) for giving us the opportunity to work on this idea. This paper and the research behind it would not have been possible without the exceptional support of our mentors Mr. Douglas M Bonham (President and CEO of Field Data Technologies and Sr. Research Development Engineer,
Microsoft Surface Pro) and Mr. Niraj Nirmal (Technical Program Manager, Microsoft), for the timely support and funding for helping us reach our project to completion.
References 1. Scheele BC, et al (2018) How to improve threatened species management: An Australian perspective. J Environ Manag 223:668–675 2. Verfuss UK, et al (2019) A review of unmanned vehicles for the detection and monitoring of marine fauna. Marine Pollution Bull 140:17–29. https://doi.org/10.1016/j.marpolbul.2019. 01.009. Epub 2019 Jan 18. PMID: 30803631 3. Meena SD, Agilandeeswari L (2021) Smart animal detection and counting framework for monitoring livestock in an autonomous unmanned ground vehicle using restricted supervised learning and image fusion. Neural Process Lett 53:1253–1285. https://doi.org/10.1007/s11063021-10439-4 4. Shi Y (2021) Autonomous flocking of UAV and UGV using UWB sensors and visual-inertial odometry (Doctoral dissertation) 5. Christie KS, et al (2016) Unmanned aircraft systems in wildlife research: current and future applications of a transformative technology. Front Ecol Environ 14(5):241–251 6. Meena D, Agilandeeswari L (2020) Invariant features-based fuzzy inference system for animal detection and recognition using thermal images. Int J Fuzzy Syst 22(6):1868–1879. https://doi. org/10.1007/s40815-020-00907-9 7. Hapsari GI, et al (2020) Wireless sensor network for monitoring irrigation using XBee Pro S2C. Bull Elect Eng Info 9(4):1345–1356 8. Ramesh B, Panduranga Vittal K (2020) Wireless monitoring and control of deep mining environment using thingspeak and XBEE. In: Innovative data communication technologies and application: ICIDCA 2019. Springer International Publishing, pp 440–446. https://doi.org/10. 1007/978-3-030-38040-3_50 9. Pereira DS et al (2020) Zigbee protocol-based communication network for multi-unmanned aerial vehicle networks. IEEE Access 8:57762–57771. https://doi.org/10.1109/ACCESS.2020. 2982402 10. Mateo-Aroca A, et al (2019) Remote image capture system to improve aerial supervision for precision irrigation in agriculture Water 11(2):255. https://doi.org/10.3390/w11020255 11. Pegoraro L, et al (2020) Automated video monitoring of insect pollinators in the field. Emerging Topics Life Sci 4(1):87–97. https://doi.org/10.1042/ETLS20190074. PMID: 32558902
AI/AR and Indian Classical Dance—An Online Learning System to Revive the Rich Cultural Heritage Gayatri Ghodke and Pranita Ranade
Abstract Dance is one of the oldest forms of traditional artistic expression, with unique vivacity and creativity. Art forms such as the symphony, which means singing, playing instrumental, and dancing together, are called taurya-trikam in Sanskrit. The Guru–Shishya Parampara, a succession from Guru to disciple, has been an inescapable part of the education system in Indian classical dance. The evolution of dance to date results in the modernization of teaching techniques. With society’s increasing interest in Indian classical dance styles and potential buyers of the services, it is becoming difficult for new-age learners to keep up with perfection and balance studies. Immersive technology is currently widely used in instruction-based learning. Recently, a fundamental shift in the performing arts system has been observed with the adaptation of technology and immersive media for connecting dancers to large platforms via live performances and social media. This research paper aims to comprehend the educational techniques, challenges faced, and the fundamentals of dance practitioners who are pursuing classical forms. The proposed study is based on a systematic framework and implication of the design thinking process. Results are established through survey analysis, ideation, research methodology, and usability testing. The proposed design schema primarily focuses implementation of contemporary technologies such as AI, AR, blockchain, and emotive interaction design, delivering a unique combination of functionality and culture. Keywords Human–Computer Interaction · AI/AR · Online Learning · Indian Classical Dance · Kathak
G. Ghodke · P. Ranade (B) Symbiosis International (Deemed University), Symbiosis Institute of Design, Pune, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_31
1 Introduction Cultural diversity and preservation contribute to sustainable development goals. Dance is a devoted part of Indian culture. India’s performing arts market is within the age range of 8–25 and $3.8 billion [1]. This paper emphasizes the educational aspect of classical dancers willing to pursue a career or learn to preserve the Guru– Shishya Parampara and the rich heritage of Indian classical dance. People are trying to explore their interests and are shifting from traditional professions to their desired ones. Dancing provides a broader scope for beneficial opportunities, e.g., good mental and physical well-being, stability, and career goals [2]. A survey gives an overview of how dancers face challenges related to practicing, learning, knowing the theoretical aspects, and balancing it with the routine [3]. With the revolutionary advancement in technology, the cultural and performing arts sector has grown considerably. Artificial intelligence in education proposes a range of potentials to the teacher and the student in the teaching–learning process. According to UNESCO, artificial intelligence has the supremacy to transform education systems and services [4]. The paper focuses on issues such as resolving long-term learning retention and poor efficiency in mastering dance. A constructive approach such as AR and AI-based dance will help self-analyze and learn dance styles efficiently toward mastering them, which is also cost effective. Some influential factors were determined to report the specific problems by bridging the knowledge gap on AR-based systems for self-learning in dance [5]. The above-mentioned vital points are briefly discussed in the paper. It will be beneficial while determining the critical factors for the design solution proposal. The design thinking approach comforts throughout the phases. It aims to ensure the effectiveness of educational and training-based systems in Indian classical dance forms using contemporary technologies.
2 Design Research and Methodology The design thinking process has been followed throughout the phases to develop and propose the design concept. The data was collected using the mixed method approach. The research phase began with secondary research to understand the topic and user behaviors through the literature review, collecting current statistics, Five Why Chart, affinity mapping, data models, and competitor’s analysis. However, primary research methods include contextual inquiry, user survey analysis, and interviews. The survey conducted was circulated with the dance performing artist. However, the contextual inquiry was conducted in the dance academies in Mumbai, India, with dancers of varied age groups. A fishbone diagram (Ishikawa Cause & Effect Diagram) is plotted, as shown in Fig.1, to understand the potential causes of a problem area. It is usually termed a cause-and-effect diagram. The project’s scope was identified on the journey of proposing a solution.
Fig. 1 Cause-and-effect diagram for introducing dance system for education
3 Secondary Research 3.1 Literature Review Several cultural heritage and Indian classical art forms related to research documentation are available. A theoretical framework was formed to understand the various aspects related to the history of Indian classical dance, traditional learning methods, interruptions in learnings due to the pandemic situation, challenges faced by classical dance performers, benefits of practicing dance, etc. The information is collected from JStor, Google Scholar, WoS, and Scopus databases. Further, selective data is analyzed to fulfill the requirement of the paper. It provides a wide range of statistics on the correlation of dance, design and the need for technology intervention, and benefits. The findings of the literature review have been discussed below.
3.2 Indian Classical Dance—Traditional Learning Methods and Benefits The history of the “Kathak” dance dates back to Vedic times. The word Kathak is derived from “Katha”, which means storytelling or narration. The Kathakars were known as the narrators who used to travel across to reach out to people and narrate the great Indian epics. The medium of depiction was linked, and a strong bond between dance, poetry, and music was formed [6]. Dominika [7] explains the relationship between dance movement and culture. She states the importance of dance in terms
of socialization and its benefits. Earlier, dance was considered an informal education accompanied by traditional cultural activities, which resulted in the globalization of society. There is a correlation between appearance, dressing, and makeup with dance. With the help of Guru–Shisya Parampara, the transfer of knowledge from generation to generation via altruistic ways significantly impacted society. Dance acts as a bridge between India’s history and its audience. An enormous amount of expertise was portrayed using Sangeet Nataka, which means musical drama as a medium. Many Indian classical art forms are described in the ‘Abhinaya Darpana’ and ‘Natyashastra’ [8]. Dance combines physical, emotional, intellectual, and social human activity. Mudras are classified sets of body movements and gestures in Yoga and classical dance forms. To be more specific, they are hand gestures used to express emotions. The pressure, accompanied by dance movements and gestures, results in coping with stress and physical imbalance. A study states that dance movement therapy helped children between 7 and 11 years in a partial hospital program [9]. Dance offers an energetic, non-competitive form of exercise that positively impacts physical health, mental health, and emotional well-being, which results in holistic growth in children. The ideology of dance is based on the correlation between body and mind. Expression, body balancing and movements, and muscle constriction strongly influence therapeutic movements [2].
3.3 The Digital Challenges of Classical Indian Arts Artificial intelligence serves a purpose in education and learning systems. Some effects of AI, such as administration and instruction-based learning, are considered preliminary. AI is a field of innovation and expansion implemented in machines, IoT-based systems, and artifacts with human intelligence, adaptability, and decisionmaking capabilities. The study says that AI is extensively adopted and used in educational institutes [10]. The implementation of AI is beneficial to various domains. The in-depth research describes the applications of AI-based learning in dance. Some recognized problem areas in preserving art depend on multiple sources to track their progress, adaptability with learning systems, and time management. AI-based learning includes human action recognition via sensors, training, and body movement identification. Machine learning allows recognizing dancers’ training and performances and sets algorithms for detailed analysis reports [11]. The author defines how AI’s deep learning (DL) approach is helpful in learning dance. With the development of AI and DL, the neutral network serves a meaningful purpose for preserving dance movements from generation to generation. It is a trained pattern that generates the original information without altering it. The model is trained and self-supervised with audio and video [12].
4 Primary User Research An online user survey was curated to understand the fundamentals of the target group (dancers within the age group of 15–30 years). It helped to understand people’s perceptions of challenges faced, the scope of improvement, and areas of interest among 45 samples. The research provides insights into the intervention of dance and technology (AI, blockchain) to help improve the overall experience and comprehend the user’s opinion on contemporary technologies. From the survey, it is understood that 84.2 percent of respondents use mirrors during dance practice, and 95.6 percent feel that dance helps maintain a work-life balance. Dance learners and practitioners face challenges such as knowing theory and finding resource material.
4.1 Defining Users The problem area defines target users based on their age, profession, background, and facing challenges. Primary user groups categorized as the students who join the platform through the app, and students were taken into consideration with their roles, e.g., to learn–practice classical dance, to use various resources (lehra, theory), and to keep track of progress. Secondary users can be dance class owners, teachers, and experts. They can update dance lessons, check students’ tracks, formation, and other stage setting guides. Tertiary users can be developers or tech-team, provide support to upgrade the algorithms, app/product maintenance and development.
4.2 Competitor’s Study A competitor’s study was conducted to understand direct and indirect competitors and the scope and areas of improvement. Benchmarking is used to gather and analyze data stated in Table 1. Direct competitors are apps/system designs contributing to the Indian classical dance styles, and indirect competitors are applications or productbased designs. Different dance styles and techniques are used for practice sessions.
4.3 Contextual Inquiry Contextual inquiry gathers responses that include in-depth observations and interviews of user groups to understand work practices, behaviors, and mental models. Based on the outcome, varied contextual design models help document user data from different aspects. Contextual system models were made, such as the flow, cultural, and sequence models. The flow model is presented in Fig. 2. The flow model signifies
Table 1 Benchmarking for specifications of competitors

Features | Kathak Studio | Lehra | Steezy | Dance Fit Studio Lite | Dancy
Dance training step instructions | ✘ | ✘ | ✔ | ✔ | ✔
Lehra track for dance | ✔ | ✘ | ✔ | ✘ | ✔
Formation technique | ✘ | ✘ | ✘ | ✘ | ✘
Informational video | ✘ | ✘ | ✔ | ✔ | ✔
Technology intervention | ✘ | ✘ | ✘ | ✘ | ✘
Progress analysis | ✘ | ✘ | ✘ | ✘ | ✘
Personalization | ✘ | ✘ | ✘ | ✔ | ✘
Gamification | ✘ | ✘ | ✘ | ✘ | ✘
Multiple dance facilities | ✘ | ✘ | ✘ | ✔ | ✘
Video-based learning | ✘ | ✘ | ✔ | ✘ | ✔
Tempo adjustment | ✔ | ✔ | ✘ | ✘ | ✘
AI/AR/VR/3D model | ✘ | ✘ | ✘ | ✘ | ✔
Dance level analysis | ✘ | ✘ | ✔ | ✘ | ✘
Default usage for all screen sizes | ✘ | ✘ | ✘ | ✔ | ✘
Total features available | 2 | 1 | 5 | 5 | 5
the connection between target groups and the system and focuses on the steps and attributes. The cultural model (Fig. 3) aims to know the values of the context user group and influencing factors. The sequential model helps understand the steps associated with the user’s trigger, intent, and pain points. According to the study and mapping of the models, light was shed on the users’ pain points. A widespread application-based system with intelligent assistive products has been proposed.
4.4 Conceptualization and Ideation Brainstorming methods, e.g., scamper, mind mapping, and analogous thinking, were implemented during the ideation phase. Further, an application and product-based solution which works on syncing data expertise is proposed based on the fundamental needs of the target group. The design concept focuses on contemporary technology intervention with features like artificial assistance for dance pose correction, sensors for movement tracking, blockchain for user authentication to avoid scams of teaching academies, gamification for user engagement, personalization, and detailed process tracking. Table 2 describes the thought process toward ideation using the SCAMPER model. The application is primarily designed for dancers ranging from beginners to
Fig. 2 Flow model of dance practitioners and system
Fig. 3 Cultural model—user groups and influencing factors
professionals. The use of interaction design benefits, making the app accessible to users in terms of screens, processes, content, etc. Figure 4 shows the following features which the proposed mobile application will have. • It will guide students to learn the classical dance form Kathak with free resources. • It will offer access to all Kathak Taal contents and practice sessions for particular steps, i.e., Tukda (Tihaai, Kavita, Toda). • In accordance with practice sessions, it can scan dancers’ movements and provide appropriate feedback via artificial intelligence. • It syncs the data across devices for user accessibility. Smart mirror as a product works on the execution of IoT in the dance (Kathak) domain, which would help users identify their mistakes while dancing and smart assistance to provide feedback. The application aims to offer a seamless experience for users to help them deal with the educational aspect of dancing and pursue it in a balanced way. Sequential Model—how to use the product is presented in Fig. 5. All proposed features and concept sketches are tested before actual implementation. Some methods used for formative testing are performance tests and card sorting to know users’ behavior toward the app’s architecture. Visual designs were anticipated when people responded to the low-fidelity sketches.
4.5 Product (Smart Mirror) Mirror serves many purposes for dancers. It provides a user dashboard to keep progress track and real-time feedback while dancing in front of the mirror, e.g., mudra, pose correction, and formation techniques. Also, AI supports costume trials according to performances and makeup trials based on user recommendations, resulting in time saving for dancers. The functionality and the technological implementation of the product are presented in the form of artifact and physical models, presented in detail in Fig. 6.
4.6 Usability Testing To test the proposed design with users, a system usability test was conducted. The brief, including the concept, pain points, and a prototype, was provided to the user. The score is calculated using the SUS score formula. Tables 3 and 4 denote the results of the system usability scale (SUS) and score.
Table 2 SCAMPER model and brainstorming

SCAMPER | Guiding question | Brainstormed ideas
Substitute | What can you substitute or change, whether that's your product, problem, or process? | Using a good quality plane mirror for perfect reflection; a touch-efficient LCD screen
Combine | Can they be combined to create a more efficient customer experience? | The intervention of technology and culture; functionality and aesthetics
Adapt | How can we make the process more flexible? | By using analogous thinking; implementing some good AI assistance
Modify | Can you change the process to work more efficiently? | A default adaptable service via app/updating; algorithm for mirror efficiency
Put to another use | What benefits would be gained by using the product elsewhere? | Can be used in varied versions; costume trials, makeup trials, significance of particular tukda
Eliminate | What can be removed or simplified? | The navigation of the app; use futuristic approach
Reverse | Could your team rearrange or interchange elements to improve results? | Try reverse engineering for a theory-based learning process; providing guidelines for usage
Fig. 4 Features — proposed design concept
Fig. 5 Sequential model—how to use the product
4.7 Discussion Secondary research methods, e.g., literature review, contribute to understanding indepth knowledge about the education aspect and evolution of AI in dance, etc. However, the implementation of the concept is seen in India. Conducted user surveys enlightened the fact that there is a need for technological intervention in performing arts. Dance learners and dance educators encountered difficulties such as time management, training, and practicing. It was observed that the mirror is extensively used by most dance practitioners through which they analyze their expressions, gestures, body movements, positions, dance formation, etc. The competitor’s analysis highlights the current industry trends and already-established applications and products. Direct and indirect competition proves valuable while offering design concepts keeping flaws in mind. Conceptual inquiry states essential factors corresponding to the connection between users and the system, influencing users’ cultural
Fig. 6 Artifact and physical model with details of functionality and technology implementation
Table 3 User score documentation with calculations

Criteria | User 1 | User 2 | User 3 | User 4 | User 5 | User 6 | User 7
Frequency | 4 | 5 | 5 | 5 | 3 | 5 | 4
Complexity | 2 | 1 | 1 | 1 | 1 | 1 | 2
Easy to use | 4 | 5 | 4 | 5 | 3 | 4 | 4
Assistance | 1 | 1 | 4 | 3 | 2 | 1 | 1
Functionality | 5 | 5 | 4 | 4 | 3 | 4 | 4
Inconsistency | 1 | 1 | 1 | 1 | 1 | 1 | 1
Learning | 5 | 5 | 3 | 4 | 4 | 3 | 5
Difficulty | 1 | 1 | 3 | 1 | 2 | 1 | 1
Confidence | 5 | 5 | 4 | 4 | 4 | 4 | 5
Knowledgeable | 1 | 3 | 3 | 2 | 2 | 1 | 3
Total | 37 | 36 | 28 | 34 | 28 | 35 | 34
and mental models. It would be beneficial for dance practitioners to have an intelligent self-learning and analyzing-based model that saves time. Mirror serves a primary purpose while dancing and has unique opportunities like sensors for body movement recognition and progress tracking. It is possible with AI-based learning and associated theory-based modules, providing scope for proposing the concept. After the formative testing with users, some design changes were made. Later, the summative testing SUS method was used (Table 3). The score of SUS falls under the “A” category with a score of 82.14, (Table 4) which is excellent and in the acceptable range
Table 4 Final system usability score

Participant | Calculation | Final score
User 1 | 37 × 2.5 | 92.5
User 2 | 36 × 2.5 | 90
User 3 | 28 × 2.5 | 70
User 4 | 34 × 2.5 | 85
User 5 | 28 × 2.5 | 65
User 6 | 35 × 2.5 | 87.5
User 7 | 34 × 2.5 | 85
Average score = 82.14
on the scale. It states that the design is efficient in terms of accessibility, desirability, navigation, and usability to satisfy the goal.
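For reference, the sketch below reproduces the SUS arithmetic behind Tables 3 and 4 for one respondent, using the standard SUS conversion (odd items contribute the score minus 1, even items contribute 5 minus the score, and the 0-40 total is scaled by 2.5).

```python
def sus_score(responses):
    """Standard SUS: ten 1-5 ratings in questionnaire order, scaled to 0-100."""
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)   # odd items positive, even items negative
    return total * 2.5

user1 = [4, 2, 4, 1, 5, 1, 5, 1, 5, 1]    # User 1's ratings from Table 3
print(sus_score(user1))                    # 92.5, matching Table 4
```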
5 Conclusion and Future Scope Performing arts in the Indian context maintains the correlation between society and culture. Classical dance forms are considered therapy and part of art, music, and literature dating back to ancient times. Nowadays, people perceive dance as more than a hobby and are inclined toward the career aspect of it [13]. Curated survey and research methods state upcoming challenges for dance practitioners with time management, mastering the studies, self-doubting, etc. This paper proposes an endto-end concept proposal for dance practitioners to learn, practice, and self-analyze dance movements. With the help of contemporary technologies such as AI, personalization, blockchain, keeping information preserved, and smart assistance, users can get apt analysis for the dance postures, understand theories, validate the authenticity, and work on expressions and dance formations. Mirror being a day-to-day part of dancers’ life, it is easy to adapt. Smart mirror delivers exclusive features of selfanalyzing yourself without losing the authenticity of dance and helps overcome the challenges with expressions, formation techniques, mudras, and gestures. The design proposal was tested with the specific target group (dancers). With the changing times, it will be advantageous for dancers to learn desired dance forms and excel. The actual implementation of a design concept consisting of a smart mirror will turn out to be helpful in the coming times. It serves both the purposes of the problem area—overcome challenges and preserve the legitimacy of the culture. Not only Indian classical dance forms like Kathak, but the design proposal caters to providing a broad future scope to all performing arts.
References 1. Kafqa, Performing arts market India, 9 May 2022 [Online]. Available: https://www.financial express.com/lifestyle/performing-arts-market-india-to-hit-7-billion-by-2027/2517796/ 2. Bhargava (2020) Therapeutic use of mudras in dance/movement therapy with children. Express Therapies, 16 May 3. Anand M (2021) The digital challenges of classical Indian. INFOCUS, 4, January–March 4. Conde-Zhingre LE (2022) Impact of artificial intelligence in basic general education in Ecuador. Scopus, 70, 7 June 5. Iqbal J (2022) Virtual reality. Acceptance of dance training system based on augmented reality and technology acceptance model. 26(1):33–54, 22p, March 6. Jog G (2021) Indian dance education, 17 October [Online]. Available: https://www.gaurijog. com/indian-dance-education/kathak/history-of-kathak-dance/ 7. Byczkowska-Owczarek D (2019) Dance as a sign: discovering the relation between dance movement and culture. Kultura i Społecze´nstwo, p. 74, January 8. Chatterjea A (1996) Dance research in India: a brief report. Dance Res J 28(1):91 9. Das R (2022) Why parents must nurture their child’s hobbies and interests [Online]. Available: https://www.educationworld.in/why-parents-must-nurture-their-childs-hobbies-and-interests/ 10. Chen ZL (2020) Artificial intelligence in education: a review. IEEE Xplore 8:75264–75278, 17 April 11. Li X (2021) The art of dance from the perspective of artificial. J Phys: Conf Series, 9 12. Liu YCKX (2022) The use of deep learning technology in dance movement generation, Front Neurorob, 5 August 13. Payal (2018) Hobby classes & benefits for children—dance, music, art & craft [Online]. Available: https://www.parentune.com/parent-blog/hobby-classes-for-children/204
Performance Analysis of Classification Algorithms for the Prediction of Cardiac Disease N. Jagadeesan
and T. Velmurugan
Abstract Cardiovascular disease instances are increasing exponentially every day in our digital age, unlike any other time in history. Early detection of heart disease, proper management, adoption of a healthy lifestyle, and additional research can all contribute to the early prevention of many cardiovascular illnesses. Clinical data mining is particularly effective when it comes to examining ambiguous data from specific clinical information collections. This study intends to present contemporary approaches to knowledge discovery in databases currently used for the classification of cardiovascular disease. The feature relevance analysis and the Precision Scrutiny are applied to the cardiovascular dataset to compare and analyze several classification methods in order to determine the most effective classification rule. The study project also aims to filter the data so that healthy persons and those with cardiovascular disease are appropriately identified. The k-Nearest Neighbor (k-NN), JRip, Random Forest, and AdaBoostM1 algorithms are used in this study to identify and examine heart disease in the selected dataset. The dataset was compiled using available web resources. Finally, in this analysis, the optimal technique for heart disease prediction is recommended. Keywords k-Nearest Neighbor (k-NN) algorithm · Random Forest algorithm · JRip algorithm · AdaBoostM1 algorithm · Classification techniques
N. Jagadeesan (B) Department of Information Technology and B.C.A, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai 600106, India e-mail: [email protected] T. Velmurugan Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai 600106, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_32
1 Introduction Cardiovascular disease (CVD) is one of the most agonizing and devastating ailments a person can face. CVD currently causes more deaths than any other disease and affects people of all ages. High blood pressure (BP), considerably elevated cholesterol, and palpitations are the primary causes of underlying cardiovascular disease [1]. Furthermore, certain risk factors are unchangeable. Other risk factors for cardiovascular disease include alcohol and cigarette use. Many physical activities may indeed be affected by a heart problem. Tobacco use, high blood pressure, age, family history of CVD, high cholesterol, and a poor diet are all risk factors for developing cardiovascular disease. Cardiovascular disease is caused by an increase in low-density lipoprotein (LDL) cholesterol in the body or blood. The pressure in the arteries during cardiac contraction and relaxation is known as systolic and diastolic blood pressure, respectively. Tobacco's role in cholesterol accumulation in the arteries is responsible for 40% of heart disease-related deaths. According to the World Health Organization (WHO), there could be around 30,000,000 heart disease-related deaths by 2040 [2]. As the digital world evolves, technological solutions provide vast amounts of data, and sifting through this type of information for useful knowledge can be challenging even for specialists. Deep Learning (DL), Machine Learning (ML), Data Mining (DM), and Artificial Intelligence (AI) are all relatively new and promising approaches for detecting relationships or classifying large datasets [3]. Classification is the supervised machine learning process in which various items or groups are given tags or classes. The classification procedure consists of two parts. The first step is model creation, defined as the analysis of training records in the database. In the second stage, categorization is carried out using the created model. The percentage of correctly categorized test samples or records determines how accurately the classification is performed. The key objectives of this research work are as follows:
• Heart disease can be accurately predicted at an early stage using the most effective categorization method.
• Presentation of critical facts with care taken to identify underlying and recurring themes in the highlights.
• The use of various dimension reduction techniques to reduce the size of the feature space.
• Comparison of multiple machine learning (ML) models to choose the optimum model for detecting heart disease.
The format of this article is as follows. Section 2 discusses related research in the cardiovascular domain. Section 3 looks at the materials and techniques, dataset descriptions, and classification algorithms used in this study. Section 4 discusses the results of the experiments with the classifier algorithms. Section 5 concludes the research article and gives suggestions for future research.
2 Related Works Academics have recently conducted substantial research to predict cardiovascular disease utilizing cardiovascular disease data. The majority of the critical characteristics, methodologies, limits, and benefits that distinguish our endeavor from others are explained below. This section summarizes various technical and review publications on the application of data mining techniques to the identification of coronary heart disease. Aggrawal and Pal [4] suggested a sequential feature selection technique for identifying mortality events during treatment in patients with heart disease in order to determine the most important attributes. According to the test results, the sequential feature selection technique outperforms the Random Forest classification algorithm with 86.67% accuracy. Gao et al. [5] created a strategy for predicting cardiovascular illness by combining ensemble approaches (boosting and bagging) with feature extraction algorithms (LDA and PCA). According to the study's conclusions, the bagging ensemble learning algorithm with DT and PCA feature extraction generated the best results. Takci [6] used a variety of feature extraction techniques as well as 12 separate classification techniques to predict cardiovascular crises. The researchers conclude that, of the four different feature selection strategies, the ReliefF strategy provides the best model accuracy in terms of mean accuracy value. Latha and Jeeva [7] used ensemble classification and feature selection approaches to construct a strategy for predicting the risk of heart disease. It is demonstrated there how an ensemble classification method can increase a weak classifier's accuracy by up to 7%. By integrating Chi-square and principal component analysis, Gárate-Escamila and colleagues [8] presented a hybrid dimensionality reduction model for predicting cardiovascular disease (CHI-PCA). After assessing the performance of the proposed method, the highest efficiency was discovered. Spencer et al. [9] evaluated four different feature selection algorithms using four regularly used heart disease datasets. The authors emphasize that the advantages of feature selection differ depending on the machine learning approach used for the cardiac datasets. Senan and colleagues [10] developed a diagnostic algorithm for diagnosing chronic renal disease using a dataset of 400 patients with 24 characteristics, and all classification algorithms performed well. Almansour et al. [11] tested two classifiers, SVM and ANN, and tuned their parameters to aid in the early identification of chronic kidney disease. According to the findings, improving classifier accuracy includes training classifiers with relevant features chosen using a variety of feature selection methodologies. Velmurugan et al. used the Named Entity Recognition (NER) method to discover phrases related to the corpus of cardiac disorders in order to mine their relevance in clinical reports and other applications [12]. Mohan et al. [13] proposed effective heart disease prediction using hybrid machine learning techniques, an innovative approach to identifying relevant features with machine learning, which improved the prediction of cardiovascular disease. Several feature combinations and well-known classification algorithms are used to generate the prediction model [13]. With an accuracy of 88.7%,
the prediction model for heart disease that combined a hybrid Random Forest with a linear model fared better.
3 Material and Methods Many researchers have examined heart disease data using classification algorithms to classify datasets with high accuracy and to study the efficiency of learning algorithms such as Naive Bayes, SVM, k-NN, and Random Forest [14]. The study's goal is to filter the data so that healthy people and those with cardiovascular disease can be distinguished. The heart disease data in this research project is classified using classification algorithms. In this study, the k-NN, JRip, Random Forest, and AdaBoostM1 algorithms are used to identify and analyze heart disease from the selected dataset. The implementation of a classification algorithm in Python involves several steps. First, the data must be gathered and pre-processed, which includes cleaning, normalizing, and encoding the data. Next, the data is split into training, validation, and test sets. After this, the model is built using a suitable classification algorithm, such as k-NN, JRip, Random Forest, or AdaBoostM1. Finally, the model is evaluated using metrics such as accuracy, precision, recall, and F1 score. After making any necessary improvements or adjustments, the model is ready to be deployed in production. The architecture of the proposed work is shown in Fig. 1.
Fig. 1 Architecture of proposed work
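As a concrete illustration of these steps (not part of the original study), the following minimal Python sketch assumes the 303-record dataset of Sect. 3.1 is stored in a file named heart.csv with a binary output column named target; the file name, column name, and split ratios are assumptions made only for illustration.

```python
# Minimal sketch of the pre-processing and splitting stage described above.
# Assumptions: the dataset is in "heart.csv" with the attributes of Table 1
# and a binary output column named "target".
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("heart.csv")          # gather the data
df = df.dropna().drop_duplicates()     # basic cleaning

X = df.drop(columns=["target"])        # 13 input attributes
y = df["target"]                       # 1 = heart disease, 0 = healthy

# Split into training, validation and test sets (70/15/15 here, for illustration).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

# Normalize the attributes using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))
```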
3.1 Description of Dataset This section describes the dataset used to classify heart disease. This study employed a dataset of 303 cases and 14 distinguishing characteristics [15]. The dataset is summarized in detail in Table 1. The label of the output characteristic (num) is divided into two groups to show the presence or absence of heart disease. The data records are created in an Excel data sheet and saved as CSV files, and this study examines these data. The supervised learning approaches are applied in a sequential manner. The results are displayed as a 2 × 2 confusion matrix, which is a useful tool for determining how well a classifier recognizes different types of tuples; the importance of the true positive, true negative, false positive, and false negative values is demonstrated here. The error rate is used to calculate the classifier's accuracy, and the best classifier algorithm is discovered through a comparative study.
3.2 Classification Algorithms A classification algorithm is a supervised learning method that classifies new records based on training data [16]. It assigns new observations to one of several classes or groupings after learning from a training sample or collection of data; classes are also referred to as targets, labels, or categories. Various classification techniques are used by researchers to forecast and investigate cardiac illness; this section discusses the use of classification algorithms such as k-NN, JRip, Random Forest, and AdaBoostM1 for this purpose. In pattern recognition, the k-Nearest Neighbor (k-NN) classifier is widely used [17]. k-Nearest Neighbor classifiers learn by analogy, comparing a given test tuple with training tuples that are similar to it. The training tuples are described by n attributes, so each tuple represents a point in an n-dimensional space, and all of the training tuples are stored in this n-dimensional pattern space. When given an unknown tuple, a k-Nearest Neighbor classifier searches the pattern space for the k training tuples that are closest to it; these k training tuples are the nearest neighbors of the unknown tuple. The Rule Learners Classification Algorithm [18] is used to create classification rules. The JRip classifier uses the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) method: it analyzes the classes and develops a preliminary set of rules for each class using incremental reduced-error pruning. The Random Forest classifier is an ensemble algorithm based on bootstrap aggregation. Ensemble approaches classify records by employing a large number of models of the same or different types (for instance, a group of SVM, naive Bayes, or decision tree models). Deep decision trees can suffer from overfitting, but Random Forests avoid this by building trees from random subsets; the basic reason is that averaging a large number of predictions balances out individual biases.
Table 1 Description of dataset
ID | Name | Type | Description | Values
A | AGE | Quantitative | Years of age | Mean: 51.9, from 28 to 77
B | SEX | Categorical | Sex (1 = male; 0 = female) | 1 = 206; 0 = 97
C | CP | Categorical | Type of chest pain (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic) | 1 = 23; 2 = 50; 3 = 86; 4 = 144
D | TRESTBPS | Quantitative | Blood pressure at rest (in mmHg on admission to the hospital) | Mean: 131.6, from 94 to 200
E | CHOL | Quantitative | Serum cholesterol (milligrams per deciliter) | Mean: 246.6, from 126 to 564
F | FBS | Categorical | Fasting blood sugar level more than 120 mg/dl (1 = true; 0 = false) | 1 = 45; 0 = 258
G | RESTECG | Categorical | Resting electrocardiographic data (0 = normal; 1 = aberrant ST-T wave; 2 = probable or definite left ventricular hypertrophy according to Estes' criteria) | 0 = 151; 1 = 4; 2 = 148
H | THALACH | Quantitative | Attained maximum heart rate | Mean: 149.6, from 71 to 202
I | EXANG | Categorical | Angina induced by exercise (1 = yes; 0 = no) | 0 = 204; 1 = 99
J | OLDPEAK | Quantitative | Exercise-induced ST depression in comparison to rest | Mean: 1.03, from 0 to 6.2
K | SLOPE | Categorical | The slope of the peak exercise ST section (1 = uphill; 2 = level; 3 = downhill) | 1 = 142; 2 = 140; 3 = 21
L | CA | Categorical | The number of major vessels (0 to 3) | 0 = 176; 1 = 65; 2 = 38; 3 = 24
M | THAL | Categorical | The condition of the heart (3 = normal; 6 = fixed defect; 7 = reversible defect) | 3 = 168; 6 = 18; 7 = 117
N | TARGET | Categorical | Heart disease diagnosis (1 = presence; 0 = absence) | 1 = 139; 0 = 164
Random Forest adds randomness to the model's tree construction. Instead of the single most important feature, it looks for the best feature among a random selection of features when splitting a node. This generally produces a more diverse, and therefore better, model [19]. The AdaBoostM1 technique boosts the effectiveness of weak binary classifiers by increasing the training emphasis on incorrectly classified data. Adaptive boosting (AdaBoost) is a common boosting strategy that weights a number of weak classifiers according to their classification error [20].
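The following sketch (not from the paper) shows how three of these four classifiers can be instantiated and compared in Python with scikit-learn, reusing the split produced in the earlier sketch. AdaBoostClassifier is used as a stand-in for Weka's AdaBoostM1, and the JRip (RIPPER) rule learner, which has no scikit-learn equivalent, is typically run in Weka and is therefore omitted here.

```python
# Sketch: fitting three of the four classifiers discussed above with scikit-learn.
# Assumes X_train, y_train, X_test, y_test from the earlier pre-processing sketch.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoostM1 (approx.)": AdaBoostClassifier(n_estimators=50, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.4f}")
```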
4 Experimental Results Because heart disease (HD) is one of the leading causes of death in humans, this research effort includes medical data on HD. The algorithms used to analyze the datasets in this work are k-Nearest Neighbor (k-NN), the Rule Learners Classification Algorithm, Random Forest, and AdaBoostM1, because they provide superior accuracy for medical datasets. All of the values from the attributes that were used are pre-processed prior to classification. This paper compares the classification accuracy of the k-NN algorithm, the Rule Learners Classification Algorithm, the Random Forest algorithm, and the AdaBoostM1 algorithm. Furthermore, TP Rate, FP Rate, and precision analyses are performed. The formulas [21] listed below are used to compute the various metrics. Precision P is defined as the percentage of accurately predicted positive cases. Precision = A/(A + C)
(1)
Here, A, B, C, and D denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. According to the equation, recall, sensitivity, or true positive rate (TPR) is the percentage of positive cases that were correctly identified. Recall = A/(A + D)
(2)
The percentage of total predictions that were correct is known as accuracy. It is calculated as: Accuracy = (A + B)/(A + B + C + D)
(3)
Sensitivity (the true positive rate) is the proportion of all positive records that were correctly classified. Sensitivity = A/(A + D)
(4)
Specificity is the percentage of correctly classified negative records out of all negative records, computed as B/(B + C). The F-measure combines precision and recall as their harmonic mean: F = (2 × Precision × Recall)/(Precision + Recall)
(5)
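The following small sketch (added for illustration) implements Eqs. (1)-(5) directly from the confusion-matrix counts A, B, C, and D; the numeric example values are hypothetical.

```python
# Illustration of Eqs. (1)-(5) with A = TP, B = TN, C = FP, D = FN counts.
def classification_metrics(A, B, C, D):
    precision = A / (A + C)                                        # Eq. (1)
    recall = A / (A + D)                                           # Eq. (2), also sensitivity / TPR
    accuracy = (A + B) / (A + B + C + D)                           # Eq. (3)
    specificity = B / (B + C)                                      # true negative rate
    f_measure = 2 * precision * recall / (precision + recall)      # Eq. (5)
    return precision, recall, accuracy, specificity, f_measure

# Example with a hypothetical 2 x 2 confusion matrix.
print(classification_metrics(A=120, B=130, C=25, D=28))
```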
The trade-off between a classifier's true positive rate (TPR) and false positive rate (FPR) is shown graphically below. TPR = correctly classified positives/total positives; FPR = incorrectly classified negatives/total negatives. The TPR is represented on the y-axis, while the FPR is plotted on the x-axis. The distribution of the dataset for parameters such as Age, Gender, Trestbps, Chol, Thalach, Oldpeak, Thal, and Target is depicted in Fig. 2. This graph depicts data from healthy people in the blue zone and heart disease patients in the red zone. The experimental findings of the fundamental classifiers are discussed in this section. Using the aforementioned methodologies and the provided feature importance, the experiment was carried out on both the training and test datasets. The algorithms k-NN, Rule Learners Classification Algorithm, Random Forest, and AdaBoostM1 are used to classify the records. The results of the k-NN technique are shown in Table 2. The precision rate for the healthy class is low, while it is considerably higher for HD. Table 2's accuracy
Fig. 2 Distribution of data
performance based on categorization measures is depicted graphically in Fig. 3. All rates, including TP, FP, Precision, Recall, F-Measure, and ROC, are considerably higher for the HD group than for the healthy group; the weighted average over both classes is also reported. For precision, HD received a score of 0.762 and Healthy received a score of 0.500; all other statistical measures are listed in Table 2. Table 3 shows similar results for the JRip technique, and Fig. 4 depicts a graphical representation of the classification metric values. The precision in the HD class was 0.763, while in the Healthy class it was 0.733, indicating that all rates (TP, FP, Precision, Recall, F-Measure, and ROC) are high in the HD class and comparatively low in the Healthy class; the weighted average for the class is also given. The Random Forest results from this investigation are shown in Table 4. HD has been shown to have a high accuracy rate when compared to the healthy class. Figure 5
Fig. 3 k-Nearest Neighbor
Table 2 Results of k-Nearest Neighbor (k-NN)
Class | TP rate | FP rate | Precision | Recall | F-measure | ROC
HD | 0.774 | 0.517 | 0.762 | 0.768 | 0.259 | 0.713
Healthy | 0.483 | 0.226 | 0.500 | 0.491 | 0.259 | 0.449
Weighted average | 0.681 | 0.424 | 0.678 | 0.681 | 0.680 | 0.629
Table 3 Results of rule learners classification
Class | TP rate | FP rate | Precision | Recall | F-measure | ROC
HD | 0.935 | 0.621 | 0.763 | 0.935 | 0.395 | 0.758
Healthy | 0.379 | 0.065 | 0.733 | 0.379 | 0.395 | 0.476
Weighted average | 0.758 | 0.443 | 0.754 | 0.758 | 0.395 | 0.668
Fig. 4 Rule learners classification
depicts the accuracy performance for the entries in Table 4 based on categorization measures. Furthermore, the true positive, false positive, precision, recall, F-measure, and ROC rates are all higher for the HD class than for the healthy class; the weighted average for the entire class is also reported. Precision received an HD score of 0.894, while Healthy received a score of 0.640. The values of all additional statistical indicators are listed. The AdaBoostM1 algorithm results are shown in Table 5. HD has been shown to be highly accurate when compared to the healthy class. Table 5's accuracy performance based on categorization measures is depicted graphically in Fig. 6. Furthermore, the HD group has higher true positive, false positive, precision, recall, F-measure, and ROC rates, whereas the healthy group has lower rates in all of these categories; the weighted average for the entire class is also given. Precision received an HD rating of 0.877, while Healthy received a rating of 0.647. The outcome of each quantitative indicator is provided. The error reports of all four algorithms are listed in Table 6, and the data is graphically shown in Fig. 7. When compared to the other algorithms, the Random Forest algorithm has the smallest mean absolute error (MAE). Kappa scores between 0.4 and 0.75 indicate moderate to good agreement, values above 0.75 indicate excellent agreement, and a value of 1.0 indicates
Table 4 Results of Random Forest
Class | TP rate | FP rate | Precision | Recall | F-measure | ROC
HD | 0.871 | 0.345 | 0.894 | 0.871 | 0.857 | 0.537
Healthy | 0.655 | 0.129 | 0.640 | 0.655 | 0.679 | 0.537
Weighted average | 0.802 | 0.276 | 0.809 | 0.802 | 0.800 | 0.537
Fig. 5 Random forest precision graph
Table 5 Results of AdaBoostM1
Class | TP rate | FP rate | Precision | Recall | F-measure | ROC
HD | 0.806 | 0.241 | 0.877 | 0.806 | 0.840 | 0.544
Healthy | 0.759 | 0.194 | 0.647 | 0.759 | 0.698 | 0.544
Weighted average | 0.791 | 0.226 | 0.804 | 0.791 | 0.795 | 0.544
Fig. 6 AdaBoostM1 precision graph
complete agreement among all raters. The k-Nearest Neighbor (k-NN) strategy has the lowest kappa statistic (0.2593), and the Random Forest approach has the highest (0.5460). In order to assess prediction accuracy, the performance of the algorithms is compared in this section using correctly classified instances, incorrectly classified instances, and the amount of time the algorithms took to obtain the final findings. Taking correctly classified instances into account, the algorithm that can correctly classify the greatest number of examples for the given dataset is the ideal and most efficient algorithm for the prediction task. Table 7 compares the performance of the supervised algorithms in terms of correct classification, incorrect classification, and time in milliseconds. Evaluating the data in Table 7, the accuracy values obtained by the k-NN, Rule Learners Classification Algorithm, Random Forest, and AdaBoostM1 algorithms were 68.10%, 80.20%, 81.10%, and 75.80%, respectively. The Random Forest algorithm and the Rule Learners Classification Algorithm achieved the highest accuracy in this research effort when compared to the other two classification models.
Table 6 Error reports
Statistic | k-NN | Rule learners classification | Random forest | AdaBoostM1
Kappa statistic | 0.2593 | 0.3612 | 0.5460 | 0.5403
Mean absolute error | 0.3188 | 0.3725 | 0.1978 | 0.2584
Root mean squared error | 0.5644 | 0.4306 | 0.4447 | 0.3913
Relative absolute error (%) | 73.47 | 85.84 | 45.59 | 59.56
Root relative squared error (%) | 98.16 | 92.42 | 95.45 | 83.97
Fig. 7 Error reports
Table 7 Comparison of supervised algorithm results
Classification model | Correctly classified (%) | Incorrectly classified (%) | Time taken (in milliseconds)
k-Nearest Neighbor (k-NN) | 68.10 | 31.80 | 0.01
Rule learners classification algorithm | 80.20 | 19.70 | 0.22
Random forest | 81.10 | 18.90 | 0.23
AdaBoostM1 | 75.80 | 24.10 | 0.28
Time complexity refers to how the running time of an algorithm grows with the length of its input; it characterizes how the number of executed statements scales rather than the wall-clock time of a single run. In the technological world, computing power and speed have increased dramatically, and the computer's hardware configuration also influences how time-consuming an algorithm is in practice. Table 7 displays the measured time of each algorithm. The accuracy of AdaBoostM1 was determined in 0.28 ms, JRip's in 0.22 ms, Random Forest's in 0.23 ms, and k-NN's in 0.01 ms. The Random Forest algorithm is preferred, even though it required more processing time than the k-NN algorithm, because of the larger number of examples it successfully classified.
5 Conclusions Finding a disease in a medical dataset using computer-oriented techniques is typically a time-consuming operation. Some machine learning algorithms were utilized in this study to identify heart disease in the dataset. The accuracy of classification algorithms is explored in this paper using the parameters specified in the heart illness dataset. To detect the disease, specific characteristics are examined. The experiment employed four common data mining techniques: k-Nearest Neighbor (k-NN), the Rule Learners Classification Algorithm, Random Forest, and AdaBoostM1. According to the experimental results of this work, the Random Forest method outperforms the other algorithms in terms of accuracy. Furthermore, the algorithm is significantly faster than previous strategies. Future research should combine multiple classification algorithms with additional feature selection techniques to improve prediction accuracy. Furthermore, other algorithms will be used in the future for the same purpose.
References 1. Mirmozaffari Mirpouya, Alinezhad Alireza, Gilanpour Azadeh (2017) Data mining apriori algorithm for heart disease prediction. Inter J Comp Comm Instrument Eng 4(1):20–23 2. Srivastava K, Choubey DK (2020) Heart disease prediction using machine learning and data mining. Inter J Recent Tech Eng 9(1):212–219 3. Ayon SI, Islam MM, Hossain MR (2020) Coronary artery heart disease prediction: a comparative study of computational intelligence techniques. IETE J Res, 2488-2507. https://doi.org/ 10.1080/03772063.2020.1713916 4. Aggrawal R, Pal S (2020) Sequential feature selection and machine learning algorithm-based patient’s death events prediction and diagnosis in heart disease. SN Comp Sci 1(344). https:// doi.org/10.1007/s42979-020-00370-1 5. Gao X-Y, Ali AA, Hassan HS, Anwar EM (2021) Improving the accuracy for analyzing heart diseases prediction based on the ensemble method. Complexity, Article ID. 6663455. https:// doi.org/10.1155/2021/6663455 6. Takci Hidayet (2018) Improvement of heart attack prediction by the feature selection methods. Turkish J Elect Eng Comp Sc 26:1–10. https://doi.org/10.3906/elk-1611-235 7. Latha CBC, Jeeva SC (2019) Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. J Inform Med Unlock 16. https://doi.org/10.1016/j.imu. 2019.100203 8. Karen Garate-Escamila A, Hassani AE, Andres E (2020) Classification models for heart disease prediction using feature selection and PCA. J Inform Med Unlock 19:100330–100351 9. Spencer R, Thabtah F, Abdelhamid N, Thompson M (2020) Exploring feature selection and classification methods for predicting heart disease. J Digital Heal 6. https://doi.org/10.1177/ 2055207620914777 10. Senan EM, Al-Adhaileh MH, Alsaade FW (2021) Diagnosis of chronic kidney disease using effective classification algorithms and recursive feature elimination techniques. J Health Eng, Article ID 1004767. https://doi.org/10.1155/2021/1004767 11. Almansour NA, Syed HF, Khayat NR (2019) Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study. Comp Biol Med 109:101–111 12. Velmurugan T, Latha U (2021) Classifying heart disease in medical data using deep learning methods. J Comp Comm 9:66–79. https://doi.org/10.4236/jcc.2021.91007 13. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7:81542–81554 14. Jagadeesan N, Velmurugan T (2021) Impact of classification algorithms for the prediction of heart disease: a survey. International Symposium on Innovation in Information Technology and Application 5(1):417–430 15. Bhuvaneswari NG (2012) Cardiovascular disease prediction system using genetic algorithm and neural network. Comp Comm Appl 5. https://doi.org/10.1109/ICCCA.2012.6179185 16. Masethe Hlaudi, Masethe Mosima (2014) Prediction of heart disease using classification algorithms. Lecture Notes in Engineering and Computer Science 2:809–812 17. Jabbar MA, Deekshatulu BL, Chandra P (2013) Classification of heart disease using k-nearest neighbor and genetic algorithm. Proc Tech 10:85–94. https://doi.org/10.1016/j.protcy.2013. 12.340 18. Almustafa KM (2020) Prediction of heart disease and classifiers sensitivity analysis. BMC Bioinform 21. https://doi.org/10.1186/s12859-020-03626-y 19. Pal M, Parija S (2009) Prediction of heart diseases using random forest. J Phys: Conf Ser. https://doi.org/10.1088/1742-6596/1817/1/01 20. 
Deivendran G, Balaji SV, Paramasivan B, Vimal S (2021) Coronary illness prediction using the AdaBoost algorithm in sensor data analysis and management: the role of deep learning. IEEE, pp 161–172. https://doi.org/10.1002/9781119682806.ch10 21. Setiawan W, Damayanti F, (2020) Layers modification of convolutional neural network for pneumonia detection. J Phys: Conf Series. https://doi.org/10.1088/1742-6596/1477/5/052055
Fractional Order Controller Design Based on Inverted Decouple Model and Smith Predictor R. Hanuma Naik, P. V. Gopikrishna Rao, and D. V. Ashok Kumar
Abstract This paper proposes a Fractional Order Proportional Integral Derivative Controller (FO-PID) for multivariable processes. Due to the larger number of tuning parameters involved, it is intricate to develop systematic tuning rules for the controller. Most of the methods available for tuning FO-PID controllers deal only with Single-Input–Single-Output (SISO) systems. To overcome these limitations, this paper derives analytical tuning rules of the FO-PID controller for multivariable processes. The Inverted Decoupled Smith Predictor (IDSP) is used to deal with the interactions among the process variables of a multivariable system, in addition to multiple delay terms. Based on the internal model structure, the analytical tuning method is derived. This scheme allows the use of a controller for each loop individually. The decoupled FO-PID controller is tuned using the frequency-response approach. To validate the proposed algorithm, simulation results are also included in this paper. Keywords Decentralized controller · Interaction · Inverted decoupler · MIMO process variables · Robustness · Smith predictor
1 Introduction Multivariable control systems are known to be more difficult to design than scalar processes. This is principally because of the presence of interactions and directionality in such systems [1–3], which restricts the applicability of most parametric model-based design algorithms to SISO applications. Over the last several decades, a few strategies for solving multivariable control problems have been proposed that extend traditional PID controllers to MIMO processes by introducing a detuning factor to meet the stability and performance requirements of the multi-closed-loop control system.
R. H. Naik (B) · P. V. G. Rao · D. V. A. Kumar RGM College of Engineering and Technology, Nandyal 518501, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_33
The multi-loop PI/PID controller gives good control performance for processes with low interaction, but it fails to give reasonable control performance when the interaction among the loops is large. In such a case, decoupler-based control is chosen for the control of MIMO processes. Three kinds of decoupler are most frequently used in the design of control systems for MIMO processes: ideal, inverted, and simplified [4]. The ideal decoupler is formed using the inverse of the model, which may result in complex elements. The simplified decoupler is generally used to develop an equivalent transfer function (ETF) model, and a decentralized controller is designed for the corresponding ETF model. The inverted decoupler, however, offers the advantages of both the ideal and simplified decouplers and improves the robustness of the process. A large portion of the decoupled control algorithms available in the literature have limited use with conventional PID controllers [5–7]. Recently, the FO-PID controller, first proposed in [8], has attracted additional attention from many researchers in the field of process control. The FO-PID has five tuning parameters, including the gains of the proportional, integral, and derivative actions. Because of the additional tuning parameters, it is more intricate to derive analytical tuning procedures for the controller. Various tuning techniques have been proposed to address this issue [9, 10]; despite this, the greater part of them apply to SISO processes only. In this work, an Inverted Decoupling–Smith Predictor (IDSP) arrangement for multivariable plants/processes is designed to manage interactions between process variables as well as the different time delays of the loops. The realizability issue plays a significant role in implementing a decoupler because its entire model must be stable and proper. To overcome this issue, many researchers have proposed approximation approaches such as the Prediction Error Method (PEM), direct least squares in the frequency domain, and Coefficient Matching (CM) [11]. However, these methods are appropriate only for reduced integer-order models. In order to improve the dynamic performance of a decoupled process, fractional-order plants/processes are treated as the equivalent transfer function elements of the decoupled components. Thus, the entire controller arrangement used in this work is called the Fractional Inverted Decoupling–Smith Predictor (F-IDSP). The Particle Swarm Optimization (PSO) algorithm has been proposed for the estimation [12, 13] and is used to determine the parameters of the approximated fractional-order functions. Furthermore, to improve the performance of the processes, a FO-PID controller is proposed for the decoupled processes. In recent years, different techniques for FO-PI/FO-PID design have been proposed in the field of process loop control. Among those, two approaches are most widely used: (i) the Internal Model Control (IMC)-based method and (ii) frequency-domain robustness-based control. The first is popular because IMC aims to reduce the number of tuning parameters; typically, there is a single tuning parameter based on the peak overshoot (Mp) or the sensitivity function (Ms). The second is based on robustness specifications such as gain and phase margins.
The main limitation of the second option is that these constraints are only sufficient to determine the three tuning parameters of a FO-PI controller. The IMC with decoupler-based FO-PID configuration has been addressed in [14]. In recent years, different control algorithms of FO-PI controllers for TITO processes have been addressed by researchers [15–17]. A fractional filter used along with an IMC controller for multivariable loop control, which improves performance by reducing the interaction, has been proposed [18]. In this paper, an IDSP-based FO-PID configuration is adopted for TITO processes. The benefit of the IDSP is that it uses the Smith predictor to eliminate the dead time in an ETF, which reduces the difficulties in realizing the decoupler in real time. In this manner, the delay-free models are used in the design of the controllers. The FO-PID controllers are designed using analytical tuning relations [13]. This paper is organized as follows: In Sect. 2, the basics of fractional-order calculus are presented. In Sect. 3, the structure of decoupled TITO processes is described. Section 4 presents a fractional-order controller design for the TITO system. In Sect. 5, the evaluation of performance is described. To evaluate the proposed approach, case studies with simulation results are included in Sect. 6. Finally, the conclusions are given in Sect. 7.
2 Fractional Order Calculus (FOC) FOC is as old as integer-order calculus, although until recently its application was confined mainly to mathematics. Many real systems are better modeled with FOC differential equations, since it is a suitable tool for problems of fractal dimension, with long "memory" and noisy behavior. These attributes have attracted researchers' interest in recent years, and FOC is now a tool used in almost every area of science. The natural idea of FOC is as old as integer-order calculus (IOC); it was described in a letter written by Leibniz to L'Hopital [19]. It is a generalization of the IOC to an arbitrary real or complex order. The generalization of FOC is analytically expressed as:
\[
D^{\alpha} =
\begin{cases}
\dfrac{d^{\alpha}}{dt^{\alpha}}, & \alpha > 0,\\[4pt]
1, & \alpha = 0,\\[4pt]
\displaystyle\int_{a}^{t} (d\tau)^{-\alpha}, & \alpha < 0,
\end{cases}
\tag{1}
\]
with α ∈ R. Its applications in engineering were delayed because FOC has several definitions, there is no straightforward geometric interpretation, and the IOC appears, at first glance, to be sufficient to tackle engineering problems. The description of FOC as per the Riemann–Liouville definition [20] is written as:
\[
{}_{a}D_{c}^{\alpha} f(t) = \frac{1}{\Gamma(n-\alpha)} \frac{d^{n}}{dt^{n}} \int_{c}^{t} \frac{f(\tau)}{(t-\tau)^{\alpha-n+1}}\, d\tau, \qquad n-1 < \alpha < n
\]
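As a purely illustrative numerical sketch (not part of this chapter), the fractional derivative defined above can be approximated with the Grünwald–Letnikov scheme, which converges to the Riemann–Liouville derivative for sufficiently smooth functions; the test function, the order α = 0.5, and the step size h below are arbitrary choices.

```python
# Grunwald-Letnikov approximation of the fractional derivative D^alpha f(t).
# Purely illustrative: f, alpha, h and the evaluation point are arbitrary choices.
import numpy as np
from scipy.special import binom

def gl_fractional_derivative(f, t, alpha, h=1e-3):
    """Approximate D^alpha f(t) for alpha > 0 using N = t/h history points."""
    N = int(t / h)
    k = np.arange(N + 1)
    coeffs = (-1.0) ** k * binom(alpha, k)   # generalized binomial coefficients
    samples = f(t - k * h)                   # f evaluated on the "memory" of the operator
    return h ** (-alpha) * np.sum(coeffs * samples)

# Example: half-derivative of f(t) = t at t = 1 (analytical value 2*sqrt(t/pi) ~ 1.1284).
print(gl_fractional_derivative(lambda t: t, t=1.0, alpha=0.5))
```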
Table 1 (fragment) Characteristics of the main study parameters (N = 1015). The recoverable rows are: low amniotic level (< 5 cm): 205 (20.20%); high amniotic level (> 25 cm): 52 (5.12%); preterm (before 37 weeks): 43 (4.24%); hours in labor 9:00–15:00 h: 450 (44.33%); hours in labor 16:00–28:00 h: 250 (24.63%). The remaining rows (including full term after 37 weeks, C-section, and hours in labor of 0 h and 02:00–8:00 h) list the values 215 (21.18%), 3.00, 0, 0, 500 (49.26%), 268 (26.40%), and 47 (4.63%).
7 Results and Discussion Using the available statistics, the predicted prevalence of high-risk pregnancies is calculated. Table 1 summarizes the characteristics of the primary parameters taken into account. The average age of the pregnant women was 29.822 years, with a range of 15–40 years. The average BMI was 23.225 kg/m². Low and high amniotic levels (less than 5 cm and greater than 25 cm) are reported as 20.20% and 5.12%, respectively. The table also includes information such as the range, number of instances, and percentages. Figure 3 depicts comparison graphs of several disorders affecting pregnant women. Figures 4, 5, and 6 depict the predicted risk versus the actual value for random forest, linear regression, and KNN regression, respectively. Python is used to compute these values. In the model results, the predicted value accuracy is 39.58% with linear regression, 87.06% with random forest regression, 76.58% with KNN regression, and 85.71% with the SVM method. The study comprised 1015 mothers in total. Table 1 shows the characteristics of the research individuals.
Fig. 3 Comparison graph of different parameters
Fig. 4 Predicted risk value versus actual risk value for random forest regression
Fig. 5 Predicted risk value versus actual risk value for linear regression
Fig. 6 Predicted risk value versus actual risk value for KNN regression
8 Conclusion It is critical to keep advancing this area of ML research in order to find answers with broad therapeutic applications and to avert infant problems. Because interpretable ML systems are built on real-world data and outcomes, they highlight the most essential factors for clinicians, and the outcome is objective. The four algorithms implemented in the machine learning system were SVM, random forest regression, K-nearest neighbor, and linear regression. The correctness of each algorithm's performance was assessed. The random forest, with an accuracy of 87.06% for predicting high-risk pregnancy, was the algorithm with the best efficacy based on multiple factors (Table 2). Similarly, the SVM achieved an accuracy of 85.71%. KNN has a score of 76.58%, whereas linear regression has a score of 39.58%. In the future, we will work on different machine learning and deep learning models with more parameters to improve these outcomes.
Table 2 Comparison of different models' accuracy results
Model | Accuracy score
Random forest | 0.8706
Linear regression | 0.3958
KNN | 0.7658
SVM | 0.8571
A Customizable Mathematical Model for Determining the Difficulty of Guitar Triad Chords for Machine Learning Nipun Sharma and Swati Sharma
Abstract Music recommendation systems (MRS) for guitar learning have gained traction and popularity during pandemic times and aid both self-learning and remote learning. The research area gained immense importance because of the changed learning and instructing paradigm during the COVID-19 pandemic. Moreover, the guitar is one of the toughest instruments to play, and the growing number of guitar learners makes system design for guitar learning a very targeted and viable research area. This paper proposes a novel customizable mathematical model for determining the difficulty of guitar triad chords for machine learning and for suggesting easier triad chords, which can help many beginner guitarists. The work proposed in the paper is easily comprehensible, as it expresses the triad chord difficulty on a Likert scale with 1 as the easiest and 5 as the most difficult. The proposed MRS effectively aids theoretical learning and a deeper understanding of chords, which are spread over the entire fretboard of the guitar and are played across various octaves. This knowledge makes guitar playing more flavorful and effortless. Its key feature is to help a beginner guitar player understand that a single chord (in the proposed MRS, a triad chord) can be played in 12 different variations across various locations on the fretboard with just a basic understanding of the relationship between the notes. Keywords Interactive systems · Music recommendation system · Guitar learning · Machine learning · Triad chords
N. Sharma (B) · S. Sharma Presidency University, Bangalore, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_51
1 Introduction During the period of the COVID-19 global pandemic, from March 2020 to the present day, the paradigm of teaching and learning has shifted drastically. Online and automated teaching methodologies have emerged and evolved in parabolic growth trends. Some teaching or learning systems employ human interaction complemented with
advanced teaching aids like 3D animations and smart boards [1]. However, some self-paced learning systems are completely automated; they work with the user's input and then predict recommendations in learning. These intelligent recommendation systems cater to a wide variety of skills such as imaging, photography, painting, web designing, soft computing, game development, and music. Machine learning has played an important role in designing state-of-the-art recommendation systems for learning a skill, or more precisely, for mastering a skill. With the majority of students acquiring skills from the Internet, the online space is thronged with all sorts of teaching applications for users of all age groups and skill sets. It is said that music is food for the soul, so a good and robust recommendation system for music has a daunting task at hand. The task of designing a successful music recognition and recommendation system involves a multilayered series of tasks at different stages that include music note analysis and synthesis for the recommendation system, audio note representation, models used to analyze these recognition tasks, and predicting recommendations based on user input [2].
2 Literature Review A comprehensive literature review of the state of the art in the guitar learning domain is presented in [3]. It traces the development progresses in the guitar playing techniques and chord recognition in a chronological order. General music recognition and recommendation tasks are performed by notes analysis and instruments classification [4–9]. Research in this specific domain can be tracked from earlier systems using augmented reality (AR) display [10]. The system is helpful in assisting the guitar player by tracking the pose and position of holding the guitar and then presenting a visual guide that may support corrections. The design of the support system is an efficient example of using marker and edge-based tracking in AR for teaching guitars to beginners. The system in [11] incorporated deep networks consisting of several convolutional layers with affine layer, 6D softmax layer and radial basis function (RBF) layer to produce guitar tablature, where the input is music audio and output is human readable chord representation notation that can be widely and easily comprehended by the masses. Guitar strumming teaching system developed in [12], known as strummer, is an interactive guitar chord practice system for training musical chords using the data-driven approach. The system determines the difficulty of transition from one chord to the other. Linear regression model is applied to find the difficulty of transition which is mapped to difficulty level on 5—Likert scale (1: easiest and 5: difficult). The work in [13] proposes a novel representation which collaborates variational auto-encoder (VAE) with Gaussian process (GP) subsequently denoted as GP-VAE. Database is classified into seven playing techniques, named as: normal, muting, vibrato, pull off, hammer-on, sliding and bending. Each technique has 1000 epochs. The proposed GP-VAE is claimed to be beneficial for a class relevant discriminative task. An interesting system development illustrated in [14] estimates the guitar string, fret and plucking position. It uses parametric pitch estimation to extract the
location, where the hands are interacting with the fretboard of guitar and string. The system uses a feature set of three parameters, and these parameters are estimated with nonlinear least squares (NLS) pitch estimators. The chord recognition system proposed in [15] uses a transform domain approach using discrete sine transform (DST). Its aim is to investigate the influence of sampling frequencies which do not follow Shannon sampling theorem. This system proposes the input as an isolated wave format of recorded chord signal. The said chord signals are recorded employing sampling frequencies from 2500 to 78 Hz. The guitar ontology system in [16], a goal-oriented form of description, is employed with a focus on classical guitar. A method is presented that annotates the ontology knowledge onto musical structures. It aims to bridge the gap between humans’ understanding of rendering knowledge and computer processing in order to build and develop an interactive and knowledge intensive system. The system in [17] determines a correct chord label by observing “altered” notes. The allocated notes are often reduced, increased and changed to other notes in real musical concerts and compositions. As a result of omitting, inversions and tension voicing, the allocation of notes is not the same as the definition of chords. The system aims to provide solutions to such discrepancies by constructing and applying a searching tree for chord labels and chord progressions database. Guitar playing technique (GPT) classification is yet another interesting area of research that involves various GPTs like normal, muting, vibrato, sliding, hammer-on, pull offs and bending. The system in [18] endeavors to automatically segregate GPTs. Spectral temporal receptive field (STRF)-based scale and rate descriptor constructed system identify GPTs which results in very high recognition rates. In the system presented in [19], the generation of chord progression from symbolic representation as a prediction problem is formulated. Neural attention mechanism has been incorporated to investigate the overall performance of the system. In order to generate candidate chords from chord progression sequences, an LSTM-based neural network is employed, along with a multi-modal interface deploying a Kinect device. An interesting work is observed in the paper [20] that illustrates the dynamic generation of fingerings on the basis of user configurable parameters, thus can accommodate novel chords, unusual number of strings and frets besides disability or medical condition of the user. For guitars with extended fretboards, only first 12–14 frets are considered to speed up the search time. The datasets used in the literature review are presented in [21–23].
3 Basic Music Theory for Playing Guitar Chords Guitar chords have many variations, like open chords, barre chords, power chords, and triads. Of these, triads are the simplest, with only three notes to be played in a chord. A chord is a group of notes played simultaneously rather than sequentially, like a melody. A scale is a pattern made out of notes using whole and half-steps. There are seven natural notes (musical alphabets) used on the guitar, viz. C-D-E-F-G-A-B. The distance or spacing between them is uneven; this spacing is known as musical distance. This distance is filled by
some other notes (known as sharp notes), and the resulting uniform-distance scale looks like the pattern in Table 1.
Table 1 Uniform distance scale of notes, each separated by a half-step or semitone
C | C# | D | D# | E | F | F# | G | G# | A | A# | B
For example, to construct a major scale for A, the pattern will start from A and select notes from Table 1 after A at the distances given by major scale theory, which traverses a (whole, whole, half, whole, whole, whole, half) pattern. So, the A major scale selects A, followed by B which is a whole step away, followed by C# which is a whole step away from B, followed by D which is a half-step away from C#, followed by E which is a whole step away from D, followed by F# which is a whole step away from E, and finally G# which is a whole step away from F# (the closing half-step returns to A). So, the A major scale becomes the pattern given in Table 2.
Table 2 A major scale derived from selected notes from Table 1 according to major scale theory
A | B | C# | D | E | F# | G#
Similarly, the minor scale is constructed by selecting notes from Table 1 according to minor scale theory. There are several types of triad chords depending on the different intervals between the notes. The construction of the chord begins with the selection of a root note. For example, the A major triad chord is formed by selecting only three notes from the A major scale given in Table 2. These notes are A, the root note or bass note, followed by the third and fifth notes. So, the A major triad chord consists of the notes A (first note, also known as the root or bass note), C# (third note), and E (fifth note). While forming other triad chords, a minor third note is a half-step or semitone below the major third note; for example, the minor third in the case of the A minor triad is C, which is a half-step or semitone below C#. A diminished fifth note is a half-step or semitone below the perfect fifth; for example, the diminished fifth of E is D#. An augmented fifth note is a half-step or semitone above the fifth note; for example, the augmented fifth of E is F. These half-steps can be referenced from Table 1. Now, there are several types of triad chords depending on the different intervals between the notes. Triads have four variations (a small code sketch of the scale and triad construction is given at the end of this section): (a) Major Triad (constructed using the root note, major third note, and perfect fifth note); an A major triad will have the notes A, C#, E. (b) Minor Triad (constructed using the root note, minor third note, and perfect fifth note); an A minor triad will have the notes A, C, E. (c) Diminished Triad (constructed using the root note, minor third note, and diminished fifth note); an A diminished triad will have the notes A, C, D#.
(d) Augmented Triad (constructed using root note—major third note and augmented fifth note). A-augmented triad will have the notes—A, C#, F. Each triad further has three positional variations: (a) Root position (the root note is the bass note followed by third and fifth note) (b) First inversion (the third note is the bass note followed by the fifth note and root note) (c) Second inversion (the fifth note is the bass note followed by root note and the third note). Furthermore, they can be played with bass on sixth, fifth, fourth or third string which makes it confusing in terms of the number of choices we get to play a particular chord. For example, A major triad can be played in 12 different ways or positions all over the fretboard. Now determining the best fit position of a chord in a specific form of guitar playing style is purely subjective and depends upon a person’s skill level and finger dexterity level. So, the training set of all these possible chord positions needs to be prepared so as to analyze the difficulty level of the same chord at different positions. This needs to be done extensively for all possible chords for their triads.
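The following small Python sketch (added for illustration, not part of the original paper) encodes Tables 1 and 2: it walks the chromatic scale with the whole/half-step pattern of the major scale and then picks the first, third, and fifth degrees to form the major triad.

```python
# Sketch: derive a major scale from the chromatic notes of Table 1 and build its triad.
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # whole = 2 semitones, half = 1 semitone

def major_scale(root):
    idx = CHROMATIC.index(root)
    scale = [root]
    for step in MAJOR_STEPS[:-1]:      # the last half-step just returns to the octave
        idx = (idx + step) % 12
        scale.append(CHROMATIC[idx])
    return scale

def major_triad(root):
    scale = major_scale(root)
    return [scale[0], scale[2], scale[4]]   # root, major third, perfect fifth

print(major_scale("A"))   # ['A', 'B', 'C#', 'D', 'E', 'F#', 'G#']
print(major_triad("A"))   # ['A', 'C#', 'E']
```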
4 Design Procedure of Proposed MRS Here, a deterministic index named the "finger dexterity index" (FDI) is proposed, which aims at providing the user with a convenient numerical value assigned to the difficulty of playing a chord. As discussed already, defining the difficulty level of chord playing is subjective, as it depends on various factors like barring the fret, the positions of the notes on the fret, the distance between notes on the fret, the age of the guitar player, the arm and finger strength of the guitar player, etc. In the initial phase of designing the proposed intelligent triad FDI for cadence and chord progression recommendation, we incorporate only a few widely agreeable weighted difficulty parameters; in this case, the two parameters considered are fret positions and string positions, which define the difficulty level of chords with an assigned numerical value. The entire system is proposed to be built on neural networks with reinforcement learning on multiple training input variables, yielding a single numerical value depicting difficulty at the output. The formulation of the FDI that calculates the difficulty value of the triad chord is a three-step process explained in Fig. 1 and in Sects. 4.1, 4.2 and 4.3.
5 Calculation of Fret Difficulty Index In the first step, we calculate difficulty based on fret positions using the fret difficulty index shown in Fig. 2 by formulating two parameters S FP and W FP . Figure 2 shows the weights of fret positions governed by the fret difficulty index. It is computed for
Fig. 1 Design procedure flow graph
15 frets, normalized with respect to the 15th fret, with a 5 percent increase in difficulty for each fret increment as we move up the fretboard. The sum of the weights of the notes used on the frets for a particular triad chord is calculated as SFP. It is also essential to account for the separation of the frets used to play a triad, which is given by WFP, the weight of the maximum fret separation (the distance between the farthest frets for a particular triad chord).
Fig. 2 Fret difficulty index
6 Calculation of String Difficulty Index In the second step, we calculate difficulty based on string positions using the string difficulty index shown in Fig. 3 by formulating two parameters, SSP and WSP. The string difficulty index is normalized with respect to the high e string as 1, with a 10% increase per string as we go from e-B-G-D-A-E (the thickest string). The sum of the weights of the strings used on the fretboard to play a particular triad chord is used to calculate SSP, depending upon which strings are pressed for the chord. It is also essential to account for the separation of the strings used to play a triad, which is given by WSP, denoting the weight of the maximum string separation (the distance between the farthest strings). Calculation of Difficulty of Chord In the third and final step, we amalgamate both difficulty aspects into one proposed formula. Using the above-discussed rational, logical, and configurable four parameters, a formula is proposed to calculate the final difficulty value of the triad chord. The FDI can be calculated as: FDI = (SFP × WFP) + (SSP × WSP),
(1)
where S FP denotes sum of weights of fret positions, S SP denotes sum of weights of string positions, W FP represents weight of maximum fret separation (distance between farthest frets), and W SP denotes weight of maximum string separation (distance between farthest strings).
Fig. 3 String difficulty index
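A minimal sketch of Eq. (1) follows. The exact fret and string weight curves are only shown graphically in Figs. 2 and 3, so the weight functions below (fret weight 1.05^(15 − f) and string weight 1.1^(s − 1), with the high e string as string 1) are assumptions inferred so as to be consistent with the values reported in Tables 3–5; they should be treated as an approximation rather than the authors' exact implementation.

```python
# Sketch of the FDI computation in Eq. (1).
# Assumed weight rules (see the lead-in above): fret weight 1.05**(15 - f),
# string weight 1.1**(s - 1) with the high e string as s = 1.
def fret_weight(f):
    return 1.05 ** (15 - f)

def string_weight(s):
    return 1.1 ** (s - 1)

def fdi(strings, frets):
    s_fp = sum(fret_weight(f) for f in frets)                              # sum of fret weights
    w_fp = abs(fret_weight(min(frets)) - fret_weight(max(frets)))          # max fret separation weight
    s_sp = sum(string_weight(s) for s in strings)                          # sum of string weights
    w_sp = abs(string_weight(min(strings)) - string_weight(max(strings)))  # max string separation weight
    return s_fp * w_fp + s_sp * w_sp                                       # Eq. (1)

# A major root-position voicings (strings, frets) as in Table 3.
voicings = {
    "bass at 6": ((6, 5, 4), (5, 4, 2)),
    "bass at 5": ((5, 4, 3), (12, 11, 9)),
    "bass at 4": ((4, 3, 2), (7, 6, 5)),
    "bass at 3": ((3, 2, 1), (14, 14, 12)),
}
for name, (strings, frets) in voicings.items():
    print(name, round(fdi(strings, frets), 4))   # lowest value = easiest voicing
```

Ranking all 12 voicings of a chord by this score directly yields the "easiest variation" recommendation described in the next section.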
7 Proposed Music Recommendation System for Guitar Triad Chords A complete system essentially consists of various key components. However, to limit the complexity and scope of discussion, these are classified into two key components: music information retrieval and recognition at the initial stage [24], and music data processing, prediction, and recommendation at the final stage. The recommendation systems reviewed in this paper do not calculate the difficulty of chord playing depending upon the position of the chord. This has motivated us to propose the system shown in Fig. 4, which can calculate the difficulty of chord triads and suggest easier chords. The proposed MRS recognizes the played chord and then displays the following information: triad chord name, triad chord type, octave played, and numeric difficulty value (calculated on the basis of the proposed FDI). In the next step, it displays the following suggested information: the 12 possible ways of playing the same chord in graphical format, an easier version of the same chord in the same octave with a lower numeric difficulty value, and the overall easiest finger position across all 12 variations. An illustration of the proposed index applied to the A major triad chord is given in this section. As discussed earlier, the A major triad chord can be played at 12 different positions on the guitar, covering all possibilities. So, taking into account all the possibilities with their frets and strings used, the calculations are performed using Eq. (1). The results are given in Tables 3, 4, and 5 for the A major root position, A major first inversion position, and A major second inversion position, respectively. The calculations in Table 3 correspond to the A major chord in root position. The results in Table 3 illustrate that the most difficult position is when the bass is at 6 and the easiest position is when the bass is at 3. The mapping of the FDI scores of the A major triad chord root positions to the triad chord positions is shown in Fig. 5. The calculations in Table 4 correspond to the A major chord in the first inversion position. The results in Table 4 illustrate that the most difficult position is when the bass is at 5 and the easiest position is when the bass is at 3. It proves that, unlike the previous case, it is not necessary that the bass-6 position will be the most difficult. The mapping of the FDI scores of the A major triad chord first inversion to the triad chord positions is shown in Fig. 6. The calculations in Table 5 correspond to the A major chord in the second inversion position. The results in Table 5 illustrate that the most difficult position is when the bass is at 6 and the easiest position is when the bass is at 4. The mapping of the FDI scores of the A major triad chord second inversion to the triad chord positions is depicted in Fig. 7. As observed from the chord positions in Figs. 1, 2, 3, 4 and 5 and the calculated FDI, the A major triad chord with root at 6 is the most difficult to play. However, the A major triad chord second inversion with bass at 4 gets the lowest FDI score and hence is the easiest to play. Built on top of the FDI, this paper proposes a system which could provide users with some suggested chord progressions and cadences which ought
Fig. 4 Proposed MRS for guitar triad chords

Table 3 Finger dexterity index for A major triad chord in root position

Chord                             | Strings used | Frets used | S_SP   | W_SP   | S_FP   | W_FP   | FDI
A major root position (bass at 6) | 6, 5, 4      | 5, 4, 2    | 4.4056 | 0.2795 | 5.2247 | 0.2567 | 2.5730
A major root position (bass at 5) | 5, 4, 3      | 12, 11, 9  | 4.0051 | 0.2541 | 3.7131 | 0.1824 | 1.6949
A major root position (bass at 4) | 4, 3, 2      | 7, 6, 5    | 3.6410 | 0.2309 | 4.6575 | 0.1513 | 1.5462
A major root position (bass at 3) | 3, 2, 1      | 14, 14, 12 | 3.3200 | 0.2099 | 3.2576 | 0.1075 | 1.0477
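As a quick sanity check on Eq. (1), the first row of Table 3 (A major root position, bass at string 6) can be recomputed directly from the tabulated weights; this is a minimal sketch using only values taken from the table itself.

```python
# Recompute the FDI for A major root position (bass at 6) from the Table 3 entries.
s_sp, w_sp = 4.4056, 0.2795
s_fp, w_fp = 5.2247, 0.2567
fdi = s_fp * w_fp + s_sp * w_sp
print(round(fdi, 4))  # ~2.5726, matching the tabulated 2.5730 up to rounding
```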
Table 4 Finger dexterity index for A major triad chord in the first inversion position

Chord                                        | Strings used | Frets used | S_SP   | W_SP   | S_FP   | W_FP   | FDI
A major first inversion position (bass at 6) | 6, 5, 4      | 9, 7, 7    | 4.4056 | 0.2795 | 4.2948 | 0.1373 | 1.8214
A major first inversion position (bass at 5) | 5, 4, 3      | 4, 2, 2    | 4.0051 | 0.2541 | 5.4815 | 0.1753 | 1.9786
A major first inversion position (bass at 4) | 4, 3, 2      | 11, 9, 10  | 3.6410 | 0.2309 | 3.8317 | 0.1245 | 1.3181
A major first inversion position (bass at 3) | 3, 2, 1      | 6, 5, 5    | 3.3200 | 0.2099 | 4.8088 | 0.0775 | 1.0698
Table 5 Finger dexterity index for A major triad chord in the second inversion position

Chord                                         | Strings used | Frets used | S_SP   | W_SP   | S_FP   | W_FP   | FDI
A major second inversion position (bass at 6) | 6, 5, 4      | 12, 12, 11 | 4.4056 | 0.2795 | 3.5307 | 0.0579 | 1.4357
A major second inversion position (bass at 5) | 5, 4, 3      | 7, 7, 6    | 4.0051 | 0.2541 | 4.5061 | 0.0738 | 1.3506
A major second inversion position (bass at 4) | 4, 3, 2      | 2, 2, 2    | 3.6410 | 0.2309 | 5.6568 | 0.0000 | 0.8410
A major second inversion position (bass at 3) | 3, 2, 1      | 9, 10, 9   | 3.3200 | 0.2099 | 3.9562 | 0.0638 | 0.9496
Fig. 5 A major triad chord root position finger positions with calculated FDI
to be by far the easiest to play, depending upon the input key or scale that the user wants to play.
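The recommendation step can be read as a ranking problem over the candidate fingerings of a recognized chord: compute the FDI for each of the 12 positions and return them in ascending order of difficulty. A minimal sketch of that step is shown below, assuming an fdi() function like the one outlined earlier and hypothetical position data; it is not the authors' implementation.

```python
# Hedged sketch of the recommendation step: rank candidate voicings of a
# recognized triad chord by their FDI and suggest the easiest ones first.
from typing import Callable

def recommend(positions: dict[str, tuple[list[int], list[int]]],
              fdi: Callable[[list[int], list[int]], float]) -> list[tuple[str, float]]:
    """positions maps a voicing label to (strings_used, frets_used)."""
    scored = [(label, fdi(strings, frets)) for label, (strings, frets) in positions.items()]
    return sorted(scored, key=lambda item: item[1])  # easiest (lowest FDI) first

# Hypothetical candidate voicings for an A major triad (labels only for illustration)
candidates = {
    "root, bass at 6": ([6, 5, 4], [5, 4, 2]),
    "root, bass at 3": ([3, 2, 1], [14, 14, 12]),
}
# ranking = recommend(candidates, fdi)  # with fdi as sketched earlier
```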
Fig. 6 A major triad chord first inversion position finger positions with calculated FDI
Fig. 7 A major triad chord second inversion position finger positions with calculated FDI
8 Conclusion and Future Work

The key takeaway of the proposed MRS is to make a beginner guitar player more confident about using the entire fretboard and to strengthen the learning of triad chords by providing information on all possible ways in which a single chord can be played. The major advantage of designing such an MRS is that it is extensible: the proposed system can be used to build interactive learning systems for scales such as the pentatonic and major scales. There is tremendous scope for developing systems that can aid various aspects of guitar playing, and the system proposed in this paper will help in the retention of beginner players. Because it addresses only triad chords, there is considerable room for scaling the system and for future improvements: further research and enhancements can target seventh chords and power chords, and there is a vast scope for system development and implementation in the area of cadence suggestions and recommendations.
References

1. Purwins H, Li B, Virtanen T, Schlüter J, Chang SY, Sainath T (2019) Deep learning for audio signal processing. IEEE J Sel Top Signal Process 13(2):206–219. https://doi.org/10.1109/JSTSP.2019.2908700
2. Schedl M (2019) Deep learning in music recommendation systems. Front Appl Math Stat 5:1–9. https://doi.org/10.3389/fams.2019.00044
3. Sharma N, Sharma S (2022) A systematic review of machine learning state of the art in guitar playing and learning. https://doi.org/10.1729/Journal.31664
4. Laroche C, Kowalski M, Papadopoulos H, Richard G (2018) Hybrid projective nonnegative matrix factorization with drum dictionaries for harmonic/percussive source separation. IEEE/ACM Trans Audio Speech Lang Process 26(9):1499–1511. https://doi.org/10.1109/TASLP.2018.2830116
5. Calvo-Zaragoza J, Zhang K, Saleh Z, Vigliensoni G, Fujinaga I (2018) Music document layout analysis through machine learning and human feedback. Proc Int Conf Doc Anal Recogn ICDAR 2:23–24. https://doi.org/10.1109/ICDAR.2017.259
6. Joysingh SJ, Vijayalakshmi P, Nagarajan T (2019) Development of large annotated music datasets using HMM based forced Viterbi alignment. In: IEEE region 10 annual international conference proceedings/TENCON, Oct 2019, pp 1298–1302. https://doi.org/10.1109/TENCON.2019.8929664
7. Fang J, Grunberg D, Lui S, Wang Y (2017) Development of a music recommendation system for motivating exercise. In: Proceedings of 2017 international conference on orange technology, ICOT 2017, Jan 2018, pp 83–86. https://doi.org/10.1109/ICOT.2017.8336094
8. Patel A, Wadhvani R (2018) A comparative study of music recommendation systems. In: 2018 IEEE international students' conference on electrical, electronics and computer science, SCEECS 2018, pp 1–4. https://doi.org/10.1109/SCEECS.2018.8546852
9. Du P, Li X, Gao Y (2020) Dynamic music emotion recognition based on CNN-BiLSTM. In: Proceedings of 2020 IEEE 5th information technology and mechatronics engineering conference, ITOEC 2020, pp 1372–1376. https://doi.org/10.1109/ITOEC49072.2020.9141729
10. Motokawa Y, Saito H (2006) Support system for guitar playing using augmented reality display. In: Proceedings of ISMAR 2006, fifth IEEE/ACM international symposium on mixed and augmented reality, pp 243–244. https://doi.org/10.1109/ISMAR.2006.297825
11. Humphrey EJ, Bello JP (2014) From music audio to chord tablature: teaching deep convolutional networks to play guitar. In: ICASSP, IEEE international conference on acoustics, speech and signal processing, pp 6974–6978. https://doi.org/10.1109/ICASSP.2014.6854952
12. (2017) STRUMMER: an interactive guitar chord practice system. Intelligent Systems Laboratory, The University of Tokyo; National Institute of Advanced Industrial Science and Technology (AIST), July, pp 1057–1062
13. Chen SH, Lee YS, Hsieh MC, Wang JC (2018) Playing technique classification based on deep collaborative learning of variational auto-encoder and Gaussian process. In: Proceedings of IEEE international conference on multimedia and expo, July 2018, pp 1–6. https://doi.org/10.1109/ICME.2018.8486467
14. Hjerrild JM, Christensen MG (2019) Estimation of guitar string, fret and plucking position using parametric pitch estimation. Audio Analysis Lab, CREATE, Aalborg University, Denmark, pp 151–155
15. Sumarno L (2019) The influence of sampling frequency on guitar chord recognition using DST based segment averaging. In: Proceedings of 2019 international conference on artificial intelligence and information technology, ICAIIT 2019, pp 65–69. https://doi.org/10.1109/ICAIIT.2019.8834628
16. Iino N, Nishimura S, Nishimura T, Fukuda K, Takeda H (2019) The guitar rendition ontology for teaching and learning support. In: Proceedings of 13th IEEE international conference on semantic computing, ICSC 2019, pp 404–411. https://doi.org/10.1109/ICOSC.2019.8665532
17. Yasui N, Miura M, Shimamura T (2019) Chord label estimation from acoustic signal considering difference in electric guitars. In: Proceedings of 2019 international symposium on intelligent signal processing and communication systems, ISPACS 2019, pp 31–32. https://doi.org/10.1109/ISPACS48206.2019.8986390
18. Wang C-Y et al (2020) Spectral-temporal receptive field-based descriptors and hierarchical cascade deep belief network for guitar playing technique classification. IEEE Trans Cybern:1–12. https://doi.org/10.1109/tcyb.2020.3014207
19. Garoufis C, Zlatintsi A, Maragos P (2020) An LSTM-based dynamic chord progression generation system for interactive music performance. In: ICASSP 2020—2020 IEEE international conference on acoustics, speech and signal processing, pp 4497–4501
20. Wortman KA, Smith N (2021) CombinoChord: a guitar chord generator app. In: 2021 IEEE 11th annual computing and communication workshop and conference, CCWC 2021, pp 785–789. https://doi.org/10.1109/CCWC51732.2021.9376001
21. Burgoyne JA, Wild J, Fujinaga I (2011) An expert ground-truth set for audio chord recognition and music analysis. In: Proceedings of 12th international society for music information retrieval conference, ISMIR 2011, pp 633–638
22. Harte C, Sandler M, Abdallah S, Gómez E (2005) Symbolic representation of musical chords: a proposed syntax for text annotations. In: ISMIR 2005—6th international conference on music information retrieval, pp 66–71
23. Su L, Yu LF, Yang YH (2014) Sparse cepstral and phase codes for guitar playing technique classification. In: Proceedings of 15th international society for music information retrieval conference, ISMIR 2014, pp 9–14
24. Rathnayake B, Weerakoon KMK, Godaliyadda GMRI, Ekanayake MPB (2019) Toward finding optimal source dictionaries for single channel music source separation using nonnegative matrix factorization. In: Proceedings of 2018 IEEE symposium series on computational intelligence, SSCI 2018, pp 1493–1500
Load Frequency Control of Interconnected Hybrid Power System

Deepesh Sharma and Rajni Bala
Abstract The power system is highly intricate, and alterations in its various parts interact with one another. Any abrupt change, no matter how big or small, can modify tie-line powers and produce frequency variations. LFC is the answer to this problem: it can be utilized to maintain the frequency at a fixed point, or a point close to it. As a result, the LFC technique is employed for an interconnected power system so that each area's frequency and tie-line power can be returned to their original set-point values, or extremely close to them. Conventionally, a typical controller is used to accomplish this. The typical controller, however, has significant drawbacks: it operates very slowly and its performance degrades as the complexity of the system increases. Consequently, a controller that can solve this issue is needed, and fuzzy artificial intelligence techniques are better suited in this regard. To keep the frequency at a desired level, the LFC needs a quick and precise controller. In this research paper, a three-area interconnected power system is studied: the first area contains the thermal power plant, the second area the hydro power plant, and the third area the nuclear power plant. These three areas are combined in three different ways and connected by tie lines, and their frequency is maintained by different controllers, namely the PI controller and the fuzzy logic controller. To examine the performance of the multi-area system, we compare the values of settling time, maximum overshoot, and minimum overshoot for each scenario separately with a 1% disturbance in each area.

Keywords Load frequency control · Fuzzy controller · Power system
D. Sharma (B) Department of Electrical Engineering, Deenbandhu Chhotu Ram University of Science & Technology, Murthal, Sonepat, Haryana, India e-mail: [email protected] R. Bala Department of Physics, Maharshi Dayanand University, Rohtak, Haryana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 689, https://doi.org/10.1007/978-981-99-2322-9_52
681
682
D. Sharma and R. Bala
1 Introduction

A power system is an interconnected framework in which various power stations and generators operate in parallel. It is most important that the frequency of the system remains constant, since the system frequency also determines the speed of various motors in industry:

N = 120 f / P. (1)
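As a quick numerical illustration of Eq. (1) (a minimal sketch, with the pole number chosen only as an example), a 4-pole machine on a 50 Hz grid runs at 120 × 50 / 4 = 1500 rpm:

```python
def synchronous_speed_rpm(frequency_hz: float, poles: int) -> float:
    """N = 120 f / P, Eq. (1): synchronous speed in rpm."""
    return 120.0 * frequency_hz / poles

print(synchronous_speed_rpm(50.0, 4))   # 1500.0 rpm on a 50 Hz grid
print(synchronous_speed_rpm(60.0, 4))   # 1800.0 rpm on a 60 Hz grid
```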
The load on the system varies, and this affects the frequency: when the load changes, the speed changes and hence the frequency changes. The load is not under the operator's control; from the power station one can only monitor the frequency, while the load is on the consumer side and is determined by the consumer. If the grid frequency is not well maintained at all times, the system will fail. Each power system is designed around a standard frequency, generally 60 Hz or 50 Hz; in the USA it is generally 60 Hz, at which load and generation are completely balanced. If the load exceeds the generation, the frequency starts to fall, and if the generation exceeds the load, the frequency rises. If the frequency deviates from the nominal value, equipment can be damaged: turbines can sustain damage from over-speeding, and eventually protective relays will trip generators, lines, and transformers if the frequency deviation is too severe. Transformers can overheat and damage themselves if the current through them becomes too high, which can occur during such events because of frequency problems. It is therefore necessary to keep the frequency within a fixed range for proper operation of a power system.
2 Literature Survey

Studies on load frequency control of power systems have been available for more than thirty years. Linearized models of multi-area (including two-area) power systems have so far been considered for best performance. K. C. Divya et al. have introduced a hydro–hydro power system simulation model [1]. They assume identical frequencies in all areas in order to overcome the difficulties of extending the conventional approach; the model representation was obtained by neglecting the difference in frequencies between the power areas. E. C. Tacker et al. have examined the LFC of interconnected power systems and explored the formulation of LFC through LCT [2]. Later, the effect of GRC was introduced in these studies, considering both discrete and continuous power systems. D. Sharma describes the effectiveness of an ANFIS controller in load frequency control [3]. Surya Parkash and S. K. Sinha
developed a model of a thermal–hydro system and used artificial intelligence techniques to solve the LFC problem [4]. H. Shayeghi and H. Shayanfar proposed an advanced control technique for LFC of an interconnected multi-area power system [5]. The FLC is used to decrease the variations in system output [6, 7]. Power systems generally have multiple areas, and every area is unique compared to the others [8–10]. D. Sharma gave a survey of LFC with different techniques and different sources [16].
3 Modeling of System

3.1 Hydro–Thermal Interconnected System

The LFC power system plays a significant role in electricity generation. The power system may be divided into a number of load frequency control areas which are interconnected through tie lines [11–14]. The main objective is to control the frequency of every area and also to keep the tie-line power of each area at its scheduled value [15, 16]. In the primary load frequency control loop, any change in the system load creates a steady-state deviation in frequency, which depends on the speed regulation of the governor. In order to reduce this frequency deviation to zero, a reset action and control action must be provided by introducing a supplementary controller in the plant. The model of the thermal speed governor is

Y_E(s) = [P_C(s) − (1/R) F(s)] × K_gs / (1 + s T_gs). (2)

The model of the thermal generator is

F(s) = [P_G(s) − P_D(s)] × K_ps / (1 + s T_ps). (3)

The thermal load model transfer function is defined as

F(s) = [P_m(s) − P_D(s)] × 1 / (2H s + D). (4)
The hydroelectric power system modeling for LFC requires models of the speed governor, turbine, and generator. For a stable representation of the hydraulic turbine and water column, a few assumptions are made; i.e., the hydraulic resistance is considered negligible, the penstock pipeline is assumed to be inelastic, and the water is assumed to be incompressible [17–22].
The transfer function of a hydro turbine is

P_Th(s) / Y_eh(s) = (1 − s T_w) / (1 + 0.5 s T_w). (5)

The transfer function of the hydroelectric governor is

T_Gh(s) = [K_gh / (1 + s T_gh2)] × [(1 + s T_h2) / (1 + s T_h4)]. (6)

The load model for a hydro system is

P_e = P_L + D w. (7)
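As a hedged illustration of how these blocks can be assembled numerically, the sketch below builds the thermal governor and power-system transfer functions of Eqs. (2)–(3) with scipy; the parameter values (K_gs, T_gs, K_ps, T_ps) are arbitrary placeholders chosen only for the example, not values taken from the paper's Simulink models.

```python
# Hedged sketch: thermal-area LFC blocks of Eqs. (2)-(3) as transfer functions.
# All numeric parameter values below are illustrative assumptions.
from scipy import signal

K_gs, T_gs = 1.0, 0.08      # assumed governor gain and time constant (s)
K_ps, T_ps = 120.0, 20.0    # assumed power-system (generator-load) gain and time constant (s)

governor = signal.TransferFunction([K_gs], [T_gs, 1])       # K_gs / (1 + s T_gs), Eq. (2) block
power_system = signal.TransferFunction([K_ps], [T_ps, 1])   # K_ps / (1 + s T_ps), Eq. (3) block

# Open-loop frequency deviation of the power-system block for a 0.01 p.u. load step
t = [i * 0.1 for i in range(1500)]
_, f_step = signal.step(power_system, T=t)
print(round(-0.01 * f_step[-1], 3))   # approaches -0.01 * K_ps without supplementary control
```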
Figure 1 represents the simulation model of the two-area power system, which consists of a thermal system and a hydro system. Figure 2 represents the governor simulation model of the nuclear power plant. The results are obtained using a proportional–integral (PI) controller and an artificial intelligence technique, namely the fuzzy logic controller.
Fig. 1 Representation of two areas of thermal hydro system
Fig. 2 Governor model representation of nuclear plant
3.2 Thermal–Nuclear Interconnected Power System

For the nuclear power system, the generator, speed governor, and turbines must be modeled. Here two low-pressure (LP) turbines and one high-pressure (HP) turbine are used. Figure 3 shows the transfer functions of the low-pressure and high-pressure turbines. The high-pressure turbine is intended to efficiently extract work from the high-pressure steam as it initially enters the main turbines, while the low-pressure turbine is intended to efficiently extract work from the steam exhausting from the high-pressure turbine at low pressure. Figure 4 is the two-area simulation model, which consists of a thermal-area power system and a nuclear power system connected to each other through a tie line. Here, the power system may be divided into a number of load frequency control areas which are interconnected through tie lines. The main objective is to control the frequency of all areas and also to maintain the tie-line power of each area. In the primary load frequency control loop, any change in the system load creates a steady-state deviation in frequency, which depends on the speed regulation of the governor.

Fig. 3 Turbine model for nuclear plant
Fig. 4 Model representation of controlled two-area thermal–nuclear system
To decrease the frequency deviation to zero, a reset or control action must be provided to control these frequency deviations. Here, two types of controllers are used to maintain the frequency at a constant level: the first is the PI controller and the other is the fuzzy logic controller.
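Figure 3 gives the HP/LP turbine transfer functions used in the nuclear area; since the exact block values appear only in the figure, the sketch below uses a generic tandem HP/LP arrangement (one HP stage plus LP power fed through a reheater-like lag) with assumed time constants and power fractions purely for illustration; it is not the configuration used in the paper.

```python
# Hedged sketch of a tandem HP/LP steam-turbine model for the nuclear area.
# Time constants and power fractions are illustrative assumptions, not the
# values shown in Fig. 3 of the paper.
from scipy import signal

F_HP, F_LP = 0.3, 0.7     # assumed power fractions of the HP and LP sections
T_CH, T_RH = 0.3, 7.0     # assumed steam-chest and reheater time constants (s)

hp_stage = signal.TransferFunction([F_HP], [T_CH, 1])                       # F_HP / (1 + s T_CH)
lp_stage = signal.TransferFunction([F_LP], [T_RH * T_CH, T_RH + T_CH, 1])   # F_LP / ((1 + s T_CH)(1 + s T_RH))

t = [i * 0.1 for i in range(600)]
_, y_hp, _ = signal.lsim(hp_stage, U=[1.0] * len(t), T=t)
_, y_lp, _ = signal.lsim(lp_stage, U=[1.0] * len(t), T=t)
print(round(float(y_hp[-1] + y_lp[-1]), 3))   # total mechanical power approaches about 1.0 p.u.
```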
3.3 Three-Area Thermal–Nuclear–Hydro System

The hydro–thermal–nuclear system representation is shown in Fig. 5. The framework under study comprises a three-area interconnected power system: the first area contains thermal energy generating units with reheater, the second area comprises a hydro generating unit, and the third area contains nuclear generating units. These three areas are connected through tie lines, and the steady-state error in frequency is eliminated by tie-line bias control. The essential difference between the three areas is in terms of settling time. The working model representation of the three-area system under consideration is displayed in the figure. The modeling of this three-area power system with the controller is similar to that of the two-area power system. In this system, a conventional controller and artificial intelligence techniques are used to solve the frequency problem.
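Tie-line bias control is conventionally implemented through the area control error (ACE), which combines the tie-line power deviation of an area with a frequency-bias term; the supplementary controller of each area then acts on its ACE. A minimal sketch of that computation is shown below; the bias value is an illustrative assumption, not a parameter taken from the paper.

```python
# Hedged sketch of tie-line bias control: the area control error (ACE) of area i
# combines its tie-line power deviation and a frequency-bias term.
def area_control_error(delta_p_tie: float, delta_f: float, bias: float) -> float:
    """ACE_i = dP_tie,i + B_i * df_i (per-unit quantities)."""
    return delta_p_tie + bias * delta_f

# Example with illustrative numbers: 0.005 p.u. tie-line deviation, -0.02 Hz drop, B = 0.425
print(area_control_error(0.005, -0.02, 0.425))  # the supplementary controller drives this toward zero
```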
Fig. 5 Simulink model three-area system
4 Controlling Methods

4.1 Conventional PI Controller

The PI controller is the most widely used control law in thermal and hydro power station operating frameworks. Owing to its faster transient response with a matching controller, the PI controller reaches the steady-state condition very quickly. When there is a step change in load demand, low values of Kp produce stable responses with large steady-state errors, while high values of Kp produce better steady-state performance with a poorer transient response. Thus, even though a larger value of Kp is used to reduce the steady-state error, the system damping and time constant are thereby reduced, so choosing the right value of Kp is crucial. Since some error must be present in order to generate a control output, proportional action alone cannot remove the steady-state error in the framework. A general strategy for reducing the steady-state error is to add integral action to the controller: the control signal produced is proportional to the integral of the error signal, that is,

U(t) = K_i ∫ e(t) dt, (8)
Fig. 6 PI controller block diagram
where K_i represents the integral gain. Figure 6 is the block diagram of the PI controller used in the two-area and three-area simulation models in MATLAB. As is well known,

G_c(s) = K_p (1 + 1/(s T_i)). (9)
Traditional controllers provide a good, well-defined response and are very simple to implement. However, as the complexity of the framework rises, their performance deteriorates. Accordingly, there is a need for a controller that can resolve this problem, and fuzzy logic controllers, an artificial intelligence technique, are better suited.
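To make the role of Eqs. (8)–(9) concrete, the sketch below closes a discrete PI loop around a crude first-order power-system model of the form of Eq. (4), with the load damping lumped into the gain; the gains and plant constants are illustrative assumptions, not the tuned values used in the paper's Simulink models.

```python
# Hedged sketch: discrete PI control of the frequency deviation of a single area.
# Plant: first-order power-system model K_ps / (1 + s T_ps); all numbers are
# illustrative assumptions, not the tuned values used in the paper.
K_ps, T_ps = 120.0, 20.0    # assumed plant gain and time constant
Kp, Ki = 0.4, 0.3           # assumed PI gains
dt, steps = 0.01, 30000     # 300 s of simulation
dP_load = 0.01              # 1% step load disturbance (p.u.)

f_dev, integral = 0.0, 0.0
for _ in range(steps):
    error = 0.0 - f_dev                       # reference frequency deviation is zero
    integral += error * dt
    dP_control = Kp * error + Ki * integral   # PI law, Eqs. (8)-(9) in the time domain
    # Euler step of T_ps * df/dt + f = K_ps * (dP_control - dP_load)
    f_dev += dt * (K_ps * (dP_control - dP_load) - f_dev) / T_ps

print(round(f_dev, 5))   # integral action drives the deviation back toward zero
```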
4.2 Fuzzy Logic Controller (FLC)

Traditional controllers are easy to use and offer a good, recognizable response. Their performance, however, becomes less stable as the framework's complexity increases due to the disturbance caused by load variation. Therefore, a controller that can tackle this issue is required, and artificial intelligence (AI) fuzzy logic controllers are more appropriate. Distributed processing would be easier to implement since fuzzy logic can process an arbitrary number of inputs and outputs, although the system complexity increases as the number of inputs and outputs grows. The three components of a fuzzy logic controller are the fuzzifier, the rule base, and the defuzzifier.
4.3 Rule Base for Fuzzy Logic System

In a fuzzy logic controller, the input is first sent to the fuzzification module, which converts the crisp value into a fuzzy value using a database and membership functions that map the input variable to a fuzzy variable. After fuzzification, the output is sent to the inference module, which consists of a fuzzy rule base that takes the fuzzy variable as input, generates the possible fuzzy output, and passes it to the defuzzification module,
which converts the fuzzy output into a crisp value; after defuzzification, the result is obtained and the process proceeds further [15]. The FLC works on rules. The fuzzy rules are written in a MATLAB (.fis) file using Mamdani inference with triangular membership functions. The rules are based on the membership functions, and these rules are set over a suitable collection of input and output boundaries. The fuzzy set is represented graphically, with the degree of membership shown on the y-axis in the [0, 1] interval. A simple function is used to construct the membership functions; the common types are the triangular, trapezoidal, and Gaussian functions, but in this work only triangular functions are used. Here, two inputs and one output are taken.
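A minimal sketch of the Mamdani-style pipeline described above (triangular membership functions, two inputs, one output) is given below; the rule table and the universes of discourse are illustrative assumptions, not the ones used in the authors' MATLAB .fis files.

```python
# Hedged sketch of a two-input, one-output Mamdani-style fuzzy step:
# fuzzify with triangular membership functions, fire a tiny rule base with
# min-max inference, and defuzzify by centroid. Rules and ranges are assumptions.

def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Linguistic terms for the error e and its derivative de (negative / zero / positive)
terms = {"N": (-1.0, -0.5, 0.0), "Z": (-0.5, 0.0, 0.5), "P": (0.0, 0.5, 1.0)}
# Toy rule base: (e term, de term) -> output term; output terms reuse the same triangles
rules = {("N", "N"): "N", ("N", "Z"): "N", ("Z", "Z"): "Z", ("P", "Z"): "P", ("P", "P"): "P"}

def fuzzy_control(e: float, de: float) -> float:
    xs = [i / 100.0 for i in range(-100, 101)]          # output universe of discourse
    agg = [0.0] * len(xs)
    for (te, tde), tout in rules.items():
        strength = min(tri(e, *terms[te]), tri(de, *terms[tde]))            # Mamdani AND (min)
        for i, x in enumerate(xs):
            agg[i] = max(agg[i], min(strength, tri(x, *terms[tout])))       # clip and aggregate (max)
    area = sum(agg)
    return sum(x * m for x, m in zip(xs, agg)) / area if area else 0.0      # centroid defuzzification

print(round(fuzzy_control(0.3, 0.1), 3))   # small positive error -> small positive control action
```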
5 Simulations and Results

5.1 Hydro–Thermal Interconnected System

Case 1: Result parameters of the hydro plant using PI and FLC when a 1% disturbance occurs in the thermal unit. Table 1 indicates that the results of the FLC controller are better when the disturbance is applied in the thermal unit: settling time and maximum overshoot are lower with the FLC controller.

Case 2: Result parameters of the thermal plant using PI and FLC when a 1% disturbance occurs in the hydro plant. Figure 7 shows the dynamic response of frequency when a 1% disturbance is applied in the hydro plant; the FLC controller settles the frequency in less time than the PI controller. Table 2 gives the comparison between the PI controller and the fuzzy logic controller for the two-area (thermal and hydro) power system. When a 1% disturbance is applied in the hydro plant, the fuzzy logic controller settles the system faster than the PI controller in terms of frequency and oscillations.

Table 1 Result of different parameters after simulation of the hydro plant (area 2) when a disturbance occurs in the thermal plant (area 1)
Parameters        | PI controller | FLC controller
Settling time (s) | 60            | 10
Maximum overshoot | 4.856         | 0.04
Minimum overshoot | −6.529        | −0.03
Fig. 7 Results of both the areas when 1% disturbance occurs in a hydro plant
Table 2 Result of different parameters of thermal plant and hydro plant

Parameters        | PI controller | Fuzzy logic controller
Settling time (s) | 60            | 7
Maximum overshoot | 4.789         | 0.04
Minimum overshoot | −7.526        | −0.03
5.2 Two Regions of Thermal–Nuclear System

Result parameters of the nuclear plant using PI and FLC when a 1% disturbance occurs in the thermal plant. Figure 8 shows the frequency dynamic response of the thermal and nuclear areas when a 1% disturbance is applied in the thermal plant; the same response can be seen in Table 3. The settling time of the FLC is much less than that of the PI controller, and the maximum overshoot of the FLC is also smaller.
Fig. 8 Results of both the areas when 1% disturbance occurs in a thermal plant
Table 3 Result of different parameters after simulation of the nuclear plant (area 2) when a disturbance occurs in the thermal plant (area 1)

Parameters        | PI controller | FLC controller
Settling time (s) | 60            | 7
Maximum overshoot | 1.08          | 0.0109
Minimum overshoot | −3.178        | −0.015
Fig. 9 Results of all the three areas when 1% disturbance occurs in thermal plant
Table 4 Result of different parameters after simulation of the three areas when a disturbance occurs in the thermal plant

Parameters        | PI controller | Fuzzy logic controller
Settling time (s) | 40            | 20
Maximum overshoot | 4.24          | 0.03
Minimum overshoot | −5.55         | −0.03
5.3 Three Regions of Thermal–Nuclear–Hydro System

Simulated result parameters of all three areas when a disturbance occurs in the thermal plant. Figure 9 represents the dynamic response of frequency in the three-area thermal–nuclear–hydro power system; in Fig. 9, the FLC controller response is better than that of the PI controller. The comparison between the FLC and PI controllers can be seen in Table 4: settling time and maximum overshoot are better in the case of the FLC controller.
6 Conclusion

In all the cases, we observed that when a disturbance occurs in any plant, the frequency starts to fluctuate; therefore, two types of controllers, PI and FLC, were used to remove the fluctuations in frequency and settle it to a constant level. After applying the controllers, we observed that the FLC controller is better than the PI controller: it takes less time to settle and yields a smaller overshoot, whereas the PI controller takes more time to settle and its overshoot is larger.
References

1. George G, Lee JW (2001) Analysis of load frequency control performance assessment criteria. IEEE Trans Power Syst 16
2. Kothari DP, Nagrath IJ (2003) Modern power system analysis, 3rd edn. Tata McGraw Hill
3. Sharma D (2020) Automatic generation control of multi source interconnected power system using adaptive neuro-fuzzy inference system. Int J Eng Sci Technol 12(3):66–80
4. Cam E, Kocaarslan I (2005) Load frequency control in two area power systems using fuzzy logic controller. Energy Convers Manage:233–243
5. Shayeghi H, Shayanfar H (2004) Power system load frequency control using RBF neural network based on μ-synthesis theory. In: Proceedings of IEEE conference on cybernetics and intelligent systems, Singapore, pp 93–98
6. Shayeghi H, Shayanfar HA, Jalili A (2009) Load frequency control strategies: a state-of-the-art survey for the researcher. Energy Convers Manage 50:344–353
7. Nanda J, Kaul BL (1978) Automatic generation control of an interconnected power system. IEEE Proc 125(5):385–391
8. Nanda J, Kothari ML, Satsangi PS (1983) Automatic generation control of an interconnected hydrothermal system in continuous and discrete modes considering generation rate constraints. IEEE Proc 130:17–27
9. Sharma D, Yadav NK (2017) A comprehensive review on load frequency control. IJETAE 7(8):499–504
10. Lokanatha MK, Vasu K (2014) Load frequency control of two area power system using PID controller. Int J Eng Res Technol (IJERT) 3(11)
11. Sharma D, Yadav NK (2017) Application of soft computing for load frequency control of interconnected thermal-hydro-gas power system. IJRSR 8(9):9980–19983
12. Sivaramakrishnan AY, Hariharan MV, Srisailam MC (1984) Design of variable structure load frequency controller using pole assignment technique. Int J Control 40(3):487–498
13. Sharma D, Yadav NK (2017) AGC of LFC based single area power system with multi source power generation includes thermal, hydro and wind-diesel system. JARDCS 9(12):91–109
14. Sharma D, Yadav NK (2019) Lion algorithm with levy update: load frequency controlling scheme for two-area interconnected multi-source power system. Trans Inst Meas Control 41(14):4084–4099
15. Cam E, Kocaarslan I (2005) Load frequency control in two area power systems using fuzzy logic controller. Energy Convers Manage 46(2):233–243
16. Sharma D (2020) Load frequency control: a literature review. Int J Sci Technol Res 9(2):6421–6437
17. Lokanatha M, Vasu K (2014) Load frequency control of two area power system using PID controller. Int J Eng Res Technol 3(11)
18. Sharma D, Yadav NK (2020) LFOPI controller a fractional order PI controller based load frequency control in two area multi-source interconnected power system. Data Technol Appl 54:323–342
19. Nagrath IJ, Gopal M. Control system engineering, 5th edn. New Age International Publisher, New Delhi
20. Nanda J, Mangla A, Suri S (2006) Some new findings on automatic generation control of an interconnected hydrothermal system with conventional controllers. IEEE Trans Energy Convers 21(1):187–193
21. IEEE Committee Report (1973) Dynamic models for steam and hydro turbines in power system studies. IEEE Trans Power Apparatus Syst 92(4):1904–1911
22. Divya KC, Nagendra Rao PS (2005) A simulation model for AGC studies of hydro-hydro systems. Electr Power Energy Syst 27:335–342
Author Index
A Agarwal, Diwakar, 265 Ahlawat, Khyati, 79 Ahmad, Shandar, 553 Akhtar, Jamil, 541 Aloisio, Angelo, 631 Amitasree, Palanki, 207 Anand, Nemalikanti, 17 Arathy, R., 477 Aravind Sekhar, R., 171
B Babu, V. Suresh, 255 Baiju, M. R., 255 Bala, Rajni, 681 Bansal, Anusha, 79 Bansal, Poonam, 115 Baseer, K. K., 463 Bhabani, Bidisha, 477 Bhatia, Ruby, 653 Bischoff, Stefan, 619 Bohra, Riyanshi, 567
Dayama, Gautam, 91 Deshpande, Ashwini M., 521 Deshpande, Tushar, 103 Devadason, Joshua, 493 Dev, Amita, 115 Devasenan, M., 507 Dhumane, Amol, 91 Divya, B., 195 Dongre, Deepika, 103 Doshi, Vatsal, 103 Do, Thanh-Nghi, 65
E Escobar-Gómez, Elías, 283
G Ghodke, Gayatri, 419 Gite, Rahul, 129 Gopan, Neethu Radha, 39 Goyal, Sonali, 653 Gupta, Govind P., 53 Gupta, Subodhini, 611
C Chandomi-Castellanos, Eduardo, 283 Chaturvedi, Amrita, 297 Chhabra, Dhairya, 91 Cyrus, Jindrich, 619
H Hari Vamsi, V., 29 Hlava, Jaroslav, 619 Hossain, Quazi Delwar, 599
D Darius, Preethi Sheba Hepisba, 493 Dass, Sayantan, 311 Datta, Shourjadeep, 103
J Jagadeesan, N., 433 Jain, Kusumlata, 567 Jain, Neelesh, 611
Jancee, B. Victoria, 385 Jayapal, Cynthia, 507 Joies, Kesia Mary, 411 Jose, Deepa V, 221 Jose, Jisha, 411 Joseph, B. Maria, 463
K Kar, Pushpendu, 477 Kaushik, Neha, 241 Khatoon, Sultana, 541 Koolagudi, Shashidhar S., 129 Krishna, G. Venkata Sai, 207 Kumar, D. V. Ashok, 447 Kumar, Harish, 241 Kumar, Mukesh, 151 Kumar, Vishnu P., 411
P Paithane, A. N., 643 Prabha, R., 195 Priyanka, 653 R Rajak, Prince, 53 Rajesh, M., 207 Raj, Vinay, 241 Ramani, Aman, 91 Ramirez-Alvarez, Elizeth, 283 Ranade, Pranita, 419 Rani, R. Usha, 531 Ranjan, N. M., 643 Rao, Gummadi Srinivasa, 29 Rao, P. V. Gopikrishna, 447 Resmi, R., 255 Rosso, Marco Martino, 631 Ryait, Deepjyot Kaur, 369
L Lakshmi Devasena, C, 357
M Mahapatro, Judhistir, 477 Malakar, Madhuri, 477 Manavallan, S., 507 Manik, Varun, 91 Manitha, P. V., 207 Marroquín-Cano, Sergio Flavio, 283 Marwah, Neetu, 541 Mate, G. S., 643 Melchiorre, Jonathan, 631 Mistry, Sujoy, 311 Moezzi, Reza, 619 Mohapatra, Smaranika, 567 Morales, Eduardo F., 283 Mukesh, K., 577 Munna, Mohammed Saifuddin, 599
N Nagabotu, Vimala, 323 Naik, R. Hanuma, 447 Namburu, Anupama, 323 Nguyen, Chi-Ngon, 1 Nguyen, Ti-Hon, 65 Nguyen Van, Quoc, 1
O Omkari, D. Yaso, 585 Oviya, I. R., 577
S Saifulla, M. A., 17 Sainath, Kalya Lakshmi, 357 Saini, Indu, 343 Salman, Rahama, 611 Sarangal, Himali, 181 Sardone, Laura, 631 Sarkar, Pradyut, 311 Sathya, A., 195 Savaridassan, P., 401 Saw, Adrian, 619 Senthil, G. A., 195 Shah, Rushabh, 103 Sharan, Shambhu, 115 Sharma, Ajay, 151 Sharma, Deepesh, 681 Sharma, Manmohan, 369 Sharma, Nipun, 667 Sharma, Swati, 667 Shinde, Snehal B., 585 Singh, Aakanksha, 79 Singh, Butta, 181, 343 Singh, Manjit, 181 Singh, Payal, 265 Singh, Shashank Kumar, 297 Solomon, Darius Gnanaraj, 493 Soni, Neetika, 343 Sreeni, K. G., 171 Srisurya, Ippatapu Venkata, 577 Stella, I. Johnsi, 385 Sujatha, S., 39 Sunil, Rahul, 411
Swarupa, M. Lakshmi, 531
T Thomas, Flemin, 401 Thomas, Priya, 221 Tran, Thang Viet, 1
V Varun, V. V. S. S., 567 Vathsala, H., 129
Vats, Sakshi, 79
W Waghmare, Prachi P., 521
Y Yamuna, S. M., 507