English Pages 749 [734] Year 2021
Advances in Intelligent Systems and Computing 1380
Tarun K. Sharma Chang Wook Ahn Om Prakash Verma Bijaya Ketan Panigrahi Editors
Soft Computing: Theories and Applications Proceedings of SoCTA 2020, Volume 1
Advances in Intelligent Systems and Computing Volume 1380
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/11156
Editors
Tarun K. Sharma, Department of Computer Science, Shobhit University Gangoh, Gangoh, Uttar Pradesh, India
Chang Wook Ahn, Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Om Prakash Verma, Department of Instrumentation and Control Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, Punjab, India
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-16-1739-3 ISBN 978-981-16-1740-9 (eBook) https://doi.org/10.1007/978-981-16-1740-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book stimulates discussion on various emerging trends, innovations, practices, and applications in the field of soft computing, ranging from image processing, healthcare, and medicine to supply chain management and cryptanalysis. It is with great pleasure that we bring forth this collection of research papers presented during the three-day fifth International Conference on Soft Computing: Theories and Applications (SoCTA 2020), organized in virtual format in association with the STEM Research Society. We hope that it will be informative and interesting to those who are keen to learn about technologies that address the challenges of the exponentially growing information in the core and allied fields of soft computing. We are thankful to the authors of the research papers for their valuable contributions to the conference and for bringing forth significant research and literature across the field of soft computing. Offering valuable insights into soft computing for teachers and researchers alike, the book will inspire further research in this dynamic field. We express special thanks to Springer and its team for their valuable support in the publication of the proceedings. With great fervor, we wish to bring together researchers and practitioners in the field of soft computing year after year to explore new avenues in the field.

Gangoh, India    Tarun K. Sharma
Gwangju, Korea (Republic of)    Chang Wook Ahn
Jalandhar, India    Om Prakash Verma
New Delhi, India    Bijaya Ketan Panigrahi
About SoCTA Series
SoCTA (Soft Computing: Theories and Applications) is now a four-year-old international conference. SoCTA was founded in 2016 in technical collaboration with Machine Intelligence Research (MIR) Labs, USA, with the aim of highlighting the latest advances, problems, and challenges and presenting the latest research results in the field of soft computing with a link to scientific research and its practical implementation. SoCTA especially encourages young researchers at the beginning of their careers to participate in this conference and invites them to present their work on this platform. The objective of SoCTA is to provide a common platform for researchers, academicians, scientists, and industrialists working in the area of soft computing to share and exchange their views and ideas on the theory and application of soft computing techniques in multidisciplinary areas. Previous editions of the SoCTA series were successfully organized at the following venues:
• SoCTA-2016: Amity University, Rajasthan, Jaipur, India.
• SoCTA-2017: Bundelkhand University, Jhansi, Uttar Pradesh, India.
• SoCTA-2018: Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, Punjab, India.
• SoCTA-2019: National Institute of Technology Patna, Bihar, India.
Due to the pandemic, and keeping in mind the health of the research fraternity, the organizing committee decided to host the SoCTA-2020 conference in a virtual format. SoCTA-2020 is also dedicated to the corona warriors, especially in the field of academics and research. The tagline for this year is "Virtual Meet – Real Connections." SoCTA-2020 is organized with the technical support of Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India, and in association with a recently
introduced Science, Technology, Engineering and Management (STEM)-Research Society. The proceedings of all the previous years of the SoCTA series were published in Advances in Intelligent Systems and Computing (AISC), a Springer series indexed in SCOPUS. The credit for the success of the SoCTA series goes to our mentors, keynote and invited speakers, chief guests, guests of honor, members of the advisory board (national and international), program committee members, Springer as a publishing partner, all the authors, participants, and the review board. We sincerely appreciate your continued support, encouragement, and trust in us. We look forward to having this wonderful support in the coming SoCTA series as well. We are glad to inform you that the next conference in the series, SoCTA-2021, is scheduled to be held at the Indian Institute of Information Technology (IIIT) Kota, Rajasthan (MNIT Jaipur Campus). We look forward to your significant contributions to the SoCTA series.
About STEM-Research Society
Est: 2020
The STEM-Research Society, a foundation, was registered in 2020 to support and promote research in multidisciplinary domains under the able guidance of renowned academicians and researchers from India and abroad. The objective of the foundation is scientific, technical, research-oriented, and educational in nature. The foundation strives to advance the theory, practice, and application of science, technology, engineering, and management and maintains a high professional standing among its members. The basic purpose of the STEM-RS is to bring together researchers, academicians, industrialists, and experts from different parts of the country and abroad to exchange knowledge and ideas on a common platform by organizing national and international events such as conferences, seminars, and workshops that unite science, technology, engineering, and management, along with related topics not listed here, for the empowerment of research and development.
Vision The STEM-RS foundation will build a dynamic, interactive, global community of researchers, academicians, and industrialists to advance excellence in science, technology, engineering, and management.
Mission The STEM-RS is a foundation of interested people worldwide that promotes research for the advancement of society in various spheres and for the quality of life.
Values Being visionary, dynamic, interdisciplinary, inclusive, egalitarian and promoting research in all spheres of human life.
Diversity Statement Diversity drives innovation. STEM-RS engages all demographic groups worldwide in advancing science, technology, engineering, and management to improve the quality of life. All authors will be given a free membership for one year. Please visit the Web site for recent updates: www.stemrs.in/
Message from Conveners
It is our great pleasure to welcome you to the International Conference on Soft Computing: Theories and Applications (SoCTA-2020) in a virtual format. Soft computing methods are increasingly applied to solve problems in diverse domains. Hence, SoCTA is appropriately conceived to offer a forum that brings all such applied researchers together under one umbrella. SoCTA is now almost five years old, which means more diligent handling and a greater sense of responsibility are required for continuous improvement and growth. There would be no SoCTA series and no SoCTA-2020 without the quality contributions made by the authors. In addition, SoCTA-2020 is very fortunate to have so many top-quality panelists and keynote speakers during this tough time of the COVID-19 pandemic. We sincerely thank them all. We are particularly looking forward to the invited talks. We are delighted to have such a strong and varied series of plenary talks at the conference. The underlying philosophy motivating this conference, which has become a flagship forum in the area of mathematics and computer science in general, and in the area of soft computing in particular, has been to bring together researchers who apply, besides conventional computing techniques, soft and other novel computing paradigms to problems and situations that have hitherto been intractable, complex, highly nonlinear,
and difficult to solve. Soft computing is a cutting-edge field of research in which one of the main inspirations for problem solving comes from, for example, natural or biological systems that tend to be decentralized, adaptive, and environmentally aware, and as a result have survivability, scalability, and flexibility properties. In addition to working on traditional serial computers, these researchers also exploit parallel computing techniques and tools to achieve high-performance computing capabilities in their work. There are two further key features of this conference series that make it a unique event: these events are “go-green,” environmentally friendly conferences where the emphasis is on the quality of academic endeavor rather than spin and gloss; and these events see participation from a large number of young researchers, particularly women scientists, which is an important aspect if we are to increase female participation in science, technology, engineering, and mathematics (STEM) areas. Conferences like these are only possible thanks to the hard work of a great many people, and the successful organization of SoCTA-2020 has required the talents, dedication, and time of many volunteers and strong support from sponsors. The chairs of each event contributed exceptionally by attracting contributions, getting them reviewed, making accept and reject recommendations, developing the programs, and so on. We also thank the national and international advisory committee. Publication of the SoCTA-2020 proceedings is not a simple task, and the committee has contributed immensely. We are as ever grateful to Springer Plc. for their dedication and professionalism in helping us produce excellent, high-quality proceedings. We also give our sincere thanks to all our colleagues on the organizing committee for their sincere work and support throughout the year.
We are very grateful to the technical sponsors who have supported the conference despite the continuing difficult pandemic conditions. We thank you for participating in the conference and making it a success. We hope that all of you will benefit from the extensive technical program (in online mode) and establish long-lasting interactions with fellow delegates at SoCTA-2020.
Contents
Detection of Denial of Service Attack Using Deep Learning and Genetic Algorithm
Sangeeta Saha, Neema Singh, and Bhawana Rudra

Fake Profile Detection and Stalking Prediction on Facebook
Mummadi Swathi, Ashley Anoop, and Bhawana Rudra

Empirical Evaluation of NSGA II, NSGA III, and MOEA/D Optimization Algorithms on Multi-objective Target
Priyanka Makkar, Sunil Sikka, and Anshu Malhotra

Moving Skills – A Contributing Factor in Developmental Delay
Sonali Gupta, Akshara Pande, and Swati

Estimation of Wind Speed Using Machine Learning Algorithms
Sonali Gupta, Manika Manwal, and Vikas Tomer

A Comparative Study of Supervised Learning Techniques for Remote Sensing Image Classification
Ashish Joshi, Ankur Dhumka, Yashikha Dhiman, Charu Rawat, and Ritika

Postal Service Shop Floor – Facility Layout Evaluation and Selection Using Fuzzy AHP Method
S. M. Vadivel, A. H. Sequeira, and Sunil Kumar Jauhar

Wireless Motes Outlier Detection Taxonomy Using ML-Based Techniques
Isha Pant and Ashish Joshi

A Proposed IoT Security Framework and Analysis of Network Layer Attacks in IoT
Neha Gupta and Umang Garg

Cloud Data Storage Security: The Challenges and a Countermeasure
Kamlesh Chandra Purohit, Mahesh Manchanda, and Anuj Singh

Comparative Analysis of Numerous Approaches in Machine Learning to Predict Financial Fraud in Big Data Framework
Amit Gupta and M. C. Lohani

Low-Cost Automated Navigation System for Visually Impaired People
Chetan Bulla, Sourabh Zutti, Sneha Potadar, Swati Kulkarni, and Akshay Chavan

Blockchain Platforms and Interpreting the Effects of Bitcoin Pricing on Cryptocurrencies
Nitima Malsa, Vaibhav Vyas, and Jyoti Gautam

A Design of a Secured E-voting System Framework for Poll-Site Voting in Ghana
Samuel Agbesi

Pattern Matching Using Face Recognition System
Sandeep Kumar Srivastava, Sandhya Katiyar, and Sanjay Kumar

A Fuzzy-Based Support Vector Regression Framework for Crop Yield Prediction
Uduak Umoh, Daniel Asuquo, Imoh Eyoh, Abdultaofeek Abayomi, Emmanuel Nyoho, and Helen Vincent

A Mathematical Study of Hepatitis C Virus Model During Drug Therapy Treatment
Yogita and Praveen Kumar Gupta

Transformation of Medical Imaging Using Artificial Intelligence: Its Impact and Challenges with Future Opportunities
Richa Gupta, Vikas Tripathi, Amit Gupta, and Shruti Bhatla

A Keyword-Based Multi-label Text Categorization in the Indian Legal Domain Using Bi-LSTM
V. Vaissnave and P. Deepalakshmi

Application of Deep Learning Techniques in Cyber-Attack Detection
Priyanka Dixit and Sanjay Silakari

Rederiving the Upper Bound for Halving Edges Using Cardano’s Formula
Napendra Solanki, Pintu Chauhan, and Manjish Pal

Online Teaching During COVID-19: Empirical Evidence During Indian Lockdown
V. M. Tripathi and Ambica Prakash Mani

An Ensemble-Based Method for Predicting Facebook Check-ins
Shobhana Kashyap and Avtar Singh

Modelling and Structural Analysis for Prosthesis Hip Design Using ANSYS with Finite Element Method
Sonam Tanwar and Ruhi Sharma

Indian Sign Language Recognition Using a Novel Feature Extraction Technique
Ashok Kumar Sahoo, Pradeepta Kumar Sarangi, and Rajeev Gupta

A Formal Study of Shot Boundary Detection Approaches – Comparative Analysis
Hanisha Nankani, Mehul Mahrishi, Sudha Morwal, and Kamal Kant Hiran

Predicting Hospital Bed Requirements for COVID-19 Patients in Mumbai City and Mumbai Suburban Region
Narayana Darapaneni, Chandrashekhar Bhakuni, Ujjval Bhatt, Khamir Purohit, Vikas Sardana, Prabir Chakraborty, Vivek Jain, and Anwesh Reddy Paduri

Job Scheduling on Computational Grids Using Multi-objective Fuzzy Particle Swarm Optimization
Debashis Dutta and Subhabrata Rath

Analysis of Network Performance for Background Data Transfer Using Congestion Control Protocol
Jaspreet Kaur, Taranjeet Singh, and Rijwan Khan

Validation and Analysis of Metabolic Pathways Using Petri Nets
Sakshi Gupta, Sunita Kumawat, and Gajendra Pratap Singh

Approach of Machine Learning Algorithms to Deal with Challenges in Wireless Sensor Network
Sudha, Yudhvir Singh, Harkesh Sehrawat, and Vivek Jaglan

Cross-Domain Recommendation Approach Based on Topic Modeling and Ontology
Vikas, Bhawana Tyagi, Vinay Kumar, and Pawan Sharma

The Study of Linear and Nonlinear Fractional ODEs by Homotopy Analysis
H. Gandhi, A. Tomar, and D. Singh

The Comparative Study of Time Fractional Linear and Nonlinear Newell–Whitehead–Segel Equation
H. Gandhi, A. Tomar, and D. Singh

Parallel and Distributed Computing Approaches for Evolutionary Algorithms – A Review
S. Raghul and G. Jeyakumar

Motion/Force Control for the Constrained Electrically Driven Mobile Manipulators Based on Hybrid Backstepping Control Approach
Naveen Kumar and Manju Rani

Mathematical Interpretation of Fuzzy Information Model
Bazila Qayoom and M. A. K. Baig

Methodological Development for Time-Dependent AHP Using Probability Distribution
Arpan Garg and Talari Ganesh

Implementation of Speculate Modules and Performance Evaluation of Data Mining Clustering Techniques on Air Quality Index and Health Index to Predict High-Risk Air Polluted Stations of a Metropolitan City Using R Programming
N. Asha and M. P. Indira Gandhi

Automated Gait Classification Using Spatio-Temporal and Statistical Gait Features
Ratan Das, Preeti Khera, Somya Saxena, and Neelesh Kumar

Real-Life Applications of Soft Computing in Cyber-Physical System: A Compressive Review
Varsha Bhatia, Vivek Jaglan, Sunita Kumawat, and Kuldeep Singh Kaswan

A Study on Stock Market Forecasting and Machine Learning Models: 1970–2020
Pradeepta Kumar Sarangi, Muskaan, Sunny Singh, and Ashok Kumar Sahoo

Discussion on the Optimization of Finite Buffer Markovian Queue with Differentiated Vacations
M. Vadivukarasi, K. Kalidass, and R. Jayaraman

Stability Analysis of HJB-Based Optimal Control for Hybrid Motion/Force Control of Robot Manipulators Using RBF Neural Network
Komal Rani and Naveen Kumar

RBF Neural Network-Based Terminal Sliding Mode Control for Robot Manipulators
Ruchika and Naveen Kumar

An In-Memory Physics Environment as a World Model for Robot Motion Planning
Navin K. Ipe and Subarna Chatterjee

Motion Model and Filtering Techniques for Scaled Vehicle Localization with Fiducial Marker Detection
Kyle Coble, Akanshu Mahajan, Sharang Kaul, and H. P. Singh

Analysis of Liver Disorder by Machine Learning Techniques
Sushmit Pahari and Dilip Kumar Choubey

Various Techniques of Image Segmentation
Reshu Agarwal, Annu Malik, Tanya Gupta, and Shylaja VinayKumar Karatangi

Fog–Cloud-Assisted Internet of Things: A Review of Workload Allocation and Latency Management Techniques
Upma Arora and Nipur Singh

Artificial Neural Network, Convolutional Neural Network Visualization, and Image Security
Ankur Seem, Arpit Kumar Chauhan, and Rijwan Khan

A Study on RPL Protocol with Respect to DODAG Formation Using Objective Function
Sakshi Garg, Deepti Mehrotra, and Sujata Pandey

An Ensemble Learning Approach for Brain Tumor Classification Using MRI
Ranjeet Kaur, Amit Doegar, and Gaurav Kumar Upadhyaya

Multimodal Emotion Recognition System Using Machine Learning and Psychological Signals: A Review
Rishu, Jaiteg Singh, and Rupali Gill

Drowsiness Image Detection Using Computer Vision
Udbhav Bhatia, Tshering, Jitendra Kumar, and Dilip Kumar Choubey

Implementing Deep Learning Algorithm on Physicochemical Properties of Proteins
Charu Kathuria, Deepti Mehrotra, and Navnit Kumar Misra

Locking Paradigm in Hierarchical Structure Environment
Swati, Shalini Bhaskar Bajaj, and Vivek Jaglan

Ensemble Maximum Likelihood Estimation Based Logistic MinMaxScaler Binary PSO for Feature Selection
Hera Shaheen, Shikha Agarwal, and Prabhat Ranjan

Automatic Identification of Medicinal Plants Using Morphological Features and Active Compounds
Saakshi Agrawal and Sowmya Yellapragada

A Prototype IoT Management System to Control Grid-Parallel Distribution of Localised Renewable Energy for Housing Complexes in New-Normal Era
Sandip Das, Abhinandan De, and Niladri Chakraborty

Author Index
About the Editors
Dr. Tarun K. Sharma received his Ph.D. in Soft Computing from IIT Roorkee in 2013. Since June 2020, he has been associated with Shobhit University, Gangoh, Saharanpur, as Professor and Head of CSE and Dean of the School of Engineering and Technology. Earlier, he worked with Amity University Rajasthan as Associate Professor and Head of the Department of Computer Science and Engineering/IT as well as Alternate Director (Outcome). He has supervised two Ph.D. theses, seven M.Tech. dissertations, and several MCA and B.Tech. projects. He has over 80 research publications to his credit. He has been to Amity Institute of Higher Education Mauritius on deputation. He has availed grants from Microsoft Research India, CSIR New Delhi, and DST New Delhi to visit Australia, Singapore, and Malaysia, respectively. He is a founding member of the International Conference on Soft Computing: Theories and Applications (SoCTA Series) and the Congress on Advances in Materials Science and Engineering (CAMSE). He has edited five volumes of conference proceedings published in the AISC series of Springer (SCOPUS indexed) and two edited books in Asset Analytics, Springer.

Prof. Chang Wook Ahn received his Ph.D. degree from the Department of Information and Communications, GIST, in 2005. From 2005 to 2007, he was with the Samsung Advanced Institute of Technology, South Korea. From 2007 to 2008, he was a Research Professor at GIST. From 2008 to 2016, he was an Assistant/Associate Professor with the Department of Computer Engineering, Sungkyunkwan University (SKKU), South Korea. He is currently Director of the MEMI (Meta-Evolutionary Machine Intelligence) Lab and a Professor with the School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), South Korea. His research interests include genetic algorithms/programming, multi-objective optimization, evolutionary neural networks, and quantum machine learning. He has guest edited various thematic issues in journals of repute.

Dr. Om Prakash Verma has been associated with Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Punjab, India, since January 2018 as an Assistant Professor in the Department of Instrumentation and Control Engineering. He has almost 11 years of teaching experience. He received his Ph.D. from IIT Roorkee, M.Tech. from Dr. B. R. Ambedkar NIT Jalandhar, and B.E. from Dr. B. R. Ambedkar University Agra. He is presently working on an ISRO-sponsored project as PI. He has edited a book on Soft Computing: Theories and Applications and has been a reviewer for several international journals of high repute. He has published more than 30 research papers in SCI/Scopus/ESI indexed journals. He recently published a paper in Renewable and Sustainable Energy Reviews (IF: 12.110). He has guided four M.Tech. students and is supervising six Ph.D. students.

Prof. Bijaya Ketan Panigrahi is a Professor in the Department of Electrical Engineering, Indian Institute of Technology Delhi. His research interests include the security of cyber-physical systems, digital signal processing, and soft computing applications to power systems. Professor Panigrahi received a Ph.D. in electrical engineering. He is an associate editor for IEEE Systems Journal and a Senior Member of the IEEE. Professor Panigrahi was elected Fellow of INAE in 2015. He has been in the teaching and research profession since 1990. He has published more than 500 articles in journals of high repute. He has edited several conference proceedings with publishers of repute such as Springer. He has filed more than six patents. As per Google Scholar, his h-index is 55 with more than 12,650 citations, and his CrossRef citation count is above 7550. His research interests also include power system planning, operation and control, machine intelligence, and evolutionary computing.
Detection of Denial of Service Attack Using Deep Learning and Genetic Algorithm Sangeeta Saha, Neema Singh, and Bhawana Rudra
Abstract One of the most common Internet attacks causing significant economic losses in recent years is the denial of service (DoS) attack. Users and Internet service providers (ISPs) are constantly affected by this attack. This cyber threat and the number of attackers are both growing, despite the rapid development of new protection and detection technologies. Developing mechanisms to detect this threat is therefore a current challenge in network security. This paper presents a genetic algorithm followed by a deep learning technique and other classifiers for the detection of DoS attacks. A multilayer perceptron (MLP) neural network classifier, a decision tree classifier and a support vector machine (SVM) classifier are used for the detection. The aim of this paper is to detect DoS attacks with and without a genetic algorithm (GA) in front of the classifier and to show, in terms of accuracy, that a classifier combined with the genetic algorithm performs better than the same classifier without it.
Keywords Denial of service attack · Decision tree · Genetic algorithm · Multilayer perceptron · Support vector machine
S. Saha · N. Singh (B) · B. Rudra
National Institute of Technology, Surathkal, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_1

1 Introduction
In recent years, as per information security reports, denial of service (DoS) attacks have caused significant financial losses to industries and governments worldwide. As the Internet expands, more aspects of our daily life depend on it than ever before, from social connections to payments [1]. As a result, malicious attacks have grown in both type and quantity, and various techniques are emerging to identify and prevent them. DoS attacks build up traffic over the network by generating multiple request flows simultaneously, making systems unavailable to users [2, 3]. A DoS attack is performed by a single person or system, whereas an attack performed by more than one person, or by bots, is called a distributed denial of service (DDoS) attack. DDoS attacks can therefore be considered a particular type of DoS attack. DDoS attacks are carried out in two phases: intrusion and attack. During the intrusion phase, attackers install DDoS attack tools on hosts in different networks. In the attack phase, these tools are triggered to attack the target network [6].

There are many types of DoS attacks; the most common [8] are ICMP, UDP, SYN and HTTP floods, which disturb the target clients by exhausting their network resources. An attack can also be launched to exhaust server resources such as sockets, ports, memory, databases or input/output bandwidth. In the former case, the attack is network level flooding; in the latter case, it is referred to as application level DoS flooding, which is typically performed on an HTTP webpage.

The main purpose of an attack detection system is to differentiate malicious behavior from normal traffic. This kind of research belongs to the area of cybersecurity, where the nature of a new attack is not known beforehand and new attacks appear in real time, on a never-ending basis. This must be handled in a flexible and effective way, either by constructing a model of every form of attack that can affect the network or by constructing a model of normal traffic. Kasim [7] also states that the accuracy of the model can be improved by selecting features from the dataset and tuning hyperparameters.
2 Literature Survey
Wankhede and Kshirsagar [1] used multilayer perceptron (MLP) and random forest (RF) classifiers for the detection of DoS attacks. In this work, a randomly selected 50% of the dataset was used for training and the rest for testing. They also split the data into training and testing sets in different ratios and checked the accuracy of both classifiers. The system was tested on the CIC IDS 2017 dataset. MLP was observed to provide higher accuracy than RF in the testing phase for new, unidentified patterns. Accuracy was the major factor for comparison between the classifiers.

Siva Sankari et al. [2] proposed a genetic algorithm (GA) based method to detect denial of service attacks. The collected data was split into training and testing sets, and features were then extracted from the dataset. Normalization was performed on the initial population built from the extracted features, and the normalized status was calculated from it. The population was evolved over a number of generations, progressively improving the individuals by increasing their fitness value, which serves as the measure of quality. Initially, a number of best-fit individuals were selected based on a user-defined fitness function. The remaining individuals were selected and paired with each other. In the selection phase, individuals with better fitness were kept, whereas the others were removed. Crossover, a process in which each randomly selected pair of individuals exchanges parts of their strings until an entirely new population is generated, was used to form new strings in order to obtain a better string. Mutation, the process of randomly perturbing genetic information, added new information to the genetic process in a random way and helped to avoid getting trapped in local optima. By using GA, they achieved better feature extraction and improved their accuracy in detecting DoS attacks.

Riadi et al. [3] proposed a DoS detection system based on network features produced by statistical extraction, combined with an artificial neural network (ANN) as the detection engine with varying training functions. The supervised learning decision tree and Naive Bayes methods were also used to detect backscatter data flow from DoS attacks based on the Center for Applied Internet Data Analysis (CAIDA) 2008 dataset. The dataset for the ANN training phase was divided into 70% for training, 15% for validation and 15% for testing. They compared the classification results using accuracy, mean-squared error (MSE) and number of iterations. DoS attack detection could be done effectively with ANNs given appropriate training functions. The best DoS detection accuracy, 99.2%, was obtained by an ANN with 2n + 1 hidden layer neurons, where n is the number of input neurons, using the quasi-Newton (MATLAB trainlm) training function. They concluded that the quasi-Newton training function gave better accuracy than all other training functions.

Sumathi and Karthikeyan [4] predicted DoS attacks in the network with the help of a deep neural network (DNN) classifier that included a cost minimization strategy for assets available in public platforms.
Performance metrics such as average delay, detection accuracy, packet loss, cost per sample, packet delivery ratio, overhead and throughput were used to analyze the performance of the model. The simulation results showed that the DNN cost minimization algorithm performed far better than the existing algorithm, with a detection accuracy of 99%, very little false reduction, a high average delay, less packet loss, less overhead, a high packet delivery ratio and high throughput. The model achieved accuracies of 98.9, 98.1 and 99% on the KDD dataset, mixed dataset and SSE dataset, respectively. The DNN was trained using an unsupervised learning technique, an auto-encoder, in the pre-training phase and back propagation in the fine-tuning phase.

Wang et al. [5] worked on the multilayer perceptron (MLP) to obtain high efficiency by reducing the number of features used in training. Feature selection was performed to select the subset of features that performs best under a given assessment criterion. Feature selection methods are classified into wrappers, filters and embedded methods. In the traditional wrapper approach, the model is evaluated iteratively with various feature combinations as inputs in order to find an optimal subset. The authors created a closed-loop system by adding a feedback mechanism that perceives errors in the detection phase. Their work was limited to the MLP model, and they assumed that the basic structure of this feedback could be extended to other machine learning-based detection techniques. The key goal was to increase the availability of modern machine learning-based methods for detection. The MLP model provided an accuracy of 98% when the selected features were source IP address, TCP sequence, source port, destination port and TCP flags.
3 Proposed Method
3.1 Multilayer Perceptron
Internet attack detection is a crucial area that has been popular among researchers for a long time. We use a deep learning classifier, the multilayer perceptron (MLP), also called a feed forward neural network (FFNN), which is a class of feed forward artificial neural networks. An MLP consists of at least three layers of nodes:
1. An input layer
2. A hidden layer
3. An output layer.
The input data is fed into the input layer, and the output is read from the output layer. The number of hidden layers was increased as far as needed to make the model sufficiently complex for our task. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP uses a supervised learning technique called back propagation for training. Its multiple layers and nonlinear activation distinguish the MLP from a linear perceptron and make it suitable for data that is not linearly separable. During training, the weights and biases of the neurons are updated as data passes through the layers. The weight update equation is shown below:
$w \leftarrow w + l \cdot (ex - pre) \cdot x$
where w = weight, l = learning rate, ex = expected output, pre = predicted output and x = input.
The feed forward network is the most common neural network model. Its goal is to approximate some function $f^{*}$. Given, for example, a classifier $y = f^{*}(x)$ that maps an input x to an output class y, the MLP finds the best approximation to that classifier by defining a mapping $y = f(x; \theta)$ and learning the best parameters $\theta$ for it. MLP networks are composed of many functions chained together. A network with three functions or layers would form $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$. Each of these layers consists of units that apply a transformation to a linear sum of inputs. Each layer is represented as $y = f(W x^{T} + b)$, where f is the activation function, W is the set of parameters or weights of the layer, x is the input vector, which may also be the output of the previous layer, and b is the bias vector. An MLP consists of several fully connected layers, because each unit in a layer is connected to all the units in the previous layer. In a fully connected layer, the parameters of each unit are independent of those of the other units in the layer, which means that each unit has its own set of weights (Fig. 1).
Fig. 1 Multilayer perceptron
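The weight update rule of this section can be illustrated with a minimal sketch, assuming a single neuron with a step activation trained on a toy AND dataset. The function names, the dataset and the learning rate value are illustrative assumptions and are not taken from the experiments described in this paper.

```python
def predict(w, b, x):
    """Step-activated neuron: 1 if the weighted sum is positive, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0


def train_perceptron(data, l=0.5, epochs=20):
    """Apply w = w + l * (ex - pre) * x to every sample, for several epochs."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, ex in data:
            pre = predict(w, b, x)
            w = [wi + l * (ex - pre) * xi for wi, xi in zip(w, x)]
            b = b + l * (ex - pre)  # bias treated as a weight on a constant input of 1
    return w, b


# Toy AND dataset (hypothetical): (inputs, expected output)
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_DATA)
predictions = [predict(w, b, x) for x, _ in AND_DATA]
```

A single neuron of this kind only handles linearly separable data such as AND; the hidden layers and nonlinear activations of an MLP are what remove that restriction.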
3.2 Decision Tree
Decision tree learning is one of the predictive modeling approaches used in statistics, data mining and machine learning. Decision trees are constructed via an algorithmic approach that identifies ways to split a data set based on different conditions. They are among the most widely used and practical methods for supervised learning. Decision trees are a nonparametric supervised learning method used for both classification and regression tasks. Tree models where the target variable takes a discrete set of values are called classification trees. Decision trees where the target variable takes continuous values (typically real numbers) are called regression trees.
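As a hedged illustration of how a classification tree might choose one split condition, the sketch below scores candidate thresholds on a single numeric feature by weighted Gini impurity. Gini impurity is an assumption here, since the text does not name the split criterion, and the sample values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must put at least one record on each side
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical feature values (e.g. an error rate) with labels 0 = normal, 1 = attack
values = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(values, labels)
```

A full tree repeats this search over every feature and then recurses on the two resulting subsets.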
3.3 Support Vector Machine
SVM is a supervised learning method. It can be applied to regression as well as classification problems. For classification, the SVM draws a hyperplane that separates the data points into different classes, while ensuring that the distance between the hyperplane and the nearest points of the two classes is maximum.
Fig. 2 Working of SVM
When used for regression, SVM is similar to drawing by hand a line or hyperplane through the center of the input data points such that the distance of the points from the line is minimum. In this case, SVM tries to minimize the generalization error instead of the training error. Figure 2 is a diagrammatic representation of the working of SVM.
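The maximum-margin idea behind SVM classification can be sketched as follows, assuming two-dimensional points with labels +1 and -1. The sketch only compares the margins of two hand-picked candidate hyperplanes; it does not solve the SVM optimization problem, and all names and data points are hypothetical.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest distance of any correctly signed point to the hyperplane."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


points = [(1, 1), (2, 2), (-1, -1), (-2, -1)]
labels = [1, 1, -1, -1]

margin_a = margin((1, 1), 0.0, points, labels)  # candidate hyperplane x1 + x2 = 0
margin_b = margin((1, 0), 0.0, points, labels)  # candidate hyperplane x1 = 0
```

Both candidates separate the two classes, but the first leaves a larger gap to the nearest points, so a maximum-margin classifier would prefer it.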
3.4 Genetic Algorithm
The genetic algorithm is used to identify a good subset of features, which is later used by an ensemble classifier to classify network traffic as good or bad. Ensemble classifiers do not learn a single classifier but a set of classifiers, and they combine the predictions of these classifiers. This reduces the dependence on the peculiarities of a single training set and reduces the bias introduced by a single classifier. Commonly used ensemble methods manipulate the data distribution, the input features or the class labels, or introduce randomness into the learning algorithm. Bagging and boosting are commonly used to modify the data distribution. The base classifiers must satisfy the criterion that their classification errors are as uncorrelated as possible.

The purpose of the genetic algorithm is to select a subset of the features to be used by the ensemble classifier, to train and test the ensemble classifier and to calculate its fitness. The search component is a GA, and the evaluation component is an ensemble classifier. The initial population is randomly generated, and each individual has a subset of the 41 features present in the training data set. Each individual is then evaluated using an ensemble classifier. Once the top individuals of a generation are found, crossover between the parents creates the offspring, and some mutation is performed on each child to maintain diversity in the population. The Weka library is used for the implementation of the ensemble classifier. The library is a collection of implementations of various data mining algorithms, including classification algorithms.
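The search loop described in this section can be sketched as below, under stated assumptions: each individual is a 0/1 mask over the 41 features, and the fitness function is a toy stand-in for "train and test the classifier and return its accuracy". The set of informative feature indices, the population size, the mutation rate and all function names are hypothetical.

```python
import random

N_FEATURES = 41                  # number of features in the training data, per the text
USEFUL = {0, 4, 5, 22, 24, 28}   # hypothetical indices of informative features


def fitness(mask):
    """Toy fitness: reward informative features, penalize large subsets."""
    hits = sum(1 for i in USEFUL if mask[i])
    return hits - 0.05 * sum(mask)


def crossover(a, b, rng):
    """Single-point crossover of two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask, rng, rate=0.02):
    """Flip each bit with a small probability to keep diversity."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def select_features(pop_size=20, generations=30, n_elite=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    history = []                 # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:n_elite]  # top individuals survive unchanged (elitism)
        children = []
        while len(children) < pop_size - n_elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return max(pop, key=fitness), history


best_mask, history = select_features()
selected = [i for i, bit in enumerate(best_mask) if bit]
```

In a real pipeline the fitness call would be replaced by the accuracy of the classifier trained on the masked feature set, which makes each generation far more expensive than in this toy version.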
3.5 Training and Testing the Model
There are basically three steps in the training of the model.
• Forward pass: In this step, the input is passed to the model; at every layer it is multiplied by the weights and the bias is added, which yields the calculated output of the model.
• Calculate error or loss: When a data instance is passed, the model produces the predicted output. The label assigned to the instance is the real or expected output. The loss is calculated from these two values and then back propagated. Various loss functions are available, depending on the output and the requirement.
• Backward pass: After the loss is calculated, it is back propagated and the weights of the model are updated using gradient descent. This is the main step in training the model. In this step, the weights are adjusted according to the gradient.
The models are trained on 70% of the dataset specific to the kind of attack and tested on the remaining 30%.
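The three training steps and the 70/30 split can be sketched as follows. This is a minimal sketch that assumes a single sigmoid unit rather than a full MLP, a binary cross-entropy loss and a toy one-feature dataset; the function names, learning rate and epoch count are illustrative choices, not the settings used in the experiments.

```python
import math
import random


def split_70_30(rows, seed=0):
    """Shuffle the rows and split them into 70% training and 30% testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]


def forward(w, b, x):
    """Forward pass: weighted sum plus bias, then sigmoid activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, data):
    """Mean binary cross-entropy between predicted and expected outputs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = forward(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, lr=0.5, epochs=200):
    """Gradient descent: forward pass, error, backward pass, weight update."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            err = forward(w, b, x) - y  # loss gradient w.r.t. the weighted sum
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


# Toy dataset (hypothetical): one feature in [0, 1], label 1 when it is at least 0.5
rows = [([i / 10.0], 1 if i >= 5 else 0) for i in range(10)]
train_rows, test_rows = split_70_30(rows)
w, b = train(train_rows)
```

In an MLP the backward pass applies the same idea layer by layer through the chain rule, which is what back propagation refers to.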
3.6 Process to Detect DOS Attacks Using GA with Classifiers
• The entire KDD CUP dataset is loaded, irrespective of the type of attack recorded for each packet. The dataset includes three types of packets, TCP, UDP and ICMP, as indicated by the 'protocol type' column.
• The system considers only packets with 'srv serror rate' greater than 70%, that is, only packets sent by the same host to the same service over a stretch of time.
• Feature extraction is performed using GA to obtain only the subset of features needed for detection of attacks.
• As the classifiers cannot handle character sequence values, the 'service' column is converted to numeric values. Service is the type of service requested by the DoS attack request, such as http or telnet. The dataset stores these values as text.
• When working with classifiers without the genetic algorithm, all the features are used for DoS attack detection. When applying classifiers with the genetic algorithm, feature extraction using GA reduces the number of features used for DoS attack detection, in order to avoid overfitting.
• All classes of attacks are replaced with 1 and the normal result (no attack) with 0 in the dataset csv files, so that the models are trained only to separate attack traffic from normal traffic.
• Finally, the models are trained and tested to obtain the prediction accuracy of the MLP, decision tree and SVM classifiers.
The above process is shown in Fig. 3.
Fig. 3 Process to detect DOS attacks
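The filtering, encoding and relabeling steps of this process can be sketched on a few in-memory records. The sketch assumes that the columns are named 'service', 'srv_serror_rate' and 'label', that the rate is stored as a fraction between 0 and 1, and that normal traffic is labeled 'normal'; the sample records are hypothetical.

```python
def preprocess(rows, threshold=0.7):
    """Filter by srv_serror_rate, encode 'service' as integers, binarize labels."""
    kept = [r for r in rows if r["srv_serror_rate"] > threshold]
    service_ids = {}  # maps each service name to a numeric id in order of appearance
    cleaned = []
    for r in kept:
        service_id = service_ids.setdefault(r["service"], len(service_ids))
        label = 0 if r["label"] == "normal" else 1  # 1 for any class of attack
        cleaned.append({**r, "service": service_id, "label": label})
    return cleaned


# Hypothetical records in the style of the dataset
sample = [
    {"protocol_type": "tcp", "service": "http", "srv_serror_rate": 1.0, "label": "neptune"},
    {"protocol_type": "tcp", "service": "telnet", "srv_serror_rate": 0.9, "label": "normal"},
    {"protocol_type": "icmp", "service": "ecr_i", "srv_serror_rate": 0.0, "label": "smurf"},
    {"protocol_type": "tcp", "service": "http", "srv_serror_rate": 0.8, "label": "back"},
]
cleaned = preprocess(sample)
```

The resulting rows can then be passed either directly to the classifiers or through GA-based feature selection first.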
4 Results
4.1 Experimental Dataset
The DoS attack KDD CUP dataset from the year 2017 is used in the experiment. The dataset consists of two types of data: training and testing. Each row of the dataset represents a connection between two network hosts, source and destination, and is described by 42 attributes (39 continuous or discrete numerical attributes and 3 categorical attributes). This set of attributes includes general TCP features such as duration, protocol type, service, src bytes, dst bytes, flag, land, wrong fragment and urgent, and derived features such as the same host features and the same service features.
4.2 Experimental Environment
The experiment was carried out in ipynb notebook files written in Python. The files were executed on Google Colab with the runtime hardware accelerator set to Tensor Processing Unit (TPU) in order to speed up execution.
4.3 Experimental Results
For the implementation of the model, the SVM classifier, the decision tree classifier and the deep learning MLP classifier were used, with and without the genetic algorithm for feature extraction. The proposed system classifies each packet as normal or as a DoS attack packet. The comparison is based on the accuracy obtained by each model, which is calculated as follows:
\[ \text{Accuracy} = \frac{X + O}{M + N} \tag{1} \]
where X = True Positive, O = True Negative, U = False Positive, V = False Negative, M = X + O is the number of correctly classified records and N = U + V is the number of misclassified records.
X = True Positive is counted when a positive labeled record is classified as a positive record. O = True Negative is counted when a negative labeled record is classified as a negative record. U = False Positive is counted when a negative labeled record is classified as a positive record. V = False Negative is counted when a positive labeled record is classified as a negative record.
All the 41 features are first used to train the MLP, decision tree and SVM classifiers, and their accuracy is obtained. The genetic algorithm is then used for feature extraction and the classifiers are retrained to obtain the next set of accuracies, which is observed to increase in the case of all the three classifiers used, as shown in Table 1 and Fig. 4.

Table 1 Comparison of accuracy
Classifier      Accuracy without GA (%)   Accuracy with GA (%)
MLP             97.32                     99.11
Decision tree   96.68                     98.23
SVM             95.28                     94.37

Fig. 4 Comparison of accuracy
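Equation (1) can be checked with a short sketch that computes accuracy from the four confusion counts. The counts in the example are hypothetical and are not results from the experiment.

```python
def accuracy(x, o, u, v):
    """Accuracy from true positives x, true negatives o, false positives u, false negatives v."""
    m = x + o  # correctly classified records
    n = u + v  # misclassified records
    return (x + o) / (m + n)


# Hypothetical counts: 50 true positives, 40 true negatives, 5 false positives, 5 false negatives
example = accuracy(50, 40, 5, 5)
```

Because the denominator is the total number of records, swapping the false positive and false negative counts leaves the accuracy unchanged.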
5 Conclusion
In this paper, an SVM classifier, a decision tree classifier and the deep learning MLP neural network classifier were implemented, along with a genetic algorithm for feature extraction, in order to detect DoS attacks. These algorithms efficiently detected application layer DoS attacks. From the experimental results, we conclude that all three classifiers, when used along with the genetic algorithm, provide better accuracy than the same classifiers applied without the genetic algorithm.
References
1. Wankhede, S., Kshirsagar, D.: "DoS attack detection using machine learning and neural network." 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA)
2. Siva Sankari, L.K., Wise, D.C.J.W., Priya, B.: "An effective method for denial of service attack detection using genetic algorithm." (2015)
3. Riadi, I., Sunardi, Muhammad, A.W.: "DDoS detection using artificial neural network regarding variation of training function." (2019)
4. Sumathi, S., Karthikeyan, N.: "Detection of distributed denial of service using deep learning neural network." J. Ambient Intell. Humanized Comput. (2020)
5. Wang, M., Lu, Y., Qin, J.: "A dynamic MLP-based DDoS attack detection method using feature selection and feedback." (2020)
6. Singh, K., Dhindsa, K.S., Nehra, D.: "T-CAD: a threshold based collaborative DDOS attack detection in multiple autonomous systems." J. Inf. Secur. Appl. 51 (2020), Article 102457
7. Kasim, O.: "An efficient and robust deep learning based network anomaly detection against distributed denial of service attacks." p 107390 (24 Oct 2020)
8. Alkasassbeh, M., Al-Naymat, G., Hassanat, A.B., Almseidin, M.: "Detecting distributed denial of service attacks using data mining techniques." Int. J. Adv. Comput. Sci. Appl. 7(1) (2016)
Fake Profile Detection and Stalking Prediction on Facebook Mummadi Swathi, Ashley Anoop, and Bhawana Rudra
Abstract The increasing popularity of and demand for social media has connected people across the globe in a better way. People increasingly use social media platforms to express their views and showcase their day-to-day life. Information related to social life, business and entertainment is exchanged regularly on social networks. Facebook has approximately 1.5 billion users, and this count is increasing daily. More than 10 million likes and shares are performed daily on Facebook. Many other networks, such as LinkedIn, Instagram, Twitter and Snapchat, are also growing exponentially. With all these advancements and this growth, however, several problems have been introduced. Facebook benefits people, but at the same time it is targeted by many malicious activities such as creating fake profiles to stalk people and online impersonation, which can harm reputations and invade privacy on online social platforms. One of the challenging problems in social network security is to recognize fake profiles. This has created a need for cybersecurity measures and applications that protect people from cyberbullying such as stalking from fake profiles. In this paper, a framework to classify a Facebook profile as genuine or fake using machine learning techniques is proposed, and the same framework is used for the prediction of stalking.
Keywords Facebook · Fake profile · Stalking · Facebook Graph API
M. Swathi (B) · A. Anoop · B. Rudra
Department of Information Technology, National Institute of Technology, Surathkal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_2

1 Introduction
A social networking site allows users to communicate with others who have similar interests in terms of personal or professional backgrounds, or with any real-life connections. Online social networks (OSNs) [1] such as Facebook, Instagram and Twitter are popular platforms for creating and sharing profiles, text, pictures, etc., and for finding and making friends, thereby allowing users to communicate with each other across the globe. Corporate companies use these sites to advertise and promote their business. Government organizations use online social networks as a platform to deliver their services to citizens efficiently and to make them aware of situations across the country. As the popularity of these sites increases, criminals also harvest information from them for various purposes. The rising demand for social media platforms such as Facebook to communicate with the world has resulted in people being stalked and harassed through these platforms.

A large amount of personal information is shared among friends on online social networks, and many people do not pay much attention to the privacy measures these networks offer. Protecting the privacy of an individual user has become an important problem because of ransomware and other kinds of attacks that harm users. In recent years, several privacy threats and many malicious activities that exploit the privacy of online social networks have been reported. One such activity is creating a fake profile of an individual user, also known as profile cloning. This attack is performed by identifying a victim and then creating an account in the same network with the victim's real name and photograph. Photographs can be copied and used while registering a cloned account. Names are not unique, as different people may have identical names [2–4]. Research has been conducted on finding fake accounts using machine learning techniques. With machine learning, systems can learn how to detect fake profiles. One such method is to detect fake profiles on social networking sites, the most popular being Facebook, using neural networks. The detection results of the same method can further be used to predict which people are stalking an individual through these fake profiles.
2 Related Work
With the increasing popularity of online social networks (OSNs), users' privacy has become a major concern. Researchers working in this field have proposed various solutions to preserve the privacy of people with profiles on OSNs such as Facebook, one of the most popular online social networks with billions of users. In order to protect the information in the network, fake profiles have to be detected, and this can be done using machine learning [5, 6]. In this approach, the authors collected historical data in order to find fake accounts created by bots and cyborgs. Accounts with more than 30 followers were considered real accounts and were discarded. To train the machine learning model, they created fake accounts and trained the system on them. Once the model could differentiate between the two kinds of accounts, it was successfully deployed. The model can differentiate a fake account from a real one when an actual data set is used.
Another solution to detect fake profiles on Facebook is FakeBook [3, 7]. The methodology proposed in this approach identifies features related to interactions and network graph properties:
• The number of OSN friends that have emerged.
• Individual profiles cluster around the population mean and accumulate friends at a consistent rate; the detection metric raises an alert if a profile deviates from this.
• The behaviour of profiles in a data set of real profiles is characterized.
The authors considered the average degree of the nodes and the single friends in the group of the online social network graph. They developed a Facebook sensing application that collects the required statistical information from a profile and displays the result.

Another proposed approach, based on finite automata, recognizes fake identities in online social networks (OSNs) [5, 8]. A detection mechanism called FakeProfilesRecognizer (FPR) is proposed for recognizing and detecting fake profiles in social networks. This recognizer uses regular expressions to form a friend pattern (FP) for every available profile in the social graph. This pattern is used to distinguish the profiles by considering redundancy and duplication in OSNs [9, 10]. The authors of work on bot detection discussed how to distinguish automated user accounts from normal Twitter accounts [11, 12]. Individuals are identified and differentiated in their relations with others based on their IDs. Identifying and differentiating people nevertheless remains difficult [13, 14]. Many methods, such as social proximity, SybilGuard, SybilLimit, Sybilproof, SybilInfer, SybilRank and online social network bot detection, exist to detect and analyse fake profiles and their online social bots [3, 15, 16].

Machine learning methods such as the POST method, accuracy of detection models, supervised learning and multi-agent perspectives are useful in profile creation and analysis on social networks. A review paper on detecting spam Twitter accounts using machine learning suggests gathering Twitter data and then extracting features from the data set. The features selected for detection strongly affect whether the correct result is obtained. The samples in the data set are labelled manually as spam or non-spam using filtering services. Machine learning models are trained with the labelled samples and tested to identify the class of a particular data instance. Detection models are evaluated using parameters such as accuracy, detection rate, false negatives, recall and precision [4].
3 Proposed Work
Fake profile detection is performed with a neural network model trained on publicly available data sets of fake and genuine users. Features are selected and extracted from the data set, and a classification algorithm is applied to them. Once the model is trained, it can be tested with user data. For the stalking data, the list of user ids present in the InitialChatFriendsList section of the page source is used. It is observed that this list gives the ids of the users who most frequently interact with a profile in terms of chats, likes, comments, timeline views, etc. Using these ids, the user details can be obtained through the Facebook Graph API. Facebook's Graph API [6] is an API for accessing the objects in Facebook's social graph and the connections between them. This API presents a consistent view of the social graph by representing the objects in the graph and the connections between these objects. The user data obtained in this way is designated as the stalking data. We use this data as the test data for prediction, in order to verify whether or not a person is stalking from a fake profile, as shown in Fig. 1. The proposed work is divided into two steps:
• Fake profile detection
• Stalking prediction.
Fig. 1 Flowchart of proposed model
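As a hedged illustration of the Graph API lookup step, the sketch below only builds the request URL for one user id and does not send any request. It assumes the general Graph API pattern of requesting a node by id with a `fields` parameter and an access token; the helper name, the example id, the field list and the token are placeholders, and the fields actually returned depend on the permissions granted to the application.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"  # an API version prefix may be required in practice


def profile_url(user_id, fields, access_token):
    """Build the URL that requests the given fields of one user node."""
    query = urlencode({"fields": ",".join(fields), "access_token": access_token})
    return f"{GRAPH_BASE}/{user_id}?{query}"


# Placeholder id and token, for illustration only
url = profile_url("12345", ["id", "name"], "TOKEN")
```

The JSON response to such a request would then be converted into the feature vector expected by the trained classifier.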
3.1 Fake Profile Detection
For fake profile detection, features are first selected for use by the classification algorithm, for example, statuses count, friends count, followers count, sex, language, etc. Neural networks are used to classify the data, and training is performed using the back-propagation algorithm. Once the attributes are selected, a data set of profiles already classified as fake or genuine is used to train the model. The selected attributes are extracted from each profile for the purpose of classification. From this data set, 80% of the profiles form the training set and the remaining 20% form the test set. The training set is fed to the classification model, which learns from it so that it can later predict the classes of other data accurately. The labels of the test set are then removed and left for the trained classifier to determine.
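The 80/20 split and the removal of test labels can be sketched as follows. This is a minimal illustration under stated assumptions: the feature names, the toy values, the helper names `split_profiles` and `strip_labels`, and the fixed random seed are all hypothetical and are not taken from the data set or implementation used in this work.

```python
import random

def split_profiles(profiles, train_fraction=0.8, seed=0):
    """Shuffle labelled profiles and split them into training and test sets."""
    shuffled = list(profiles)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def strip_labels(rows):
    """Separate features from labels so the classifier never sees test labels."""
    features = [{k: v for k, v in row.items() if k != "label"} for row in rows]
    labels = [row["label"] for row in rows]
    return features, labels

# Hypothetical toy data: 10 profiles with made-up feature values.
profiles = [
    {"statuses_count": 10 * i, "friends_count": 5 * i, "followers_count": i,
     "label": "fake" if i % 2 else "genuine"}
    for i in range(10)
]

train, test = split_profiles(profiles)
test_features, test_labels = strip_labels(test)
print(len(train), len(test))
```

The held-back `test_labels` are only compared with the classifier's predictions afterwards, which is what makes the reported accuracy an estimate on unseen profiles.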
3.2 Stalking Prediction
Once the detection model is properly trained and tested, it can be used for stalking prediction. The list of people stalking (interacting the most) is taken into consideration, and these profiles are the ones to be classified as fake or genuine. Their data is fed to the trained model, which predicts whether a particular user is stalking from a fake profile or a genuine profile; stalking from fake profiles tends to be more dangerous. Once this is found, it is easy to filter such people from the friends list. The working of the system using the above steps is as follows:
• Features are selected for the classification algorithm, for example, status count, friends count, gender, etc.
• After selection of the attributes, a data set of profiles that are already classified as fake or genuine is used to train the model.
• The selected attributes are extracted from each profile for the purpose of classification.
• From this data set, 80% of the data is used for training and 20% for testing.
• The training data set is fed to the classification model. It learns from the data set and is expected to give correct class labels for the testing data set.
• The labels of the test set are removed and left for the trained classifier to determine.
• The list of people stalking (interacting the most) is taken. These profiles are the ones to be classified as fake or genuine.
• Useful features are then extracted and fed to the trained classifier.
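The last two steps can be sketched as a small function. This is only an illustration: `flag_fake_stalkers`, the toy feature table, and the threshold-based `toy_classifier` are hypothetical stand-ins, and a real system would pass in the trained neural network and the features fetched for each user id.

```python
def flag_fake_stalkers(frequent_ids, fetch_features, classify):
    """Return the ids among the most-interacting users that the model labels fake.

    frequent_ids   -- user ids ordered by interaction frequency (assumed input)
    fetch_features -- callable: id -> feature dict (hypothetical data source)
    classify       -- callable: feature dict -> "fake" or "genuine"
    """
    flagged = []
    for user_id in frequent_ids:
        features = fetch_features(user_id)
        if classify(features) == "fake":
            flagged.append(user_id)
    return flagged

# Stand-ins for illustration only; values are made up.
toy_features = {
    "u1": {"friends_count": 3, "statuses_count": 0},
    "u2": {"friends_count": 250, "statuses_count": 120},
    "u3": {"friends_count": 1, "statuses_count": 2},
}

def toy_classifier(features):
    return "fake" if features["friends_count"] < 10 else "genuine"

flagged = flag_fake_stalkers(["u1", "u2", "u3"], toy_features.get, toy_classifier)
print(flagged)
```

Passing the classifier in as a callable keeps the filtering logic independent of how the model was trained.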
4 Results and Analysis
The proposed model has been designed and tested successfully using a publicly available data set. The data set consists of 1469 genuine users and 1329 fake users. For stalking prediction, a sample data set of 20 users is considered for testing.
Fig. 2 Extracted features used for classification
Fig. 3 Confusion matrix
The features considered for fake profile detection include statuses count, sex code, language code, friends count, followers count, etc. (see Fig. 2). The confusion matrix generated for fake profile detection is shown in Fig. 3, with the true label along the y-axis and the predicted label along the x-axis. It can be observed that correct predictions make up a comparatively high proportion of the total.
Fig. 4 Classification results
Fig. 5 Results
Fig. 6 Predicted stalking result
When the experiment was performed, an accuracy of 93% was achieved with a percent error of 6%, as shown in Fig. 4. The detailed classification is shown in Fig. 5, along with the precision, recall, F1-score, and support for both the fake and genuine classes. Precision is, intuitively, the ability of the classifier not to label as positive a sample that is negative. Recall is, intuitively, the ability of the classifier to find all the positive samples. The F1-score is the harmonic mean of precision and recall. Finally, support is the number of samples of the true response that lie in each class. Figure 6 shows the stalking prediction result: the account name and whether it is genuine or fake are printed for the user.
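To make these definitions concrete, the metrics can be computed from the four cells of a two-class confusion matrix. The sketch below treats "fake" as the positive class, which is an assumption for illustration, and the counts are made up; they are not the values behind Fig. 3 or Fig. 5.

```python
def classification_report(tp, fp, fn, tn):
    """Per-class precision, recall, F1 and support from a 2x2 confusion matrix.

    tp/fn count true 'fake' samples predicted fake/genuine;
    fp/tn count true 'genuine' samples predicted fake/genuine.
    Counts are assumed to be large enough that no denominator is zero.
    """
    def prf(true_pos, false_pos, false_neg):
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
        return precision, recall, f1

    fake = prf(tp, fp, fn)
    genuine = prf(tn, fn, fp)  # roles swap when 'genuine' is the positive class
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fake": {"precision": fake[0], "recall": fake[1],
                 "f1": fake[2], "support": tp + fn},
        "genuine": {"precision": genuine[0], "recall": genuine[1],
                    "f1": genuine[2], "support": tn + fp},
    }

# Illustrative counts only.
report = classification_report(tp=90, fp=5, fn=10, tn=95)
for key, value in report.items():
    print(key, value)
```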
5 Conclusion and Future Work
Fake accounts on social media evolve constantly, and new models for detecting them and protecting profiles in the network emerge as they are discovered. This raises the importance of developing fake account detection techniques that take into account how closely such accounts mimic genuine ones. Using such accounts, people stalk others, gather information about them, and harass them or carry out ransomware attacks. Sometimes they try to defame others in the network without revealing their own identity. People stalking from fake accounts tend to be more dangerous than genuine users. Our system is able to detect fake profiles and present them to the user, who can then block or remove them and stop their visits to the user's profile. This framework exposes the profiles that stalk people through a fake Facebook account. The system has some limitations in terms of collecting user data, as only a few features can be extracted and used for this purpose. This also causes a problem for real-time implementation, which can be addressed in future work.
Acknowledgements We would like to thank Rudrani Wankhade, student in the IT Department, NITK, for helping us in this work.
References
1. Singh, N., Sharma, T., Thakral, A., Choudhury, T.: Detection of fake profile in online social networks using machine learning. In: 2018 International Conference on Advances in Computing and Communication Engineering (ICACCE-2018), Paris, France, 22–23 June 2018. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8441713
2. Bilge, L., Strufe, T., Balzarotti, D., Kirda, E.: All your contacts are belong to us: automated identity theft attacks on social networks. In: Proceedings of ACM World Wide Web Conference (2009). https://doi.org/10.1145/1526709.1526784
3. Dr. Tiwari, V.: Analysis and detection of fake profile over social networks. In: International Conference on Computing, Communication and Automation (ICCCA2017). https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8229795&tag=1
4. Gheewala, S., Patel, R.: Machine learning based twitter spam account detection: a review. In: Proceedings of the Second International Conference on Computing Methodologies and Communication (ICCMC 2018), IEEE Conference Record 42656; IEEE Xplore ISBN: 978-1-5386-3452-3. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8487992
5. Torkyl, M., Meligy, A., Ibrahim, H.: Recognizing fake identities in online social networks based on a finite automaton approach. In: IEEE. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7856436&tag=1 (2016)
6. Weaver, J., Tarjan, P.: Facebook linked data via the graph API. Semantic Web (2012)
7. Conti, M., Poovendran, R., Secchiero, M.: FakeBook: detecting fake profiles in online social networks. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2012). https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6425616
8. Ivana, G.-I.: Meaning construction in overviewing: “It Was Like Catching Up, But Without Talking”. In: Social Ties in Online Networking. https://www.springer.com/gp/book/9783319715940 (2018)
9. Terevinto, P.N., et al.: A framework for OSN performance evaluation studies. In: Machine Learning Techniques for Online Social Networks. https://www.springer.com/gp/book/9783319899312 (2018)
10. Huang, B., et al.: Discover your social identity from what you tweet: a content based approach. In: Disinformation, Misinformation, and Fake News in Social Media. https://www.springer.com/gp/book/9783030426989 (2020)
11. Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Who is tweeting on twitter: human, bot, or cyborg? In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 21–30. ACM (2010)
12. Rahaman, I., et al.: On the problem of multi-staged impression allocation in online social networks. In: Machine Learning Techniques for Online Social Networks. https://www.springer.com/gp/book/9783319899312 (2018)
13. Jenkins, R.: Social Identity. Routledge (2014)
14. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv:1607.01759 (2016)
15. Chelmis, C., et al.: Order-of-magnitude popularity estimation of pirated content. In: Machine Learning Techniques for Online Social Networks. https://www.springer.com/gp/book/9783319899312 (2018)
16. Rezvanian, A., et al.: Social networks and learning systems a bibliometric analysis. In: Learning Automata Approach for Social Networks. https://www.springer.com/gp/book/9783030107666 (2019)
Empirical Evaluation of NSGA II, NSGA III, and MOEA/D Optimization Algorithms on Multi-objective Target Priyanka Makkar, Sunil Sikka, and Anshu Malhotra
Abstract Almost every multi-objective evolutionary algorithm (MOEA) relies on Pareto dominance, and single-objective local search cannot easily be integrated with such MOEAs. Only a few multi-objective (MO) algorithms are built on a decomposition strategy. The MOEA/D algorithm decomposes a multi-objective problem into a number of sub-problems and optimizes all of these sub-problems concurrently. Each sub-problem is optimized using information from its neighboring sub-problems. A thorough comparison between these two approaches on multi-objective targets has been missing in the literature. Such comparisons are needed to identify the strengths and weaknesses of the two strategies, and the efficiency and effectiveness of the algorithms form the basis of the comparison. This paper compares three MOEAs: NSGA version II (Pareto domination), NSGA version III (Pareto domination), and MOEA/D (decomposition-based). We take the multi-objective Rosenbrock function as the MO test problem.
Keywords NSGA v-2 · NSGA v-3 · MOEA · MOEA/D · Optimization · Multi-objective
1 Introduction
In mono-objective problems, it is easy to find the best solution, but in real life we face multi-objective problems, such as reducing risk, increasing reliability, decreasing the deviation from the desired level, and decreasing cost. It is very difficult to find one solution for a multi-objective problem (MOP); in other words, no single solution optimizes all the objectives, because the objectives conflict with one another. Such a situation produces a group of optimal solutions, also called the Pareto front. To tackle multi-objective problems [1], many mathematical techniques as well as MOEAs have been designed, such as NSGA version II, NSGA version III, and MOEA/D [2]. Researchers have been working on these algorithms for the last few years. In most MOPs, researchers apply the MOEAs shown by Coello et al. [3]. These are very effective algorithms for solving such problems [4, 5], but in the case of non-dominated sorting GAs, the computational complexity is high and expensive for large population sizes. This paper compares the performance of NSGA version II, NSGA version III, and MOEA/D on the MO Rosenbrock function. The rest of the paper is organized as follows: Sect. 2 presents the MOEAs, and Sect. 3 compares MOEA/D with NSGA version II and NSGA version III and shows that MOEA/D performs better than both. Section 4 presents the result for MOEA/D. Section 5 summarizes the paper.
P. Makkar (B) · S. Sikka, Amity University Haryana, Gurgaon, India
S. Sikka e-mail: [email protected]
A. Malhotra, Northcap University Gurugram, Gurgaon, India, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_3
2 MOEA
In the past few years, many researchers have worked on multi-objective evolutionary algorithms (MOEAs) to solve real-time MOPs. Three MOEAs, NSGA version II, NSGA version III, and MOEA/D, are discussed in this paper. NSGA version II is based on finding Pareto fronts during the search in order to give the best solutions, NSGA version III is based on reference points on a hyperplane, whereas MOEA/D is based on decomposing the problem into sub-problems [6].
2.1 NSGA Version II
In the last few years, NSGA version II, an MOEA based on Pareto domination, has been widely adopted by researchers. It rests on two basic principles:
1. A fast non-dominated sorting of the solutions.
2. The conservation of diversity among the solutions.
First of all, we generate a random population called the parent population $P_t$. Selection, crossover, and mutation are applied to $P_t$ to generate an offspring population $Q_t$, and $R_t$ is the union of the parent and offspring populations ($R_t = P_t \cup Q_t$). If the size of the parent population $P_t$ is $W$, then the size of $R_t$ is $2W$. Next, we sort the population $R_t$ according to non-domination and obtain a sequence of fronts $F_1$, $F_2$, $F_3$, and so on, where $F_1$ contains the best solutions, followed by $F_2$, and so on. All the solutions from $F_1$ are selected into the new population $P_{t+1}$. If the size of $F_1$ is smaller than that of the parent population, we add more solutions from the fronts $F_2$, $F_3$, and so on, until the new population $P_{t+1}$ reaches the same size as the old population $P_t$; all remaining solutions are rejected. When a front does not fit entirely, crowding distance sorting is applied to it so that only its least crowded solutions are kept and the population size is reduced to $W$. Figure 1 represents the working of NSGA II. The computational complexity of this algorithm in each generation is $O(kW^2)$, where $W$ represents the population size and $k$ represents the number of objectives [7, 8].
Fig. 1 Multi-objective NSGA version II
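The selection step described above can be sketched in a few lines. This is an illustrative sketch for minimisation problems, not the MATLAB code used in the experiments: the function names and the toy population are assumptions, and the front-peeling sort shown here is a simpler variant that can be slower in the worst case than the fast non-dominated sort of NSGA version II.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs):
    """Split the indices of objs into fronts F1, F2, ... by repeated peeling."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(front, objs):
    """Crowding distance per index in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist

def select_next_population(objs, size):
    """Fill P_{t+1} front by front; split the last front by crowding distance."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= size:
            chosen.extend(front)
        else:
            dist = crowding_distance(front, objs)
            by_spread = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(by_spread[:size - len(chosen)])
            break
    return chosen

# Toy combined population R_t with two objectives (made-up values).
R = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (2.5, 2.5)]
fronts = non_dominated_sort(R)
survivors = select_next_population(R, 3)
print(fronts, survivors)
```

In this toy case the first front already holds four solutions, so the crowding distance decides which three survive; the two boundary solutions are always kept.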
2.2 NSGA Version III
In 2014, Deb and Jain updated the second version of NSGA to obtain NSGA version III by amending the selection mechanism of the NSGA version II algorithm [9–11]. In NSGA version III, we first calculate the number of reference points (RP), which depends on the number of objectives in the given problem. In most real-time multi-objective problems, the number of RP is very high even for a low-dimensional objective space. To overcome this and to reduce the algorithm's runtime, it is suggested to combine sets of RP generated with a lower p-value.
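How quickly the number of RP grows can be seen from a short sketch. It assumes the systematic Das and Dennis construction commonly used with NSGA version III, in which $p$ is the number of divisions along each objective and the number of points for $M$ objectives is $\binom{M+p-1}{p}$; the function names are illustrative.

```python
from math import comb

def count_reference_points(num_objectives, divisions):
    """Number of Das-Dennis reference points: C(M + p - 1, p)."""
    return comb(num_objectives + divisions - 1, divisions)

def reference_points(num_objectives, divisions):
    """Points on the unit simplex whose coordinates are multiples of 1/p."""
    def compositions(total, parts):
        # All ways to write `total` as an ordered sum of `parts` non-negative ints.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest
    return [tuple(c / divisions for c in combo)
            for combo in compositions(divisions, num_objectives)]

for m, p in [(3, 12), (8, 12), (8, 3)]:
    print(m, p, count_reference_points(m, p))
```

With 8 objectives, lowering $p$ from 12 to 3 reduces the count from 50388 to 120, which is the kind of reduction the paragraph above refers to.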
2.3 Multi-objective Evolutionary Algorithm Decomposition
The multi-objective evolutionary algorithm based on decomposition (MOEA/D) decomposes the multi-objective problem into a number of scalar optimization problems instead of treating it as a single problem [12]. A few benefits of using this algorithm are as follows:
1. The computational complexity of MOEA/D in each generation is lower than that of other MO algorithms [13].
2. The decomposition technique is a better way to solve critical problems.
3. It provides an effective technique for evolutionary computation.
4. It is simple to design a local search operator using well-developed single-objective optimization algorithms.
5. All the computations are applied to a small population, because fewer evenly distributed solutions are needed [14].
6. There are various methods for transforming the approximation of the Pareto front into several scalar optimization problems.
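The decomposition idea can be illustrated with a short sketch for two objectives. It uses the Tchebycheff scalarisation, one of the approaches described for MOEA/D [2]; which scalarisation was used in the experiments of this paper is not stated, so this choice, the function names, and the toy values are assumptions.

```python
def weight_vectors(n):
    """n evenly spaced weight vectors for two objectives (n >= 2)."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]

def tchebycheff(f, weights, ideal):
    """Tchebycheff scalarisation max_i w_i * |f_i - z_i| (to be minimised)."""
    return max(w * abs(fi - zi) for w, fi, zi in zip(weights, f, ideal))

def neighbourhoods(weights, t):
    """For each sub-problem, indices of the t closest weight vectors (itself included)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [sorted(range(len(weights)), key=lambda j: sq_dist(w, weights[j]))[:t]
            for w in weights]

W = weight_vectors(5)      # one scalar sub-problem per weight vector
B = neighbourhoods(W, 3)   # sub-problems that exchange information
value = tchebycheff((0.6, 0.2), W[2], (0.0, 0.0))
print(W[2], B[2], value)
```

Each weight vector defines one scalar sub-problem, and a sub-problem only shares solutions with the sub-problems listed in its neighbourhood, which is what keeps the per-generation cost low.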
3 Computational Experiment
Here, we consider the multi-objective Rosenbrock function as the MO test problem, with two objectives: cost and standard deviation. All the programs are executed in MATLAB, and performance is compared in terms of cost as objective 1 and standard deviation as objective 2 for the NSGA version II, NSGA version III, and MOEA/D multi-objective algorithms. Table 1 gives the cost of objective 1 for all three algorithms, from which we conclude that the mean cost for MOEA/D is the lowest. Table 2 gives the cost of objective 2 (standard deviation) for all three algorithms, and MOEA/D performs well here too. Finally, Table 3 gives the execution time (ms), and MOEA/D gives results the fastest.

Table 1 Cost of objective #1 between multi-objective NSGA-V2, NSGA-V3, and MOEA/D for 10 runs with mean and standard deviation
Run      NSGA-V2   NSGA-V3   MOEA/D
1        0.57      0.60      0.49
2        0.66      0.75      0.35
3        0.58      0.59      0.47
4        0.58      0.58      0.48
5        0.58      0.59      0.55
6        0.85      0.88      0.50
7        0.58      0.59      0.46
8        0.92      0.59      0.51
9        0.59      0.56      0.46
10       0.59      0.58      0.46
Mean     0.65      0.63      0.47
STD      0.128     0.102     0.052

Table 2 Cost of objective #2 between multi-objective NSGA-V2, NSGA-V3, and MOEA/D for 10 runs with mean and standard deviation
Run      NSGA-V2   NSGA-V3   MOEA/D
1        0.55      0.58      0.73
2        0.57      0.75      0.55
3        0.85      0.56      0.58
4        0.56      0.57      0.58
5        0.56      0.58      0.59
6        0.57      0.85      0.58
7        0.95      0.56      0.58
8        0.56      0.57      0.59
9        0.56      0.56      0.57
10       0.55      0.57      0.58
Average  0.63      0.62      0.59
STD      0.15      0.10      0.05

Table 3 Analysis of the result on MO Rosenbrock function on all three MOEA algorithms (execution time in ms)
Run      NSGA-V2   NSGA-V3   MOEA/D
1        31.9      38.3      25.1
2        35.9      58.5      28.0
3        30.8      31.9      25.2
4        32.2      30.7      27.9
5        31.2      30.5      27.0
6        40.6      34.9      27.6
7        29.7      32.0      27.9
8        31.3      29.4      26.4
9        33.6      29.5      26.5
10       30.3      29.9      25.9
Average  32.7      34.6      26.8
STD      3.29      8.86      1.10

Figure 2 shows the average cost of objective 1 for multi-objective NSGA version II, NSGA version III, and MOEA/D; Fig. 3 represents the standard deviation for objective 1. Figure 4 represents the average cost for objective 2 for multi-objective NSGA-V2, NSGA-V3, and MOEA/D; Fig. 5 represents the standard deviation for objective 2. Figure 6 represents the average execution time for NSGA version II, NSGA version III, and MOEA/D, and Fig. 7 represents the standard deviation in execution time for NSGA version II, NSGA version III, and MOEA/D.
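The summary rows of Table 1 can be recomputed from the ten per-run values. The sketch below uses the sample standard deviation, which appears to agree with the reported STD row to within rounding; that the authors used this estimator is an assumption.

```python
from statistics import mean, stdev

# Objective #1 cost per run, copied from Table 1.
table1 = {
    "NSGA-V2": [0.57, 0.66, 0.58, 0.58, 0.58, 0.85, 0.58, 0.92, 0.59, 0.59],
    "NSGA-V3": [0.60, 0.75, 0.59, 0.58, 0.59, 0.88, 0.59, 0.59, 0.56, 0.58],
    "MOEA/D":  [0.49, 0.35, 0.47, 0.48, 0.55, 0.50, 0.46, 0.51, 0.46, 0.46],
}

# (mean, sample standard deviation) per algorithm.
summary = {name: (mean(runs), stdev(runs)) for name, runs in table1.items()}
for name, (avg, sd) in summary.items():
    print(f"{name:8s} mean={avg:.3f} sample_std={sd:.3f}")

# Algorithm with the lowest mean cost on objective #1.
best = min(summary, key=lambda name: summary[name][0])
print("lowest mean cost:", best)
```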
Fig. 2 Average cost of objective #1 between multi-objective NSGA-V2, NSGA-V3, and MOEA/D
Fig. 3 Standard deviation of objective #1 between multi-objective NSGA-V2, NSGA-V3, and MOEA/D
4 Results
After analyzing the multi-objective evolutionary algorithms, we find that MOEA/D performs well on real-time complex multi-objective problems because it is based on decomposition, which divides the problem into sub-problems. It is easier to solve scalar problems than to solve an MOP as one problem. MOEA/D is found to be more efficient than NSGA II and NSGA III for multi-objective problems.
Fig. 4 Average cost of objective #2 between multi-objective NSGA-V2, NSGA-V3, and MOEA/D
Fig. 5 Standard deviation of objective #2 between multi-objective NSGA-V2, NSGA-V3, and MOEA/D
5 Conclusion and Future Work
From the above results, it is visible that the average cost, the execution time, and the standard deviation are all lower for the multi-objective evolutionary algorithm based on decomposition than for the non-dominated sorting genetic algorithms version II and version III. Owing to these advantages, a multi-objective evolutionary algorithm based on decomposition is appropriate for real-time problems having more than one objective. In future work, we plan to apply the multi-objective evolutionary algorithm based on decomposition to other real-world situations and optimize the number of objectives more effectively.
Fig. 6 Average execution time(ms) for three MOEA algorithms
Fig. 7 Standard deviation execution time(ms) for three MOEA algorithms
References
1. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.A.M.T.: A fast and elitist multiobjective genetic algorithm: NSGA-V2. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
2. Zhang, Q., Li, H.: MOEA/D: a multi-objective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
3. Coello, C.C.A., Brambila, S.G., Gamboa, J.F., et al.: Evolutionary multiobjective optimization: open research areas and some challenges lying ahead. Complex Intell. Syst. 6, 221–236 (2020)
4. Mansoor, U., Kessentini, M., Wimmer, M., Deb, K.: Multi-view refactoring of class and activity diagrams using a multi-objective evolutionary algorithm. Softw. Qual. J. 25(2), 473–501 (2017)
5. Rajpurohit, J., Sharma, T.K., Abraham, A., Vaishali, A.: Glossary of metaheuristic algorithms. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 9, 181–205 (2017)
6. Li, H., Zhang, Q.: Multi-objective optimization problems with complicated Pareto sets, MOEA/D, and NSGA-V2. IEEE Trans. Evol. Comput. 13(2), 284–302 (2008)
7. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolutionary algorithm. TIK-report, 103 (2001)
8. Elarbi, M., Bechikh, S., Gupta, A., Said, L.B., Ong, Y.S.: A new decomposition-based NSGA-V2 for many-objective optimization. IEEE Trans. Syst. Man Cybern.: Syst. 48(7), 1191–1210 (2017)
9. Ciro, G.C., Dugardin, F., Yalaoui, F., Kelly, R.: An NSGA-V2 and NSGA-V3 comparison for solving an open shop scheduling problem with resource constraints. IFAC-Papers Online 49(12), 1272–1277 (2016)
10. Ishibuchi, H., Imada, R., Setoguchi, Y., Nojima, Y.: Performance comparison of NSGA-V2 and NSGA-V3 on various many-objective test problems. In: 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 3045–3052. IEEE (Jul 2016)
11. Mkaouer, W., Kessentini, M., Shaout, A., Koligheu, P., Bechikh, S., Deb, K., Ouni, A.: Many-objective software re-modularization using NSGA-V3. ACM Trans. Softw. Eng. Methodol. (TOSEM) 24(3), 1–45 (2015)
12. Peng, W., Zhang, Q., Li, H.: Comparison between MOEA/D and NSGA-V2 on the multiobjective traveling salesman problem. In: Multi-Objective Memetic Algorithms, pp. 309–324. Springer, Berlin, Heidelberg (2009)
13. Ishibuchi, H., Sakane, Y., Tsukamoto, N., Nojima, Y.: Evolutionary many-objective optimization by NSGA-V2 and MOEA/D with large populations. In: 2009 IEEE International Conference on Systems, Man and Cybernetics, pp. 1758–1763. IEEE (2009)
14. Deep, K., Bansal, J.C.: Mean particle swarm optimisation for function optimisation. Int. J. Comput. Intell. Stud. 1(1), 72–92 (2009)
Moving Skills—A Contributing Factor in Developmental Delay Sonali Gupta, Akshara Pande, and Swati
Abstract Developmental delay has become a major issue for society, as there is little awareness of it among parents. These delays can sometimes be seen after the birth of a child, who may have major disorders affecting language, vision, thinking, social interaction, emotional skills, and motor skills. Various factors, such as premature birth, heredity, high breast feeding, etc., are responsible for developmental delay in a child. In this study, we mainly focus on infant motor behavior skills, such as hand/leg movements, crawling, walking downstairs, etc., to better understand the growth of a child with a motor skills disorder. This study reviews various developmental delays in children and discusses machine learning approaches for detecting motor delay. Diagnosing motor delays at an early stage will be immensely helpful for parents in focusing on the medication/physiotherapy required. This research is relevant to parenting and shows the various growth milestones a child should follow. Machine learning approaches can be used to detect child disabilities. This study may help parents to detect developmental delay at an early stage and focuses on various techniques for doing so.
Keywords Artificial intelligence · Developmental disabilities · Motor delay · Milestone
1 Introduction
Developmental delays are common in childhood, occurring in 10–15% of pre-school children [1]. A developmental delay means that an infant is not able to reach their developmental indicators within an appropriate time interval. Developmental disabilities begin at any time during the developmental period, and a person usually suffers from them for life. Most developmental problems begin before birth, but some can arise after birth because of an injury, infection, or other factors [2]. Spontaneous movements during a child's first few months lead to the “general growth” of the child. Children exhibit developmental delay in various areas, such as vision, speech, social, emotional, and motor skills, caused by factors such as premature birth, heredity, alcohol consumption during pregnancy, environmental pollution, etc. [2].
S. Gupta (B) · A. Pande, Graphic Era Hill University, Dehradun, U.K, India
Swati, Graphic Era Deemed To Be University, Dehradun, U.K, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_4
1.1 Literature Review
It is important to focus on developmental delays, as they occur in one in seven children in a country. Around the globe, the total burden of disability increased by 52% between 1990 and 2017 [3]. Worldwide, 52.9 million (95% uncertainty interval [UI] 48.7–57.3; or 8.4% [7.7–9.1]) children younger than 5 years (54% males) had developmental disabilities in 2016, compared with 53.0 million (49.0–57.1 [8.9%, 8.2–9.5]) in 1990 [4]. One study reports socio-demographic risk attributes related to delay in movement in various populations, such as gender, parental unemployment, a lower family income, and being underweight or overweight [4]. Due to motor developmental delay, a child can face various associated health risks. A recent study proposed identifying spontaneous movements using video analysis of children at the age of three and implemented many features for the identification of cerebral palsy [5]. Movement in infants can also be assessed using stored data [6]. Developmental delay can be broadly classified into various categories (Fig. 1).
• Global developmental delay: the child suffers from one or more disorders simultaneously. Assessing these disorders involves awareness among parents about the child's health, and existing neurological behavior can help predict the developmental outcome [7].
• Speech and language delay: 11% of toddlers suffer from a speech disorder. Early intervention is required; otherwise the child can develop autism disorder [7].
Fig. 1 Developmental delay in child [7] (categories shown: global developmental delay, speech and language delay, hearing loss, autism, verbal difficulties, neurological disorders, motor delays)
• Hearing loss: pre-school children are likely to fall under this category of disability. Because of it, a child is sometimes also unable to talk properly [7].
• Autism: children who have difficulty speaking properly have trouble interacting verbally with the people around them. Realistic and social deficiencies are important features of autism, and a salient aspect is the disconnection among language, motor behavior, and adaptive skills. Autism is found to be a common disorder, present in one child out of 500 [7].
• Verbal difficulties: speech delay together with the cognitive profile may suggest a neurodevelopmental diagnosis consistent with a nonverbal learning disorder. In these types of cases, the child may have weakened visual-spatial perceptual capabilities [7].
• Neurological disorders: children with a language disorder have a neurological disorder that affects their brain system [7].
• Motor delay: it is observed within a few months of birth, when the child cannot move properly; disorders such as ataxia, myopathy, cerebral palsy, and spinal muscular atrophy (withering) may be present. If a child is normal and healthy and there is no trace of motor delay, then the child does not suffer from cerebral palsy [7].
1.2 Previous Study and Limitations
Developmental delays are reported to occur in 10–15% of children under the age of five years, and global developmental delay (GDD) in 1–3%. Various factors are responsible for these delays. If developmental delays are detected late, opportunities for early diagnosis are lost, which usually results in relatively poor outcomes such as learning disability, behavior problems, and functional impairments later on. There is strong research evidence that effective early identification of developmental delays and timely early intervention can positively alter a child’s long-term trajectory [1, 8]. It is important to increase our understanding of (the interplay between) parental and child behavior and, foremost, to identify which interactional strategies by parents are effective in increasing children’s engagement to prevent developmental delay [8, 15]. The main limitations in addressing developmental delays are:
• Parents are less aware of the milestones a child reaches during growth, as many families are nuclear.
• Machine learning approaches are not properly applied to detect developmental delays.
• Developmental datasets are not available.
1.3 Motivation and Objectives
Because of the limitations of the techniques used to recognize a milestone delay at a specific time, we believe that machine learning approaches may be fruitful. With the help of machine learning, useful applications may be developed that detect developmental delay at the right time. These can alert parents to contact medical professionals, so that they can protect their child from any kind of disability.
2 Motor Milestone for a Child
Motor indicators play a major role in an infant’s life, but there is no single, universal process that all babies follow. A report published by the World Health Organization (WHO) presents the windows within which children achieve milestones [8] (Fig. 2).
1. Movement recognition aims to sense relevant movements of the child using different methods based on video recordings, in which fidgety movements are analyzed. Spontaneous movements in a child should be analyzed properly and treated appropriately. Joint loss of vision, audible range, and vestibular function affects the overall motor development of the child [9]. Some specific disabilities in children are [2, 9]:
• Approximately one fourth of hearing loss in babies is due to insufficient care of the pregnant woman.
• A child can have fetal alcohol syndrome due to genetic disorders.
• A child can have autism spectrum disorder.
• Low birth weight and infections are associated with a higher risk of disability.
2. Vision loss is a significant issue in which a child is unable to see properly, or vision may sometimes be blurred [2, 9].
Fig. 2 Child milestones (WHO) [8] (chart of milestones, including sitting with support and walking with assistance, against the age in months by which 90% of babies have achieved each milestone; age axis from 0 to 20 months)
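A milestone window of this kind lends itself to a simple screening check, sketched below. The function name and the threshold values are placeholders for illustration only; they are not the WHO figures from Fig. 2, and such a check is no substitute for a clinical assessment.

```python
def overdue_milestones(age_months, achieved, p90_age_months):
    """List milestones not yet achieved although the child is past the age
    by which 90% of babies have achieved them."""
    return sorted(
        name for name, limit in p90_age_months.items()
        if age_months > limit and name not in achieved
    )

# Placeholder thresholds in months; NOT the WHO values.
example_limits = {"sitting with support": 6.0, "walking with assistance": 13.0}

alerts = overdue_milestones(14, {"sitting with support"}, example_limits)
print(alerts)
```

An application of the kind motivated in Sect. 1.3 could raise an alert to parents whenever the returned list is non-empty.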
2.1 Motor Delays
The term “motor delay” indicates the slow growth of gross motor skills (such as crawling, walking, etc.) in children. Physicians refer children for motor delay at 6–18 months of age [10]. Probable reasons behind motor delays are:
1. Premature birth: the baby is born before the due delivery date, for example in the seventh month of pregnancy [10].
2. Cognitive disability: the child may have a cognitive disability that affects their language or thinking skills [10].
3. Vision-related problems: the child has difficulty recognizing things and seeing clearly [10].
4. Cerebral palsy: the child can have a posture deficit or movement disorder [10].
5. Spina bifida: a defect in which the spinal cord does not develop fully, so that the child is not able to stand, sit, or walk properly. It occurs when the neural tube does not close properly [10].
6. Ataxia: when the mother consumes alcohol in excess, the child sometimes suffers from neurological signs consisting of involuntary movements, lack of speech, etc. [10].
Motor skills can be of various types:
• Gross motor skills: these involve large muscle groups for performing actions like walking, crawling, balancing, etc. They can be further divided into two parts: locomotor skills (running, jumping) and object-control skills (catching, throwing) [10, 13].
• Fine motor skills: these require small muscle groups to perform movements with the hands, feet, and fingers. Reading and writing by a child also come under this skill. Problems of the brain or spinal cord and other illnesses can affect fine motor skills [10, 13].
2.1.1 Major Problems Raised Due to Motor Delays:
i. Hypotonia: it is the major symptom of motor delay in infants, in which the child has low muscle tone. Muscle tone consists of muscle compression and rigidity in muscles, soft tissue, and tendons. In one study, respondents agreed that a child suffering from hypotonia can show the following features: decreased strength, decreased tolerance of activity, increased flexibility, rounded shoulder posture, leaning onto supports, hypermobile joints, and poor attention and motivation. Disorders like cerebral palsy, Down syndrome (DS), and Prader–Willi syndrome are reported to be associated with hypotonia [10]. Assessment of child health must be done during pregnancy, and special care should be taken during labor. Some related diagnoses are shown in Fig. 3.
Various diagnoses are related to hypotonia; some of them are discussed below.
S. Gupta et al.
Fig. 3 Diagnoses related to hypotonia [10]: cerebral palsy, Down syndrome, spina bifida, muscular dystrophy, DNA syndrome, ADD/ADHD
• Down syndrome: a condition that affects the chromosomes (small “packages” of genes). There is an extra chromosome, which may cause hearing loss, vision problems, and ear/eye infections in the child [10].
• Muscular dystrophy: the child can have weaker muscles, and muscle mass can be lost [10].
• DNA syndrome: an inherited problem caused by a DNA disorder [10].
• ADD/ADHD: the child has difficulty sustaining attention and may be hyperactive or impulsive [10].
ii. Fine motor adaptive delay: fine motor adaptive skill is an important factor for analyzing the visual activity of a pre-school child. The Denver Developmental Screening Test II verifies skills of the child that include manual coordination, stimulus organization, and handling of small objects, which involve visual and motor skills.
iii. Personal or social delay: if a child is facing personal and social delays, a diagnosis should be made to find whether the child is suffering from a developmental intellectual disorder or autism, or whether the surrounding environment involves violence, negligence, or denial. It depends on the environmental needs of the child [12]. The psychological advancement of a child can be enhanced by socialization, which improves the child's social development skills. As the child grows into an adult, steady growth of social skills helps them to be independent. As studied by several researchers, social development ability comprises adjustment, emotion, interaction, and social awareness. Social competence can be seen in several aspects, but its basic meaning is the skill an individual acquires to relate soundly with others. Social cognition is an important factor in social competence: it refers to the cognitive processes by which one comprehends the behaviors, emotions, feelings, and intentions of others while interacting with them, and all these aspects are interconnected with behavior. The absence of social skills in children leads to poor academic outcomes and disturbs mental health [12].
3 First Advisable Cure for Motor Skills Delay Parents should contact a physician if they notice any lack of activity in their child. Depending upon the delay, the physician may advise physical therapy or occupational therapy to engage the child in physical activity. It is important for parents to keep monitoring their child's activity, so that they do not miss any delays. Fine motor and developmental tests for children include the Peabody Developmental Motor Scales (PDMS-2) and the Bruininks–Oseretsky Test of Motor Proficiency (BOT-2) assessments [12]. There are various techniques to detect disabilities in children using prediction models, applications, etc.
4 Developmental Delays Prediction Using AI Nowadays, scientists are using machine learning and artificial intelligence to predict whether a child has a developmental disability. Sensors can be used to capture various parameters (such as temperature, heart rate, etc.) of a child. These parameters act as features for a machine learning model, which can then help to detect developmental delay [14]. Wearable sensor technology is a convenient kit for reviewing movement, enabling 24-hour monitoring of the child. Algorithms are also used to distinguish normal from delayed actions in children. Researchers have also come up with a prediction model that works on the movements of a child. The tech company Cognoa [14] has developed an app that parents can use to monitor their baby's health. In the Cognoa app, pattern matching algorithms are used to detect the differences.
5 Conclusion This study indicates that developmental delay is a major issue in society. Parents have to pay special attention to the infant's health and consult a doctor accordingly. Day-to-day movements must be monitored to get accurate responses from the child, and the child should be engaged in physical and mental activities.
6 Future Work In future, we will work on datasets obtained from children's activity and apply a machine learning technique to develop a model for predicting the chances of disabilities in a child and suggest some methods to cope with them.
References
1. Choo, Y.Y., et al.: Developmental delay: identification and management at primary care level. Singap. Med. J. 60, 119 (2019)
2. Centers for Disease Control and Prevention, https://www.cdc.gov/ncbddd/developmentaldisabilities/facts.html, last accessed 2019/09
3. World Health Organization: The Global Burden of Disease. World Health Organization Press, Geneva. https://www.healthdata.org/sites/default/files/files/policy_report/2019/GBD_2017_Booklet.pdf, last accessed 2017
4. Veldman, S.L., Jones, R.A., Chandler, P., Robinson, L.E., Okely, A.D.: Prevalence and risk factors of gross motor delay in pre-schoolers. J. Paediatr. Child Health 56, 571–576 (2020)
5. Kanemaru, N., et al.: Specific characteristics of spontaneous movements in preterm infants at term age are associated with developmental delays at age 3 years. Dev. Med. Child Neurol. 55, 713–721 (2013)
6. Fjørtoft, T., et al.: Inter-observer reliability of the “assessment of motor repertoire - 3 to 5 months” based on video recordings of infants. Early Hum. Dev. 85, 297–302 (2009)
7. Encyclopedia of Children’s Health, healthofchildren.com/D/Developmental-Delay.html, last accessed 2020/06/20
8. Gwen Dewar Ph.D., Parenting science WHO charts, https://www.parentingscience.com/motormilestones.html, last accessed 2019
9. Adde, L., et al.: Using computer based video analysis in the study of fidgety movements. Early Hum. Dev. 85, 541–547 (2009)
10. Martin, K., et al.: Characteristics of hypotonia in children: a consensus opinion of pediatric occupational and physical therapists. Pediatr. Phys. Ther. 17, 275–282 (2005)
11. Centers for Disease Control and Prevention, Down syndrome facts, https://www.cdc.gov/ncbddd/birthdefects/downsyndrome.html, last accessed 2020/05/11
12. Bodner, N., et al.: Parental behavior and child interactive engagement: a longitudinal study on children with a significant cognitive and motor developmental delay. Res. Dev. Disabil. 1(103), 103672 (2020 Aug)
13. Pant, M., Ray, K., Sharma, T.K., Rawat, S., Bandyopadhyay, A.: Soft computing: theories and applications. Proc SoCTA. 2 (2016)
14. Pant, M., Ray, K., Sharma, T.K., Rawat, S., Bandyopadhyay, A.: Soft computing: theories and applications: proceedings of SoCTA 2016. Springer, Berlin, Heidelberg, Germany, vol. 2, (2017). ISBN 978–981–10–5699–4
15. Ray, K., Sharma, T.K.: Soft computing : theories and application, AISC SoCTA2 2
Estimation of Wind Speed Using Machine Learning Algorithms
Sonali Gupta, Manika Manwal, and Vikas Tomer
Abstract Wind energy plays a vital role as a renewable and non-polluting energy resource. It is an eco-friendly, developing and clean form of energy. The available wind energy is random, so wind generation cannot be known precisely in advance. For system operation tasks such as energy dispatching, the wind farm operator needs to know the available wind power ahead of time. Wind speed prediction is still a challenge due to the stochastic and highly varying characteristics of wind. This study focuses on forecasting wind speed by processing the publicly available east wind dataset and applies three machine learning algorithms, ZeroR, random forest and random tree, to predict the wind speed using time series analysis. This work concludes that the random forest algorithm is less efficient for predicting wind speed, as the other algorithms depend on various factors for prediction.
Keywords Machine learning · Prediction · Random forest · Random tree · Wind speed
1 Introduction India is on the edge of a renewable energy revolution. The government has set a target of achieving 175 gigawatts (GW) of renewable energy capacity by 2022 [1]. Wind speed forecasting helps to reduce the unreliability of wind generation. It permits more appropriate grid organization and integration of wind with power systems. It also helps to cut back imbalance responsibilities and penalties. For power generation, the power plant operator must know the wind speed at least 1 day ahead, and for confirmation, the operator needs to know the wind speed 1 h ahead.
S. Gupta (B) · M. Manwal Graphic Era Hill University, Dehradun, Uttrakhand, India V. Tomer Graphic Era University, Dehradun, Uttrakhand, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_5
The intermittency of wind is the principal challenge in using wind energy as a trustworthy autonomous source of electrical supply. The energy crisis and the depletion of fossil fuels are the most important threats facing the globe nowadays. Adequate utilization of renewable energy resources like wind and biomass is most important for fulfilling societal needs. Due to the unpredictable nature and complementary performance of wind and solar energy, several scholars are directing their efforts toward combined energy systems [1]. Hence, there is a crucial need to predict wind speed accurately.
1.1 Literature Survey Wind speed forecasts underpin the assessment of expected wind energy in the near future. This is important for the production of wind energy from turbines. Wind speed prediction approaches can be grouped into various categories, such as by time horizon, by principles and methods, and by input data and predicted data. Previous studies show that wind speed forecasting has a crucial role to play in wind energy generation. Various techniques such as artificial neural networks and the ARIMA model have been used for prediction, and hybrid models have less error than the other algorithms [1]. Time series analysis was also done by [2] for predicting the trend of wind speed. The proposed method was compared to ARIMA and TES models. The data were collected over 4 months. The forecast error was highest during the onset of the monsoon season. Similarly, the authors in [3] used various artificial neural networks to obtain the minimum MSE. CERC mandates an accuracy of 70% for wind generation. According to [4], hourly wind speed forecasting is important for better energy generation. The authors used linear and nonlinear input–output models for prediction. Short-term wind speed prediction also plays an important role, as explained by [5]. The author utilized several ANN methods such as ANN-BF, ANFIS and ANN-PSO for wind speed prediction in Iran. The MSE and RMSE values for the ANN-GA model are low in comparison with the other models for short-term estimation. According to [6], Bayesian regularization is the best option for wind speed forecasting using MATLAB software. Further, [7] presented a mean-of-posteriors Bayesian estimation method for correcting rain-contaminated wind speed estimates. The authors examined the dependence on wind speed and the spatial variation of wind speed errors to explain the overall improvement in prediction, and explained that the proposed technique is useful for rain-affected winds.
Likewise, [8] proposed ANN and ANFIS models for short-term forecasting with an accuracy of 13.53%. In addition, [9] used wind farm data from Spain for various seasons. The authors used the principal component analysis algorithm for extracting factors and predicted the wind speed using an RBF neural network.
1.2 Proposed Work The authors have applied various machine learning algorithms to the east wind dataset and calculated various statistical measures. The proposed prediction model depends on wind energy. The locations for prediction were some regions of India. The performance measures used for prediction analysis were RMSE, MAPE and MAE. This study focuses on the following research objectives:
• Data processing can be done more accurately to predict the wind speed uniformity.
• In case of sudden weather change, there is a need to accurately predict the wind speed.
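For reference, the three performance measures named in this section are commonly defined as below. The chapter does not state the formulas, so the symbols are assumptions introduced here: y_i is the observed wind speed, \hat{y}_i the predicted wind speed and n the number of instances.

```latex
\mathrm{MAE}  = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}, \qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|
```

MAPE is undefined when an observed value y_i is zero, which can matter for calm periods in wind data.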
2 Methodology This study followed the methodology shown in the block diagram of Fig. 1. The steps are explained below:
Step 1: The dataset was collected from [10].
Step 2: The wind speed data was analyzed, and different machine learning algorithms were applied. We used the WEKA tool [11], in which missing values, if any were found, were first replaced by the mean values.
Step 3: This study classified the dataset, with 2652 instances, using the ZeroR, random forest and random tree algorithms with tenfold cross-validation.
Step 4: Results were consolidated, and prediction can be done on the basis of the results of these algorithms.
Step 1 above describes the source of the data [10], which was in raw form, so in step 2 it was cleaned and preprocessed using several methods, such as normalization. Missing values are also removed in this step. After extensive experimentation on this data for model trials, the ZeroR, random forest and random tree algorithms were finally selected, as they gave better results in comparison with other machine learning algorithms. A sample of the data is applied to these machine learning algorithms with tenfold cross-validation, which provides unbiased results for unseen data. All the experimentation trials with sampling are done in step 3. In step 4, the results are consolidated. Analysis of the results is discussed later.
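The evaluation loop in step 3 can be sketched in a few lines. This is only an illustration under stated assumptions, not the WEKA workflow used in the study: it reimplements the ZeroR baseline (which, for a numeric target, always predicts the training mean) with k-fold cross-validation in plain Python, and the wind speed values are synthetic. The function names are hypothetical.

```python
import math
import random


def zero_r_fit(targets):
    """ZeroR for a numeric target: always predict the training mean."""
    return sum(targets) / len(targets)


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_zero_r(targets, k=10, seed=0):
    """Return (MAE, RMSE) of ZeroR pooled over all k held-out folds."""
    abs_errors, sq_errors = [], []
    for test_fold in k_fold_indices(len(targets), k, seed):
        held_out = set(test_fold)
        train = [y for i, y in enumerate(targets) if i not in held_out]
        prediction = zero_r_fit(train)  # fitted on the other k - 1 folds only
        for i in test_fold:
            error = targets[i] - prediction
            abs_errors.append(abs(error))
            sq_errors.append(error * error)
    mae = sum(abs_errors) / len(abs_errors)
    rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
    return mae, rmse


# Synthetic wind speeds (m/s), made up purely for illustration.
rng = random.Random(42)
wind_speed = [max(0.0, rng.gauss(6.0, 2.0)) for _ in range(200)]
mae, rmse = cross_validate_zero_r(wind_speed, k=10)
print(f"ZeroR 10-fold CV: MAE={mae:.3f} m/s, RMSE={rmse:.3f} m/s")
```

Because ZeroR ignores all input attributes, its error is a floor that any useful model, such as random forest or random tree, should beat on the same folds.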
Fig. 1 Methodology followed for prediction
3 Result Analysis and Discussion The proposed study found that random forest is slow compared with the other two algorithms in forecasting wind speed in India. The proposed method was implemented using WEKA. Cross-validation was used for training and evaluation. The authors compared the results of the machine learning algorithms and showed that wind speed forecasting can be done more reliably, for the country's welfare and for better utilization of energy. The government has targeted generating maximum wind energy over the coming years, as it is a non-polluting source of energy.
Fig. 2 ZeroR algorithm on dataset
This work applied machine learning algorithms to predict wind speed in areas of India. Figure 2 shows the wind speed forecasting using the ZeroR algorithm with tenfold cross-validation, whereas Figs. 3 and 4 show the wind speed accuracy using the random forest and random tree machine learning algorithms.
4 Conclusion and Limitations Wind energy is environmentally friendly in comparison with fossil fuels. Wind turbines are useful for electricity generation as they last for 20–25 years. The government has involved R&D institutions and entrepreneurs in handling wind speed. Accordingly, the authors worked on forecasting wind speed using various machine learning algorithms.
Fig. 3 Random forest algorithm on dataset
The authors used WEKA software for the analysis. WEKA is a data mining application with a number of statistical algorithms and a GUI that allows users to build their own network structure. The random tree and ZeroR algorithms are capable of estimating the wind speed distribution. The data was used for training and testing in the prediction process to reduce the error. These algorithms are better in terms of error. The random forest algorithm is less efficient in comparison with the others, as it has a higher chance of error.
5 Future Scope Wind speed forecasting is a major issue as it can help in various societal aspects. For further research, more algorithms such as artificial neural networks and deep learning models could be included, and prediction can be further improved. Wavelets can also be used to capture the high-frequency components for wind speed estimation. The effect of the Coriolis factor has not been considered when predicting wind speed, although it can be a predominant factor for the analysis. Remote sensing techniques can also be applied for resource assessment in various hilly areas of the country.
Fig. 4 Random tree algorithm on dataset
References
1. Nair, K.R., et al.: Forecasting of wind speed using ANN, ARIMA and hybrid models. https://doi.org/10.1109/ICICICT1.2017.8342555
2. Prema, V., et al.: Time series decomposition model for accurate wind speed forecast. https://doi.org/10.1186/s40807-015-0018-9
3. Kaur, T., et al.: Application of artificial neural network for short term wind speed forecasting. In: PESTSE, pp. 1–5 (2016)
4. Singh, A., et al.: Short term wind speed and power forecasting in Indian and UK wind power farms. https://doi.org/10.1109/POWERI.2016.8077339
5. Fazelpour, F., et al.: Short-term wind speed forecasting using artificial neural networks for Tehran, Iran. https://doi.org/10.1007/s40095-016-0220-6
6. Kumar, S., et al.: Wind speed forecasting using different neural network algorithms. In: IEMENTech, pp. 1–4 (2018)
7. Gopalan, K., et al.: A Bayesian estimation technique for improving the accuracy of SCATSAT-1 winds in rainy conditions. IEEE 12, 1362–1368 (2019)
8. Sreenivasa, S.C., Agarwal, S. K., R.K.: Short term wind forecasting using logistic regression driven hypothesis in artificial neural network. In: PIICON, pp. 1–6 (2014)
9. Verma, S.M., et al.: Markov models based short term forecasting of wind speed for estimating day-ahead wind power. https://doi.org/10.1109/ICPECTS.2018.8521645
10. Dataset of east wind speed, https://data.gov.in, last accessed 2020/04/15
11. Tool for processing of data, https://weka.com, last accessed 2020/03/17
12. Wind power discussion, https://energy.economictimes.indiatimes.com/news/renewable/indiastop-9-states-by-installed-wind-power-capacity/68782064, last accessed 2020/11/19
A Comparative Study of Supervised Learning Techniques for Remote Sensing Image Classification
Ashish Joshi, Ankur Dhumka, Yashikha Dhiman, Charu Rawat, and Ritika
Abstract Remote sensing image classification has long attracted the attention of the remote-sensing community because classification results are the basis for many environmental and socioeconomic applications. The classification involves a number of steps, one of the most important of which is the selection of an effective image classification technique. This paper provides a comparative study of supervised learning techniques for remote sensing image classification. The study focuses on the classification of land cover and land use. Supervised learning is a branch of machine learning and is used in this study. The comparison is made among the different techniques of pixel-based supervised classification used for remote sensing image classification. The study has been carried out on a labelled data set. After the implementation, support vector machine has been found to be the most effective algorithm among the five algorithms of pixel-based supervised classification (i.e. maximum likelihood estimation, minimum distance classifier, principal component analysis, isoclustering and support vector machine).
Keywords Remote sensing image classification · Pixel-based supervised learning · Support vector machine (SVM) · Land use/land cover analysis
A. Joshi (B) · A. Dhumka · Y. Dhiman · C. Rawat · Ritika Department of Computer Science and Engineering, THDC-Institute of HydroPower Engineering and Technology, Tehri Garhwal, Uttarakhand, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_6
1 Introduction Remote sensing remains popular among researchers and scientists, mainly because it maps and detects the characteristics of the Earth through the light reflected and emitted by the Earth’s surface. This information is generally captured in the form of images. Such images are captured using special cameras on aircraft and satellites. Researchers and scientists then use collections of these images to evaluate the different conditions occurring on Earth [1]. Geology, exploration of minerals, assessment of hazards, oceanography, agriculture, forestry, land degradation and environmental monitoring are some of the major fields where remote sensing is used. The images to be collected through remote sensing depend upon the field in which the analysis is to be done. Remotely sensed data can further be characterized by spatial, radiometric, temporal and spectral resolutions. Such data may contain both airborne and space-borne sensor data [2]. Spectral resolution covers the bandwidth and the sampling rate over which the sensor gathers information. Spatial resolution determines the smallest features that can be extracted and analysed. Radiometric resolution is the number of discrete signal levels of particular strength that a sensor can record. Temporal resolution is used to analyse changes in the data; it depends on the time elapsed between consecutive images taken of the same ground area by the sensors [3]. The spectral bands seen in a multispectral satellite image are listed in Table 1 [4].
This paper focuses on the analysis of land use and land cover through the classification of remote sensing images. For the sustainable development of a society, it is required to make a scheduled check of land cover and land use. To make a successful land use and land cover classification, the first step is to search for a suitable data set and the second step is to find an efficient technique to perform the classification [5, 6, 7]. The data set we used here is the EuroSAT data set, which consists of images from the Sentinel-2 satellite. It covers 13 spectral bands and consists of 10 classes. The data set has 27000 labelled and geo-referenced samples. These images are present in .tiff extension. It has two variants: one is ‘rgb’, which includes only the red, green and blue bands encoded as JPEG and has a download size of approx. 89.91 MB, and the other is ‘all’, which contains all 13 spectral bands and has an approx.
download size of 1.93 GB [8]. We have used the variant with 13-band spectral resolution, i.e. ‘all’. Sample images of all 10 classes available in the data set are shown in Figs. 1–10.
Table 1 Bands in multispectral satellite image
S. No.   Bands                             Wavelength (μm)
1        Coastal aerosol                   0.43–0.45
2        Blue                              0.45–0.51
3        Green                             0.53–0.59
4        Red                               0.64–0.67
5        Yellow                            0.585–0.625
6        Red-edge                          0.705–0.745
7        Near-infrared 1 (NIR-1)           0.76–0.90
8        Near-infrared 2 (NIR-2)           0.86–1.04
9        Short-wave infrared 1 (SWIR-1)    1.57–1.65
10       Short-wave infrared 2 (SWIR-2)    2.08–2.35
11       Panchromatic                      0.50–0.68
12       Cirrus                            1.36–1.38
13       Thermal infrared (TIRS-1)         10.60–12.51
The techniques used for remote sensing image classification can be pixel-based or object-based. Pixel-based techniques can further be distinguished as supervised and unsupervised. The pixel-based unsupervised classification requires an unlabelled data set and a classification technique such as K-means or ISODATA. The two steps involved in the pixel-based unsupervised classification are generation of clusters and assignment of classes. The supervised pixel-based classification requires a labelled data set and a classification technique such as maximum likelihood, minimum distance, principal component, support vector machine (SVM) or isocluster. The steps involved in the supervised approach are selection of training areas, generation of signature files and classification. Object-based image analysis (OBIA) involves segmentation, selection of training areas, statistics and classification [9]. In pixel-based classification, each square pixel is assigned a specific class, but in OBIA pixels are grouped into representative vector shapes with specific size and geometry.
Fig. 1 AnnualCrop_71
Fig. 2 Forest_768
Fig. 3 Herbaceous vegetation_1511
Fig. 4 Highway_2469
Fig. 5 Industrial_1117
Fig. 6 Pasture_1323
Fig. 7 Permanent crop_1708
Fig. 8 Residential_14
Fig. 9 River_1554
Fig. 10 SeaLake_1058
2 Pixel-Based Supervised Learning The basic and fundamental unit of a satellite image is a pixel. In pixel-based techniques, the pixels are classified on the basis of the spectral information that they contain. The pixel-based technique is a traditional technique for classifying satellite images [10]. This technique is suitable for images with a low-to-medium resolution. For images with high resolution, this technique does not perform well because it does not use the information of the surrounding pixels. Thus, heterogeneity of the pixel information makes classification difficult in the pixel-based technique. For such high-resolution images, object-based image analysis (OBIA) is used [9].
The pixel-based technique further works in supervised and unsupervised manner. Here, the emphasis is on pixel-based supervised learning technique. The algorithms involved in this approach are:
2.1 Maximum Likelihood The algorithm used by the maximum likelihood classification tool has two important aspects: first, the cells in each class sample are assumed to be normally distributed in the multidimensional space, and second, Bayes’ theorem of decision-making is applied [11]. In this technique, we select a model (say linear regression) and use the observed data x to estimate the model’s parameters θ. The parameters are estimated using the probability density function below:
\[ p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \]
σ and μ are the parameters of the distribution. In probability density estimation, σ and μ are unknown for a known set X of observed data. These parameters are to be estimated through maximum likelihood estimation (MLE). Suppose θ denotes the parameters that describe X; it is then required to maximize the likelihood of θ given X. For this purpose the log of the above probability density function (PDF) is taken, summed over the n observations, and is termed the log likelihood, as shown below [12]:
\[ \sum_{i=1}^{n} \log f\left(x_i; \mu, \sigma^{2}\right) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^{2} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2} \]
The maximum of the log likelihood, written LL(θ; x), is to be found in order to maximize the likelihood. The first derivative of LL(θ; x) with respect to θ is calculated and equated to 0. The second derivative of LL(θ; x) with respect to θ is then calculated to confirm that it is negative, i.e. that the stationary point is a maximum [13]. The process is then terminated.
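For the Gaussian case, setting the first derivative of the log likelihood to zero has a closed-form solution: the sample mean for μ and the mean squared deviation for σ². The following is a minimal sketch of that result, assuming a single band and hypothetical pixel values; the function names are illustrative and not taken from any tool used in the study.

```python
import math


def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates (mu, sigma^2) for a Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    sigma2 = sum((x - mu) ** 2 for x in samples) / n  # divides by n, not n - 1
    return mu, sigma2


def log_likelihood(samples, mu, sigma2):
    """Gaussian log likelihood LL(mu, sigma^2; x) summed over all samples."""
    n = len(samples)
    sq = sum((x - mu) ** 2 for x in samples)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - sq / (2 * sigma2))


# Hypothetical reflectance values of one band for one training class.
pixels = [0.21, 0.25, 0.19, 0.23, 0.22, 0.26, 0.20]
mu_hat, sigma2_hat = gaussian_mle(pixels)
best = log_likelihood(pixels, mu_hat, sigma2_hat)
# Moving away from the estimates should not increase the log likelihood.
print(mu_hat, sigma2_hat, best >= log_likelihood(pixels, mu_hat + 0.01, sigma2_hat))
```

In a maximum likelihood classifier, one such set of parameters would be estimated per class, and a pixel would be assigned to the class under which its likelihood is highest.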
2.2 Minimum Distance The minimum distance classifier is used where images are to be classified in multifeature space. It is used to classify unknown image data to classes where the distance between the image and the class is minimized. Minimum distance is identical to the maximum similarity among images. Distance is treated as the index of similarity [14]. The following distances are used under this procedure:
Euclidean distance. It is used in cases where the population classes are different from each other. It can be written as below:
\[ d_k^{2} = (X-\mu_k)^{T}(X-\mu_k) \]
Normalized Euclidean distance. It is an enhanced form of the Euclidean distance. It is proportional to the similarity index and can be written as below:
\[ d_k^{2} = (X-\mu_k)^{T}\,\sigma_k^{-1}\,(X-\mu_k) \]
Mahalanobis distance. In some cases, the axes of the feature space may be correlated with each other; in such cases the Mahalanobis distance is used. It uses the variance–covariance matrix and can be written as below:
\[ d_k^{2} = (X-\mu_k)^{T}\,\Sigma_k^{-1}\,(X-\mu_k) \]
where X is the vector of image data (with n bands), i.e. [x_1, x_2, ..., x_n], μ_k is the mean of the kth class, i.e. [m_1, m_2, ..., m_n], and σ_k represents the variance matrix (as below):
\[ \sigma_k = \begin{bmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{nn} \end{bmatrix} \]
and Σ_k represents the variance–covariance matrix (as below):
\[ \Sigma_k = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix} \]
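The three distances can be sketched as below. This is an illustrative example only: the class means and the pixel are hypothetical two-band values, and the Mahalanobis helper is deliberately restricted to two bands so that the 2 × 2 matrix inverse can be written out by hand without a linear algebra library.

```python
def euclidean_sq(x, mean):
    """Squared Euclidean distance between pixel x and a class mean."""
    return sum((a - m) ** 2 for a, m in zip(x, mean))


def normalized_euclidean_sq(x, mean, variances):
    """Squared distance with each band scaled by its class variance."""
    return sum((a - m) ** 2 / v for a, m, v in zip(x, mean, variances))


def mahalanobis_sq_2band(x, mean, cov):
    """Squared Mahalanobis distance for two bands; cov is a 2 x 2 matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # The inverse of cov is (1 / det) * [[s22, -s12], [-s21, s11]].
    return (d0 * (s22 * d0 - s12 * d1) + d1 * (-s21 * d0 + s11 * d1)) / det


def classify(x, class_means):
    """Minimum distance rule: pick the class whose mean is nearest to x."""
    return min(class_means, key=lambda name: euclidean_sq(x, class_means[name]))


# Hypothetical class means in a two-band feature space.
class_means = {"water": [0.05, 0.02], "forest": [0.04, 0.40], "urban": [0.20, 0.25]}
pixel = [0.06, 0.35]
print(classify(pixel, class_means))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, and with a diagonal covariance matrix it reduces to the normalized Euclidean distance, which is a convenient sanity check on any implementation.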
2.3 Principal Component The principal component analysis (PCA) is a tool that summarizes the information stored in large data tables. This summarization is done with the help of smaller sets. These smaller sets are of ‘summary indices’. The reason behind the implementation of the ‘summary indices’ is that they are easy to visualize and analyse than large databases [15]. It helps to find correlation between data points. For example, it can be
58
A. Joshi et al.
helpful to identify the probability of two products being bought together in a general store. PCA is the basic form of multivariate data analysis. It uses projection methods. This approach is helpful to predict future trends, outliers, jumps and clusters. Some data sets are found to have missing values, categorial data, some imprecise measurements and multicollinearity. This algorithm is fit to work on such data sets too. In this, the ‘Summary indices’ is created by extracting the important information from the data. This information is further expressed as summary indices, i.e. the principal components. At the very base of this algorithm, it finds lines, planes and hyper-planes in a K-dimensional area. These lines, planes and hyper-planes help to approximate data. This theorem also works for least squares. To find the variance of the coordinates of a line or a plane to be as large as possible, it is important to find a line that best fits the least square. Consider a matrix X with N rows and K columns. The rows represent the observations and the columns represent the Variables. A K-dimensional matrix is created. In next step, each observation of X-matrix is to be placed on the K-dimensional space. This seems like a swarm of points on the K-dimensional space. The mean centring is done in the next step and variable averages are subtracted from the data. This vector of averages is interpretable as a point in the space. The point is situated in the middle of the swarm of the points. Now the data is ready for the computation of first principal component (PC1). This component is the line in the Kdimensional space that best fits in the least square sense. Similarly, a second principal component (PC2) is calculated, a sample is shown in Fig. 11: K-dimensional plane (sample) [7]. Both the PC1 and PC2 create a model plane. A sample is shown in Fig. 12: model plane (sample) [7]. The coordinate values of the observations on model plane are called scores. 
The plot of this projected configuration is known as a score plot. If two variables are positively correlated, then when the numerical value of one variable increases or decreases, the numerical value of the other variable tends to change in the same way [7].
Fig. 11 K-dimensional plane (sample)
Fig. 12 Model plane (sample)
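The PCA steps described in this section (mean centring, then PC1 and PC2, then scores on the model plane) can be sketched as follows. This is a minimal illustration assuming NumPy and a singular value decomposition; the function name `pca_scores` and the synthetic data are hypothetical and are not part of the study.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return (scores, loadings, explained_variance) for an N x K matrix X."""
    X = np.asarray(X, dtype=float)
    # Mean centring: subtract the variable averages from the data.
    Xc = X - X.mean(axis=0)
    # The right singular vectors are the least-squares directions PC1, PC2, ...
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]
    # Scores: coordinates of each observation on the model plane.
    scores = Xc @ loadings.T
    explained_variance = (s[:n_components] ** 2) / (X.shape[0] - 1)
    return scores, loadings, explained_variance

# Synthetic example (assumed data): two correlated variables and one noise variable.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t,
                     2.0 * t + 0.1 * rng.normal(size=200),
                     0.1 * rng.normal(size=200)])
scores, loadings, var = pca_scores(X)
```

Plotting the first column of `scores` against the second gives a score plot of the kind described above.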
2.4 Isocluster
Isocluster analysis uses the migrating means technique, which is a modified iterative optimization clustering approach [16]. All cells are separated into a user-specified number of groups. These groups are unimodal and are formed in the multidimensional space of a multiband raster. Cluster centres are created and all samples are then assigned to these centres. The algorithm works iteratively, and with every iteration a new mean is calculated for every class. The number of classes is usually not known in advance, so a high number of classes is chosen first; after the resulting clusters are analysed, the function is rerun with a reduced number of classes. In this approach, the minimum Euclidean distance is calculated and each candidate cell is assigned to the nearest cluster.
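The migrating means iteration described above can be sketched as below. This is a simplified illustration, not the Esri tool itself: the function name `migrating_means` is hypothetical, and placing the initial centres evenly between the per-band minimum and maximum is an assumption made here to keep the example deterministic.

```python
import numpy as np

def migrating_means(samples, n_classes, n_iter=20):
    """Assign each sample to the nearest centre and update the class means."""
    samples = np.asarray(samples, dtype=float)
    # Assumed initialisation: centres spread evenly between the band minima and maxima.
    centres = np.linspace(samples.min(axis=0), samples.max(axis=0), n_classes)
    for _ in range(n_iter):
        # Euclidean distance from every sample to every centre.
        dist = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centres = centres.copy()
        for c in range(n_classes):
            members = samples[labels == c]
            if len(members):
                # The class mean "migrates" to the mean of its current members.
                new_centres[c] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment with the converged centres.
    dist = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1), centres

# Assumed toy data: two well-separated groups of two-band samples.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.1, size=(50, 2))
group_b = rng.normal(5.0, 0.1, size=(50, 2))
labels, centres = migrating_means(np.vstack([group_a, group_b]), n_classes=2)
```

Rerunning the function with a smaller `n_classes` mirrors the practice of starting with many classes and reducing them after inspection.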
2.5 Support Vector Machine
The support vector machine (SVM) is a discriminative classifier defined by a separating hyper-plane. Learning the hyper-plane is expressed in terms of linear algebra with the help of a kernel. With the linear kernel, the prediction for an input $Y$ is computed from the dot product between $Y$ and each support vector $y_i$:

$$f(Y) = B(0) + \sum_i a_i \,\langle Y, y_i \rangle$$

The polynomial kernel equation can be written as follows:

$$K(Y, y_i) = \left(1 + \sum Y \cdot y_i\right)^{d}$$

and the exponential kernel equation can be written as follows:

$$K(Y, y_i) = \exp\left(-\gamma \sum (Y - y_i)^2\right)$$

To avoid misclassifying the training examples, the SVM optimization uses the regularization parameter, also termed the C factor. To set the influence of a single training example, the SVM approach uses the gamma parameter. Low values of gamma mean that the influence reaches ‘far’, and high values mean that the influence is ‘close’. A margin is the separation between the separating line and the closest class points [12].
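The three kernels named in this section can be written out directly. The sketch below assumes NumPy; the function names, the default values of `d` and `gamma`, and the toy vectors are illustrative assumptions and do not come from the study.

```python
import numpy as np

def linear_kernel(Y, yi):
    """Dot product between the input Y and a support vector yi."""
    return float(np.dot(np.asarray(Y, float), np.asarray(yi, float)))

def polynomial_kernel(Y, yi, d=2):
    """Polynomial kernel of degree d (d is chosen by the user)."""
    return (1.0 + linear_kernel(Y, yi)) ** d

def exponential_kernel(Y, yi, gamma=0.1):
    """Exponential (radial) kernel; gamma controls how far the influence reaches."""
    diff = np.asarray(Y, float) - np.asarray(yi, float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

def decision_value(Y, support_vectors, a, b0, kernel=linear_kernel):
    """f(Y) = B(0) + sum(a_i * K(Y, y_i)) over the support vectors."""
    return b0 + sum(ai * kernel(Y, yi) for ai, yi in zip(a, support_vectors))

Y = [1.0, 2.0]
yi = [3.0, 4.0]
far_reach = exponential_kernel(Y, yi, gamma=0.01)   # low gamma: influence is 'far'
close_reach = exponential_kernel(Y, yi, gamma=10.0)  # high gamma: influence is 'close'
```

With a low gamma the distant support vector still contributes noticeably to the decision value, while with a high gamma its contribution is practically zero.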
3 Future Research
This research deals with a limited area of pixel-based and object-based supervised learning techniques. In future, it can be extended by applying metaheuristic algorithms [17] to the data set and comparing the results. The frog leaping algorithm [18, 19], which supports opposition-based learning, can be applied to the data set and its results compared with those of other algorithms such as the butterfly optimization algorithm [20], the artificial bee colony algorithm [21], etc. These comparisons can extend this research further and provide a wide basis for comparing algorithms for land use and land cover classification.
4 Conclusion
The EuroSAT data set with all 13 spectral bands is used for remote sensing image classification. Pixel-based supervised image classification has been implemented. After implementing the five approaches, i.e. maximum likelihood estimation, minimum distance classifier, principal component analysis, isoclustering and support vector machine, it has been found that the support vector machine approach is the most effective in the classification of remote sensing images.
References
1. (2020) The USGS site [Online]. Available: https://www.usgs.gov/
2. Lu, D., Weng, Q.: “A survey of image classification methods and techniques for improving classification performance.” In: Int. J. Remote Sens., pp. 823–870 (Mar 2006)
3. Pradham, P., Younan, N.H., King, R.L.: “Concepts of image fusion in remote sensing applications.” Department of Electrical and Computer Engineering, Mississippi State University, USA
4. (2020) The GISgeography site [Online]. Available: https://gisgeography.com/spectral-signature/
5. Tuia, D., Volpi, M., Copa, L., Kanevski, M., Muñoz-Marí, J.: “A survey of active learning algorithms for supervised remote sensing image classification.” In: IEEE J. Sel. Top. Sign. Proces. 5(3) (Jun 2011)
6. Tuia, D., Ratle, F., Pacifici, F., Kanevski, M.F., Emery, W.J.: “Active learning methods for remote sensing image classification.” In: IEEE Trans. Geosci. Remote Sens. 47(7) (Jul 2009)
7. Romero, A., Gatta, C., Camps-Valls, G.: “Unsupervised deep feature extraction for remote sensing image classification.” In: IEEE Trans. Geosci. Remote Sens. 54(3), 1349–1362 (Mar 2016). (2020) The Umetrics Suite Blogs Site [Online]. Available: https://blog.umetrics.com/what-is-principal-component-analysis-pca-and-how-it-is-used
8. (2020) The eurosat page on TensorFlow site [Online]. Available: https://www.tensorflow.org/datasets/catalog/eurosat
9. (2020) The GISgeography site [Online]. Available: https://gisgeography.com/image-classification-techniques-remote-sensing/
10. (2020) The Knowledge Portal on Stars Project site [Online]. Available: https://www.stars-project.org/en/knowledgeportal/magazine/image-analysis/algorithmic-approaches/classificationapproaches/pixel-based-classification/
11. (2020) The Esri Resources site [Online]. Available: https://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_aanalys_tools/how_maximum_likelihood_classification_works.htm
12. (2020) The Medium site [Online]. Available: https://medium.com/
13. (2020) The Analytics Vidhya site [Online]. Available: https://www.analyticsvidhya.com/blog/2018/07/introductory-guide-maximum-likelihood-estimation-case-study-r/
14. (2020) The Remote Sensing Lab. Available: https://sar.kangwon.ac.kr/
15. Jolliffe, I.T., Cadima, J.: “Principal component analysis: a review and recent developments.” Phil. Trans. R. Soc. A., 374, 20150202
16. (2020) The Esri Resources site [Online]. Available: https://resources.esri.com/help/9.3/arcgisdesktop/com/gp_toolref/spatial_analyst_tools/how_iso_cluster_works.htm
17. Rajpurohit, J., Sharma, T.K., Abraham, A., Vaishali.: “Glossary of metaheuristic algorithms.” In: Int. J. Comput. Inf. Syst. Indus. Manage. Appl. ISSN 2150-7988. 9, 181–205 (2017)
18. Sharma, T.K., Pant, M.: Opposition-based learning embedded shuffled frog-leaping algorithm. In: Pant, M., Ray, K., Sharma, T., Rawat, S., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 583. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-5687-1_76
19. Sharma, T.K., Rajpurohit, J., Prakash, D.: Enhanced local search in shuffled frog leaping algorithm. In: Pant, M., Sharma, T., Verma, O., Singla, R., Sikander, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0751-9_132
20. Sharma, T.K., Sahoo, A.K., Goyal, P.: “Bidirectional butterfly optimization algorithm and engineering applications.” In: Materials Today: Proceedings. https://doi.org/10.1016/j.matpr.2020.04.679
21. Sharma, T.K., Rajpurohit, J., Sharma, V., Prakash, D.: Artificial bee colony application in cost optimization of project schedules in construction. In: Ray, K., Sharma, T., Rawat, S., Saini, R., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 742. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0589-4_63
Postal Service Shop Floor—Facility Layout Evaluation and Selection Using Fuzzy AHP Method
S. M. Vadivel, A. H. Sequeira, and Sunil Kumar Jauhar
Abstract This paper aims to evaluate and select the optimal facility layout under operational and work environment criteria and their relevant sub-criteria using a fuzzy AHP model. The study shows how operational performance can be enhanced through changes in the facility layout. Initially, based on a field survey, we created seven logical layout plans, which were accepted by all shop floor employees, managers, and postal administrators, for evaluation under production performance layout schemes. In this facility layout design (FLD) problem, seven alternative layouts, two performance criteria, and 14 sub-criteria are considered. The results are encouraging for analysing the FLD of postal shop floor layouts. This empirical study was carried out at the national sorting hub (NSH), Mangalore, in the southern part of India.
Keywords Facility layout planning · Facility layout design · Mail processing operations · Fuzzy set theory · Fuzzy AHP · Operational performance · MCDM
S. M. Vadivel (B)
Department of Industrial & Production Engineering, The National Institute of Engineering, Mysuru 570008, India
A. H. Sequeira
School of Management, National Institute of Technology Karnataka, Surathkal 575025, India
S. K. Jauhar
Indian Institute of Management Kashipur, Kundeshwari, Uttarakhand 244713, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_7

1 Introduction
The postal service processes articles through various functions such as collecting, scanning, sorting, dispatching, and delivering. In India, the postal department comes under the government through the Ministry of Communications and Information Technology. On a daily basis, more than six lakh mail items are collected and managed through 400 mail offices [11]. They are then delivered by commercial transport such as air, train, and van. In recent days, the postal service has been receiving more articles, which are challenging to sort without an efficient method. Postal departments have traditionally followed their operational sequence without awareness of the newer techniques known as lean manufacturing concepts. Hence, we plan to implement lean manufacturing concepts in the Indian postal service at NSH Mangalore to enhance operational performance so that articles reach customers faster. In this connection, the prime focus is to modify the existing layout design and to evaluate the layouts through lean service and workplace environment attributes. The NSH sorting pattern is mainly based on all-India sorting levels such as circle, regional, district, and local post office. Each country has a different sorting pattern, customized to keep customers satisfied by delivering articles on time. This paper is arranged as follows: Sect. 2 reviews the literature and addresses the research gap. Section 3 describes the research methodology; Sect. 4 presents the application of the FAHP model in NSH Mangalore, Karnataka, in the southern part of India. Section 5 gives the results and discussion, while Sect. 6 presents conclusions, limitations, and recommendations for further research.
2 Literature Support
Table 1 highlights some of the literature on fuzzy AHP and other MCDM methods applied in manufacturing or service industries. Little of the existing literature discusses the application of the fuzzy AHP model in the postal service industry for the enhancement of operational performance.
3 Research Methodology
Systematic layout planning (SLP) is considered for the procedural layout design, and a relationship chart (REL) is used for the logical relationships among the layout area sections. Figures 1 and 2 show the methodology proposed for evaluating the layouts using FAHP under the cost and work environment divisions. Table 2 shows the selection criteria and sub-criteria for the layout evaluation. The opinions of three experts, namely the postmaster general, the postal manager, and the quality manager, are integrated to obtain the feasible layout using quantitative and qualitative data.
Table 1 Brief literature review on FAHP and other MCDM techniques applications

| S. No | Author, Year | Industry applied | Methodology | Description |
|---|---|---|---|---|
| 1 | [7] | Tourism service industry | Fuzzy AHP | Evaluated performance of the service of a foreign travel intermediary using FAHP. For that, they gathered data from 36 senior travel managers and 56 general managers in different travel agencies |
| 2 | [9] | Small–medium enterprises (SME) in the southern part of Taiwan | QFD and fuzzy AHP | The authors proposed the product planning phase of the product with the help of QFD and fuzzy AHP |
| 3 | [11] | Tourism development | Fuzzy AHP | The projected model used for an Iranian tour agency |
| 4 | [16] | Coal mining | Trapezoidal fuzzy AHP method | Proposed to estimate work safety in hot and moist environments. They considered work, environment, workers, and ten sub-factors |
| 5 | [1] | Healthcare service | Fuzzy AHP | Investigated healthcare performance measures and lean supply chain management (LSCM) practices |
| 6 | [17] | Water management | Fuzzy AHP and fuzzy TOPSIS | Presented to prioritize a set of water loss management strategies |
| 7 | [3] | Small–medium enterprise (SME) | Fuzzy AHP–TOPSIS approach | Recommended to find the lean implementation rank in SMEs. The fuzzy AHP used for barriers weightages |

(continued)
3.1 Fuzzy AHP Method
A fuzzy number is a special fuzzy set $F = \{(x, \mu_A(x)),\ x \in R\}$, where $x$ takes its value on the real line $R$: $-\infty < x < +\infty$, and $\mu_A(x)$ is a continuous mapping from $R$ to the closed interval [0, 1].
Table 1 (continued)

| S. No | Author, Year | Industry applied | Methodology | Description |
|---|---|---|---|---|
| 8 | [10] | Welding industry | DOE and fuzzy AHP | The fuzzy AHP technique used in the Taguchi method for optimization in submerged arc welding parameters |
| 9 | [6] | Flight service | Fuzzy AHP and 2-tuple fuzzy linguistic method | Fuzzy AHP applied to examine the in-flight service quality structure problem |
| 10 | [13] | Food production | Fuzzy AHP–TOPSIS | Attempted to formulate a sustainable manufacturing strategy. Three facility decisions, namely location, supply chain distance, and minimal socio-environmental impact, were considered for this strategy |
| 11 | [8] | Hospital | Fuzzy AHP with SHELL method | Proposed for an operating theatre to design and choose the best layout plans using fuzzy AHP and SHELL method |
| 12 | [15] | Transportation | AHP | Evaluated eight private bus travels under safety, operation, comforts, etc. |
Step 1: A triangular fuzzy number can be denoted as $U = (l, m, u)$; its membership function $\mu_A(x): R \to [0, 1]$ is equal to:

$$\mu_A(x) = \begin{cases} \dfrac{x}{m-l} - \dfrac{l}{m-l}, & x \in [l, m],\\[2mm] \dfrac{x}{m-u} - \dfrac{u}{m-u}, & x \in [m, u],\\[2mm] 0, & \text{otherwise}, \end{cases}$$

where $l \le m \le u$; $l$ and $u$ stand for the lower and upper values of the support of $U$, respectively, and $m$ for the middle value. In the following, $\tilde{A} = (l, m, u)$.
Step 2: Relative importance in the fuzzy scale (refer to Table 3).
Fig. 1 Selection of optimal layout from “n” layouts (diagram labels: postal service shop floor plant; operational performance; Layout 1, Layout 2, …, Layout n)
Fig. 2 The triangular fuzzy numbers membership function adopted from Kwong and Bai [5]
Step 3: The reciprocal of a fuzzy number is obtained using the equation below:

$$\tilde{A}^{-1} = (l, m, u)^{-1} = \left(\frac{1}{u}, \frac{1}{m}, \frac{1}{l}\right)$$

Step 4: Fuzzified pair-wise comparison matrix.
Step 5: Two fuzzy numbers are summed as follows to find the fuzzy weight ($W_i$):

$$\tilde{A}_1 + \tilde{A}_2 = (l_1, m_1, u_1) + (l_2, m_2, u_2) = (l_1 + l_2,\ m_1 + m_2,\ u_1 + u_2)$$
Table 2 Criteria and sub-criteria for the layout selection

| Criteria | Sub-criteria |
|---|---|
| Cost division (CD) (facility and operations) | Mail flow (CD1); Personnel flow (CD2); Minimum distance (CD3); Future expansion (CD4); Space consumption (CD5); Process suitability and equipment changes (CD6); Aesthetics (CD7) |
| Work environment division (WED) (facilities, safety, and control) | Emergency exit (WED1); Security (WED2); Supervision (WED3); Comfortness (WED4); Light facilities (WED5); Noise control (WED6); Pollution control (WED7) |
Table 3 Comparative rating scale [1–9]

| Definition | AHP scale (intensity of importance) | Fuzzy AHP scale (intensity of importance) |
|---|---|---|
| Equal importance | 1 | (1,1,1) |
| Moderate importance | 3 | (2,3,4) |
| Strong importance | 5 | (4,5,6) |
| Very strong importance | 7 | (6,7,8) |
| Extreme importance | 9 | (9,9,9) |
| Intermediate values | 2, 4, 6, 8 | (1,2,3), (3,4,5), (5,6,7), (7,8,9) |
Step 6: Fuzzy weights are calculated as follows:

$$W_i = \frac{\sum_{j=1}^{n} a_{ijk}}{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ijk}}, \quad i = 1, 2, \ldots, n$$

Step 7: Defuzzification using the centre of area (COA) method:

$$\mathrm{COA}(W_i) = \frac{l + m + u}{3}$$
Fig. 3 Example of candidate layouts
Step 8: Normalized weight for ranking the alternatives.
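Steps 1 to 8 can be traced on a small example. The sketch below is illustrative only: the helper names and the 2 × 2 comparison matrix are assumptions, and the fuzzy weights are obtained by dividing the lower, middle and upper values component-wise, which is how the worked calculation in Sect. 5.1 proceeds (a stricter fuzzy division would pair the lower value with the upper total).

```python
from functools import reduce

def membership(x, tfn):
    """Step 1: membership degree of x in the triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    if x == m:
        return 1.0
    if l <= x < m:
        return (x - l) / (m - l)
    if m < x <= u:
        return (x - u) / (m - u)
    return 0.0

def reciprocal(tfn):
    """Step 3: (l, m, u)^-1 = (1/u, 1/m, 1/l)."""
    l, m, u = tfn
    return (1.0 / u, 1.0 / m, 1.0 / l)

def add(a, b):
    """Step 5: component-wise sum of two triangular fuzzy numbers."""
    return tuple(x + y for x, y in zip(a, b))

def fuzzy_weights(matrix):
    """Step 6: row sum divided by the total sum (component-wise, see lead-in)."""
    row_sums = [reduce(add, row) for row in matrix]
    total = reduce(add, row_sums)
    return [tuple(r / t for r, t in zip(row, total)) for row in row_sums]

def centre_of_area(tfn):
    """Step 7: COA = (l + m + u) / 3."""
    return sum(tfn) / 3.0

def normalise(values):
    """Step 8: scale the crisp weights so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]

# Step 4: a hypothetical fuzzified pair-wise comparison matrix in which the
# first item is 'moderately' more important than the second, i.e. (2, 3, 4).
one = (1.0, 1.0, 1.0)
a12 = (2.0, 3.0, 4.0)
matrix = [[one, a12], [reciprocal(a12), one]]

weights = normalise([centre_of_area(w) for w in fuzzy_weights(matrix)])
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

Here `ranking` lists the item indices from most to least important; with seven layouts the same functions would be applied to a 7 × 7 matrix for each sub-criterion.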
4 Application—NSH Mangalore
In both manufacturing and service industries, FLD plays a crucial role in performance [2]. Here, the shop floor layout problem of the postal service industry is examined. The case contains seven alternative layouts for evaluation (see Fig. 3 for a candidate layout). The proposed fuzzy AHP method is used to rank these alternative layouts and provides a basis for decision making.
5 Results Analysis
5.1 Finding the Performance Weights Criteria Using FAHP Method
A questionnaire was given to the postal managers to obtain their ratings of the criteria. Their opinions about the relative importance of each pair of criteria are shown in Table 4. The views of the three decision makers are then integrated, and the fuzzy matrix calculations are obtained as follows:
Table 4 Pair-wise comparison of main attributes C

| Criteria (C) | Operational (C1) | Work environment (C2) |
|---|---|---|
| Operational (C1) | (1,1,1) | (2/3,1,2) (2,3,4) (1,1,1) |
| Work environment (C2) | (3/2,1,1/2) (1/4,1/3,1/2) (1,1,1) | (1,1,1) |
$$CM = \begin{bmatrix} (4.67,\ 6,\ 8) \\ (3,\ 3.33,\ 3.75) \end{bmatrix}$$

where CM is the criteria matrix.

$$\sum_{j=1}^{2} a_{ij} = (4.67, 6, 8) + (3, 3.33, 3.75) = (7.67, 9.33, 11.75)$$

$$C_{w1} = \frac{(4.67, 6, 8)}{(7.67, 9.33, 11.75)} = (0.61, 0.64, 0.68)$$

$$C_{w2} = \frac{(3, 3.33, 3.75)}{(7.67, 9.33, 11.75)} = (0.39, 0.36, 0.32)$$

$$C_{w1} = \frac{0.61 + 0.64 + 0.68}{3} = 0.643, \qquad C_{w2} = \frac{0.39 + 0.36 + 0.32}{3} = 0.357$$
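The criteria weights above can be checked numerically. The short sketch below starts from the two row sums of the criteria matrix as printed, divides them component-wise by the total, rounds the fuzzy weights to two decimals as in the text, and then applies the centre of area. The variable names are illustrative.

```python
# Row sums of the criteria matrix CM, taken from the text.
row_sums = {"C1": (4.67, 6.0, 8.0), "C2": (3.0, 3.33, 3.75)}

# Total of the row sums for the lower, middle and upper values.
total = tuple(sum(v[k] for v in row_sums.values()) for k in range(3))

# Fuzzy weights: component-wise ratio, rounded to two decimals as in the text.
fuzzy_w = {name: tuple(round(v[k] / total[k], 2) for k in range(3))
           for name, v in row_sums.items()}

# Crisp weights by the centre of area, (l + m + u) / 3.
crisp_w = {name: round(sum(w) / 3, 3) for name, w in fuzzy_w.items()}
```

The resulting `crisp_w` gives 0.643 for the operational criterion and 0.357 for the work environment criterion, in agreement with the values used for the global weights.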
5.2 Calculation Method
Step 1: Pair-wise comparisons were obtained for all criteria and sub-criteria, thoroughly analysed across the entire hierarchy using the comparative scale.
Step 2: Relative weights were then obtained through a matrix structure.
Step 3: The global weight is computed by multiplying the relative weights of the criteria and the sub-criteria by the local weights of the seven layouts (Saaty 1990). See Table 5 for further details.
Step 4: Based on the global priority weights (see Table 6), the layouts have been ranked.
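Step 3 can be illustrated with the cost division (CD) figures reported in Table 5 for layout 1. The sketch below multiplies the criterion weight, the sub-criterion weight and the local layout weight. The variable names are illustrative, and the pairing of each local weight with its sub-criterion is our reading of the table.

```python
# Weight of the cost division criterion (Sect. 5.1).
criterion_weight = 0.643

# Sub-criteria weights CD1..CD7 and the local weights of layout 1 (Table 5).
sub_weights = [0.106065, 0.132607, 0.156576, 0.146071, 0.143939, 0.198807, 0.115934]
layout1_local = [0.134008, 0.108508, 0.173558, 0.168227, 0.165651, 0.105228, 0.14482]

# Global weight = criterion weight x sub-criterion weight x local weight.
layout1_global = [criterion_weight * s * w for s, w in zip(sub_weights, layout1_local)]

# Contribution of the whole cost division to layout 1.
layout1_cd_sum = sum(layout1_global)
```

The first product comes to about 0.009139 and the seven products add up to about 0.0912, which agrees with the CD1 entry and the CD sum listed for layout 1 in Table 5.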
Table 5 Local and global weights of facility layouts using fuzzy AHP method

Local weights

| Criteria (weight) | Sub-criteria | Weight | Layout 1 | Layout 2 | Layout 3 | Layout 4 | Layout 5 | Layout 6 | Layout 7 |
|---|---|---|---|---|---|---|---|---|---|
| CD (0.643) | CD1 | 0.106065 | 0.134008 | 0.212048 | 0.157645 | 0.122489 | 0.10646 | 0.0891597 | 0.132044 |
| | CD2 | 0.132607 | 0.108508 | 0.0888492 | 0.092578 | 0.190352 | 0.14773 | 0.0940952 | 0.106551 |
| | CD3 | 0.156576 | 0.173558 | 0.137665 | 0.176917 | 0.177407 | 0.15074 | 0.0632404 | 0.105171 |
| | CD4 | 0.146071 | 0.168227 | 0.138246 | 0.173123 | 0.105896 | 0.09372 | 0.155509 | 0.186265 |
| | CD5 | 0.143939 | 0.165651 | 0.157104 | 0.148626 | 0.138859 | 0.12745 | 0.128892 | 0.123101 |
| | CD6 | 0.198807 | 0.105228 | 0.145326 | 0.139482 | 0.162328 | 0.12134 | 0.269068 | 0.210086 |
| | CD7 | 0.115934 | 0.14482 | 0.120761 | 0.111629 | 0.10267 | 0.1949 | 0.200036 | 0.136783 |
| WED (0.357) | WED1 | 0.125189 | 0.139829 | 0.138905 | 0.217765 | 0.134837 | 0.0715714 | 0.0976006 | 0.0500995 |
| | WED2 | 0.125094 | 0.10921 | 0.113165 | 0.102793 | 0.0721005 | 0.240353 | 0.0561995 | 0.105094 |
| | WED3 | 0.140474 | 0.174972 | 0.123657 | 0.118746 | 0.0765702 | 0.07758 | 0.123682 | 0.0981624 |
| | WED4 | 0.118817 | 0.18094 | 0.0919328 | 0.107732 | 0.0962831 | 0.154841 | 0.180505 | 0.208209 |
| | WED5 | 0.141953 | 0.101029 | 0.211741 | 0.142071 | 0.186743 | 0.143896 | 0.0855839 | 0.173968 |
| | WED6 | 0.149192 | 0.166995 | 0.171673 | 0.139402 | 0.255469 | 0.175404 | 0.155717 | 0.170123 |
| | WED7 | 0.199282 | 0.127025 | 0.148926 | 0.171492 | 0.177997 | 0.136355 | 0.300713 | 0.194344 |

Table 5 (continued)

Global weights

| Criteria (weight) | Sub-criteria | Weight | Layout 1 | Layout 2 | Layout 3 | Layout 4 | Layout 5 | Layout 6 | Layout 7 |
|---|---|---|---|---|---|---|---|---|---|
| CD (0.643) | CD1 | 0.106065 | 0.009139318 | 0.01446163 | 0.010751357 | 0.008353725 | 0.00726055 | 0.006080673 | 0.009005374 |
| | CD2 | 0.132607 | 0.009252076 | 0.007575843 | 0.007893784 | 0.016230611 | 0.012596391 | 0.00802315 | 0.00908521 |
| | CD3 | 0.156576 | 0.017473536 | 0.013859888 | 0.017811715 | 0.017861047 | 0.015176257 | 0.00636694 | 0.010588445 |
| | CD4 | 0.146071 | 0.015800494 | 0.012984569 | 0.016260345 | 0.009946139 | 0.008802525 | 0.014605973 | 0.017494689 |
| | CD5 | 0.143939 | 0.01533146 | 0.014540411 | 0.013755749 | 0.012851786 | 0.011795851 | 0.011929313 | 0.011393339 |
| | CD6 | 0.198807 | 0.013451601 | 0.018577444 | 0.017830389 | 0.020750859 | 0.015511244 | 0.034395743 | 0.026855903 |
| | CD7 | 0.115934 | 0.010795688 | 0.009002197 | 0.008321447 | 0.007653593 | 0.01452893 | 0.014911796 | 0.010196566 |
| | Sum | | 0.09124 | 0.09100 | 0.09262 | 0.09364 | 0.08567 | 0.09631 | 0.09461 |
| WED (0.357) | WED1 | 0.125189 | 0.006249304 | 0.006208008 | 0.009732456 | 0.006026199 | 0.003198703 | 0.004362012 | 0.002239071 |
| | WED2 | 0.125094 | 0.004877161 | 0.005053786 | 0.004590587 | 0.003219904 | 0.010733818 | 0.002509789 | 0.004693346 |
| | WED3 | 0.140474 | 0.008774709 | 0.006201302 | 0.005955019 | 0.003839936 | 0.003890576 | 0.006202556 | 0.004922768 |
| | WED4 | 0.118817 | 0.007675053 | 0.003899575 | 0.00456974 | 0.004084105 | 0.006567994 | 0.007656601 | 0.00883174 |
| | WED5 | 0.141953 | 0.005119869 | 0.010730445 | 0.007199763 | 0.009463616 | 0.007292249 | 0.004337154 | 0.008816215 |
| | WED6 | 0.149192 | 0.008894412 | 0.009143569 | 0.007424766 | 0.013606673 | 0.009342288 | 0.008293728 | 0.009061014 |
| | WED7 | 0.199282 | 0.009037025 | 0.010595143 | 0.012200571 | 0.012663361 | 0.009700796 | 0.021393828 | 0.013826346 |
| | Sum | | 0.05062 | 0.05183 | 0.05167 | 0.05290 | 0.05072 | 0.05475 | 0.05239 |
Table 6 Global values of the facility layouts alternatives

| Layouts | L1 | L2 | L3 | L4 | L5 | L6 | L7 |
|---|---|---|---|---|---|---|---|
| FAHP overall weights | 0.13629 | 0.11304 | 0.16772 | 0.17878 | 0.09205 | 0.21154 | 0.19978 |
| FAHP ranking | 5 | 6 | 4 | 3 | 7 | 1 | 2 |
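The ranking row of Table 6 follows from sorting the overall weights in descending order. A minimal sketch, with the weights copied from Table 6 and illustrative variable names:

```python
# FAHP overall weights of the seven layouts (Table 6).
overall = {"L1": 0.13629, "L2": 0.11304, "L3": 0.16772, "L4": 0.17878,
           "L5": 0.09205, "L6": 0.21154, "L7": 0.19978}

# Sort layouts from the highest to the lowest weight and number them from 1.
ordered = sorted(overall, key=overall.get, reverse=True)
ranks = {layout: position + 1 for position, layout in enumerate(ordered)}
best_layout = ordered[0]
```

Here `best_layout` is layout 6, and `ranks` reproduces the FAHP ranking of Table 6.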
Table 7 Comparative results—AHP and FAHP

| Alternative layouts | AHP ranking | Fuzzy AHP ranking |
|---|---|---|
| L1 | 4 | 5 |
| L2 | 5 | 6 |
| L3 | 7 | 4 |
| L4 | 2 | 3 |
| L5 | 3 | 7 |
| L6 | 1* | 1* |
| L7 | 6 | 2 |

Note *Operational performance-oriented layout 6 incorporated in NSH on April 2018
5.3 Comparative Study
The AHP results of Murugesan et al. [12] were used for the comparative study shown in Table 7. We found that layout 6 is the optimal operational performance layout. It has the highest global weight under both methods (0.22587 for AHP and 0.21154 for FAHP). In contrast, layout 4 is the second priority in AHP, whereas layout 7 is the second priority in FAHP. A limitation of this study is the absence of a sensitivity analysis to confirm the robustness of the ranking obtained from the pair-wise comparisons.
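The differences between the two rankings in Table 7 can be listed with a few lines of code. This is only a reading aid for the table, with illustrative variable names; it is not an additional analysis from the study.

```python
# Rankings of the seven layouts as given in Table 7.
ahp_rank = {"L1": 4, "L2": 5, "L3": 7, "L4": 2, "L5": 3, "L6": 1, "L7": 6}
fahp_rank = {"L1": 5, "L2": 6, "L3": 4, "L4": 3, "L5": 7, "L6": 1, "L7": 2}

# Positive shift: the layout ranks lower (worse) under FAHP than under AHP.
shift = {layout: fahp_rank[layout] - ahp_rank[layout] for layout in ahp_rank}

# Layouts whose position is the same under both methods.
unchanged = [layout for layout, delta in shift.items() if delta == 0]
```

Only layout 6 keeps its position; layouts 5 and 7 move the most, by four places each in opposite directions.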
6 Conclusions
The paper aims to optimize the facility layout using the fuzzy AHP method by selecting, from various alternatives, the layout with the best operational performance. The study is important because well-designed FLD planning reduces material handling and operational cost in both manufacturing and service industries. The fuzzy AHP method has an advantage over AHP because fuzzy logic resolves the imprecise and ambiguous judgements of decision makers; it deals with elusive subjects in the pair-wise comparison process. According to the results, both AHP and FAHP showed that layout 6 is a suitable operational performance layout, and it was implemented in the NSH postal service in April 2018. The postal administration is quite satisfied with the improvement in production performance results. Hence, a practical implication is justified. A limitation of this study is that the obtained solution was not compared with other MCDM methods such as MACBETH, WASPAS, BWM, and EDAS. In future, more participants with different expertise in the domain can be surveyed to get better results. This methodology can be adopted in other service industries to solve FLD problems.
References
1. Adebanjo, D., Laosirihongthong, T., Samaranayake, P.: Prioritizing lean supply chain management initiatives in healthcare service operations: a fuzzy AHP approach. Prod. Plan. Control 27(12), 953–966 (2016)
2. Apple, J.M.: Plant Layout and Material Handling. Wiley, New York (1997)
3. Belhadi, A., Touriki, F.E., El Fezazi, S.: Prioritizing the solutions of lean implementation in SMEs to overcome its barriers: an integrated fuzzy AHP-TOPSIS approach. J. Manuf. Technol. Manage. 28(8), 1115–1139 (2017)
4. India post mailing service, Indian postal mail operations. https://www.indiapost.gov.in/VAS/Pages/RTI/RTI-Manual-1.aspx. Accessed 5 June 2020
5. Kwong, C.K., Bai, H.: A fuzzy AHP approach to the determination of importance weights of customer requirements in quality function deployment. J. Intell. Manuf. 13(5), 367–377 (2002)
6. Li, W., Yu, S., Pei, H., Zhao, C., Tian, B.: A hybrid approach based on fuzzy AHP and 2-tuple fuzzy linguistic method for evaluation in-flight service quality. J. Air Transport Manage. 60, 49–64 (2017)
7. Lin, C.T., Lee, C., Chen, W.Y.: Using fuzzy analytic hierarchy process to evaluate service performance of a travel intermediary. Serv. Ind. J. 29(3), 281–296 (2009)
8. Lin, Q., Wang, D.: Facility layout planning with SHELL and fuzzy AHP method based on human reliability for operating theatre. J. Healthcare Eng. 2019, 1–12 (2019)
9. Liu, H.T., Wang, C.H.: An advanced quality function deployment model using fuzzy analytic network process. Appl. Math. Model. 34(11), 3333–3351 (2010)
10. Majumder, A.: A simple and robust fuzzy-AHP-based Taguchi approach for multi-objective optimization of welding process parameters. Int. J. Prod. Qual. Manage. 20(1), 116–137 (2017)
11. Makui, A., Nikkhah, Z.: Designing fuzzy expert system for creating and ranking of tourism scenarios using fuzzy AHP method. Manage. Sci. Lett. 1(1), 29–40 (2011)
12. Murugesan, V.S., Sequeira, A.H., Shetty, D.S., Jauhar, S.K.: Enhancement of mail operational performance of India post facility layout using AHP. Int. J. Syst. Assur. Eng. Manage. 11(2), 261–273 (2019)
13. Ocampo, L.A.: Applying fuzzy AHP–TOPSIS technique in identifying the content strategy of sustainable manufacturing for food production. Environ. Dev. Sustain. 21(5), 2225–2251 (2019)
14. Ramachandran, K.: Indian postal history focus on Tamilnadu. Imayaa publications, India (2011)
15. Vadivel, S.M., Sequeira, A.H., Jauhar, S.K., Baskaran, R., Robert Rajkumar, S.: Application of multi-criteria decision-making method for the evaluation of Tamil nadu private bus companies. Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-4032-5_21
16. Zheng, G., Zhu, N., Tian, Z., Chen, Y., Sun, B.: Application of a trapezoidal fuzzy AHP method for work safety evaluation and early warning rating of hot and humid environments. Saf. Sci. 50(2), 228–239 (2012)
17. Zyoud, S.H., Kaufmann, L.G., Shaheen, H., Samhan, S., Fuchs-Hanusch, D.: A framework for water loss management in developing countries under fuzzy environment: integration of Fuzzy AHP with Fuzzy TOPSIS. Expert Syst. Appl. 61, 86–105 (2016)
Wireless Motes Outlier Detection Taxonomy Using ML-Based Techniques
Isha Pant and Ashish Joshi
Abstract WSNs have spurred a resurgence of research on machine learning-based approaches aimed at overcoming the physical constraints of sensors. Although resource constrained in nature, the WSN domain has tremendous potential for building powerful applications, each with its own characteristics and requirements. This field nevertheless involves various research issues and challenges, viz. energy efficiency, localization, etc., which need to be addressed. One prominent challenge is the detection of outliers, whose purpose is to preclude malicious attacks on the network or to reduce the noisy, error-prone data in wireless sensor networks. To do so, the methodologies developed need to respect the inherent limits of sensor networks so that the energy intake of the motes is minimal and their lifespan is maximized. The quality of the data must also be thoroughly monitored, since any outlier in the sensed data may degrade data quality and hence affect the final decision. Thus, it becomes imperative to retain the quality of the data. Numerous ML-based methodologies have been used by researchers over time to detect outliers or anomalies present in the network. This paper presents a brief survey of machine learning techniques that have proved their mettle in outlier detection for WSN data.
Keywords Outlier detection · Motes · Machine learning · Sensor networking
I. Pant (B) · A. Joshi
Department of Computer Science, THDC Institute of Hydropower Engineering and Technology, Tehri, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_8

1 Introduction
Wireless sensor networks (WSNs) have arisen as a new computing field that underpins the pervasive practice of computing. A sensor network consists of dispersed, self-directed devices equipped with sensors to monitor environmental or physical conditions. The deployment of the sink, or central base station, in a WSN is very important, as all dispersed nodes eventually deliver their gathered data to this central unit. A WSN comprises random or fixed nodes that spread across the sensor field through a planned or arbitrary deployment mechanism. The prolonged intervals of node deployment and unfavorable environmental conditions impose restrictions on energy usage, as human intervention is rarely possible. So, the lifetime of each node deployed in the field is effectively determined by its battery life [12]. WSNs mainly rely on a one-time deployment of nodes that serves multiple emerging applications. As an information gathering source, WSNs are among the most standard services employed in a wide variety of industrial and commercial applications. The domain includes a large number of self-directed, low-powered, distributed devices called sensor motes or nodes. Every node has multi-functional capabilities for gathering information from its environment. The information sensed by the nodes must be error-free and highly secured against any kind of malicious action [14]. The sensor network must provide correct and relevant sensed information to the end-user. If the sensed data is flawed by any means, the performance of the system will eventually decrease. Events that do not conform to the expected behavior of the system are termed outliers. These outliers or anomalies are induced in the system for a variety of reasons, such as malicious activities like cyber-intrusion, system breakdown, and terrorist activities. The sources of anomalies in sensor networks are events or intrusions, attacks, and faulty sensor motes. To ease the detection of outliers, machine learning techniques are introduced as an assortment of algorithms and tools capable of forming prediction models.
An outlier detection algorithm must maintain a low false-alarm rate. A straightforward outlier detection methodology is to define a region representing normal behavior and to declare any observation that does not belong to that normal region an outlier or anomaly. The aim of introducing ML-based outlier detection methods is to provide the most appropriate data to the end-user. The paper presents an insight into various ML methodologies for the detection of outliers in sensor networks. The rest of the paper is organized as follows: Sect. 2 deals with the ML techniques for motes outlier detection; Sect. 3 presents the literature survey; Sect. 4 gives the findings from the literature followed by future directions; and Sect. 5 contains the conclusion of the paper.
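The idea of defining a normal region and flagging every observation outside it can be sketched in a few lines. The example below is a deliberately simple assumption: the normal region is taken as the mean plus or minus k standard deviations of readings known to be normal. The function names, the value of k and the temperature readings are hypothetical, and the ML techniques surveyed in this paper learn far richer regions than this.

```python
import numpy as np

def fit_normal_region(normal_readings, k=3.0):
    """Estimate a normal region [mean - k*std, mean + k*std] from clean readings."""
    readings = np.asarray(normal_readings, dtype=float)
    mean = readings.mean()
    std = readings.std(ddof=1)
    return mean - k * std, mean + k * std

def flag_outliers(readings, region):
    """Return True for every reading that falls outside the normal region."""
    low, high = region
    readings = np.asarray(readings, dtype=float)
    return (readings < low) | (readings > high)

# Hypothetical temperature readings from one mote (degrees Celsius).
region = fit_normal_region([20.1, 19.8, 20.3, 20.0, 19.9, 20.2])
flags = flag_outliers([20.0, 35.7, 19.7], region)
```

A larger k widens the normal region and lowers the false-alarm rate at the cost of missing milder anomalies, which is the trade-off noted above.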
2 Machine Learning Methodology
Machine learning methodologies are renowned for being reliable, cost-effective, and efficient, and for learning from experience without needing to be explicitly reprogrammed. The reach of machine learning is evident from the fact that the field is present in almost every real-world scenario, from robotics to natural language processing and speech processing, and in general routines such as stock market prediction, advertisement placement, big data analytics, and many more. The integration of ML methodologies with sensor networks delivers enormous growth to the field of WSN. Varied applications of machine learning in the WSN domain are as follows:
i. ML algorithms increase the efficiency of the sensor network and help in separating dead or dysfunctional motes from the other active motes working in the environment.
ii. For nodes working in a mobile sink environment, localization can be made smooth by applying varied ML methodologies.
iii. Energy-harvesting schemes make sensor networks more self-sustaining, able to work for long periods with minimal maintenance even when installed in harsh locations without human intervention [13]. Here, ML methodologies can forecast the amount of energy to be harvested over a given time duration, which in turn helps increase the lifespan of the network.
iv. By reducing the dimensionality of the sensed data at a cluster head (CH), ML algorithms can reduce the transmission overhead caused by bulk sending of data to the respective CH.
v. ML methodology can also resolve coverage problems among motes in the target environment caused by the presence of fixed nodes.
vi. Dynamic routing is required in WSNs because of the changing nature of the sensor network. ML techniques are favorable here as they can improve system performance.
The ML field comprises a wide range of themes and patterns; these provide promising solutions to various WSN challenges and eliminate the need for unnecessary restructuring of the network. The following section gives an insight into ML techniques specifically beneficial for outlier detection in wireless sensor networks.
3 Detecting Outlier in WSN Using ML Based Techniques
The ML techniques used to identify outlier patterns in sensor networks are categorized as supervised learning, unsupervised learning, and reinforcement learning methodologies.
3.1 Supervised Learning Approach
The model is based on a labeled training set. Here, the inputs are predefined and made up of a training and a test dataset. The algorithm classifies the outputs on the basis of patterns learned from the training set. This learning method is used to resolve various issues in WSNs such as localization and object targeting, query processing, MAC, security and intrusion detection, and QoS and data integrity.
i. Neural Networks: ANNs are systems inspired by the functioning of the human brain, designed to recognize a particular pattern from a set of inputs given to them without being explicitly programmed. In its basic form, a neural network comprises three layers, namely an input layer, an output layer, and a hidden layer in between them. ANNs interpret sensory data via machine perception. In WSNs, neural networks are used to detect faulty or dead nodes and thereby improve the efficiency of the nodes.
ii. Support Vector Machines: An ML approach that learns from labeled training samples to classify data points. The primary purpose of SVM is to create decision boundaries that divide the n-dimensional space into classes, so that a new data point can be placed in the correct class. This decision boundary is called a hyperplane [15]. In WSNs, SVM is used for detecting malicious behavior of motes, classification, and regression analysis.
iii. Decision Tree: As the name suggests, a decision tree is a tree-like graph structure. It comprises the cost features, utility, and outcomes of the resources. It is a prediction-based modeling approach. The decision tree methodology is used in sensor networks for outlier detection and for many connectivity problems.
iv. K-nearest neighbor: KNN classifies a data point on the basis of the points that are most similar to it. It is also called a lazy learner algorithm, because KNN does not learn from the training set immediately; instead, it stores the dataset and operates on it at classification time.
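The lazy-learner behavior of KNN described above can be sketched as follows. This is only an illustrative sketch: the function name, the (temperature, humidity) feature vectors, and the labels are hypothetical, and Euclidean distance with a simple majority vote is assumed.

```python
from collections import Counter
from math import dist

def knn_classify(training_set, query, k=3):
    """Label `query` by majority vote among its k nearest stored samples.

    training_set is a list of (feature_vector, label) pairs. Nothing is
    learned in advance; all work happens at classification time.
    """
    neighbours = sorted(training_set, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled readings: (temperature, humidity) -> label.
training_set = [
    ((21.0, 40.0), "normal"),
    ((21.5, 42.0), "normal"),
    ((20.8, 39.0), "normal"),
    ((22.0, 41.0), "normal"),
    ((60.0, 5.0), "outlier"),
    ((58.5, 7.0), "outlier"),
    ((-10.0, 95.0), "outlier"),
]

print(knn_classify(training_set, (21.2, 40.5)))  # normal
print(knn_classify(training_set, (59.0, 6.0)))   # outlier
```

Because the whole training set is stored and scanned for every query, the memory and per-query cost grow with the dataset, which matters on resource-constrained motes.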
3.2 Unsupervised Learning
The approach is used to find patterns in unlabeled datasets where the outputs are unknown. This methodology is used for clustering and data aggregation in WSNs. The unsupervised approach is suitable for sensor networks as it discovers hidden relationships on its own.
i. Principal Component Analysis: PCA is a multivariate methodology used for dimensionality reduction. It is useful for filtering varied noisy datasets.
ii. Kohonen’s Maps (Self-organizing Maps, SOM): Self-organizing ANNs are known as Kohonen’s maps. A Kohonen’s map is trained using competitive learning. Instead of having a series of layers, the SOM structure consists of a single-layer linear 2D grid of neurons.
iii. K-Means Clustering: K-means clustering is mainly used as a preprocessing step for various methods to find the basic initial pattern. It categorizes data into clusters and works iteratively, starting by selecting k nodes at random. In the context of WSNs, this methodology is used for clustering nodes.
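The K-means procedure mentioned above can be sketched in a few lines. The sketch assumes the standard iteration (assign each point to its nearest centre, then recompute the centres) with Euclidean distance; the function name, the seed, and the node coordinates are hypothetical.

```python
import random
from math import dist

def k_means(points, k, iterations=20, seed=0):
    """Cluster `points` (tuples) into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly selected nodes as initial centres
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centre of an empty cluster
        if new_centroids == centroids:  # converged: centres no longer move
            break
        centroids = new_centroids
    return centroids, assignment

# Hypothetical node positions forming two well-separated groups.
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, assignment = k_means(nodes, k=2)
print(assignment)
```

In a WSN setting, each resulting group could be served by one cluster head, although choosing k and the cluster heads themselves is outside the scope of this sketch.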
3.3 Reinforcement Learning
It enables an agent (a mote) to learn by interacting with its working environment. Through its own experience, the agent mote learns to take the best actions, those that maximize its long-term reward. The approach depends on two measures: trial-and-error search and delayed outcome. RL is used in various WSN routing problems. The best-known reinforcement learning approach for solving routing issues in WSNs is the Q-learning algorithm.
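As an illustration of how Q-learning can be applied to routing, the following minimal sketch trains a tabular Q-function for next-hop selection using the standard update Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The topology, the reward values (+10 for reaching the sink, −1 per intermediate hop), and all parameter values are assumptions chosen for the example, not taken from the surveyed literature.

```python
import random

def train_q_routing(graph, sink, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q[node][next_hop] by trial and error on an acyclic topology."""
    rng = random.Random(seed)
    q = {node: {nxt: 0.0 for nxt in nbrs} for node, nbrs in graph.items()}
    sources = [n for n in graph if n != sink]
    for _ in range(episodes):
        node = rng.choice(sources)
        while node != sink:
            if rng.random() < epsilon:                 # explore a random neighbour
                nxt = rng.choice(graph[node])
            else:                                      # exploit the best known next hop
                nxt = max(q[node], key=q[node].get)
            reward = 10.0 if nxt == sink else -1.0     # assumed reward scheme
            future = 0.0 if nxt == sink else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q

def greedy_route(q, start, sink):
    """Follow the highest-valued next hop from `start` until the sink is reached."""
    route = [start]
    while route[-1] != sink and len(route) <= len(q):
        options = q[route[-1]]
        route.append(max(options, key=options.get))
    return route

# Hypothetical topology: A reaches the sink in two hops via B or three hops via C and D.
graph = {"A": ["B", "C"], "B": ["sink"], "C": ["D"], "D": ["sink"]}
q = train_q_routing(graph, "sink")
print(greedy_route(q, "A", "sink"))  # ['A', 'B', 'sink']
```

The delayed outcome shows up in the discount factor: the reward for reaching the sink propagates backwards hop by hop, so the shorter route ends up with the higher value at node A.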
4 Literature Survey
This section presents a generalized study of ML methodologies used to detect outliers in sensor networks. The ML-based outlier detection methodologies compiled for wireless sensor networks are discussed as follows:
Zhang et al. [1] put forward an online outlier detection methodology for motes using a one-class support vector machine. The proposed concept takes advantage of spatial and temporal relations (STA) to detect anomalies. The work was conducted on the Sensorscope system using MATLAB. The proposed scheme is suited to providing detection accuracy.
Chen et al. [2] offered an intruder recognition mechanism for the network that raises system efficiency by making use of SVM and an immune algorithm (IA). The research is based on the AI-SVM model. The approach is well suited and results in improved reliability.
Xu et al. [3] provided a method to detect poly-dimensional anomalies in sensor networks, evaluated through MATLAB simulation, using a support vector machine based on the k-nearest neighbor (k-NN) algorithm. The experimental results show a high outlier detection rate and a reduction in dimensionality.
Zhang et al. [4] used real-time datasets and MATLAB simulation to put forward another outlier detection mechanism, based on a hyper-ellipsoid one-class SVM, to address the accuracy issue of the working network. The experimental results show a low false alarm rate in the network.
Martins et al. [5] devised a methodology using a sliding window algorithm and least squares SVM to detect online outliers in transient time series. The methodology was implemented on Contiki OS using Zolertia motes and a testbed with a three-tank benchmark system. The model is feasible for monitoring systems over sensor networks.
Ayadi et al. [6] built a model by setting up real-time sensor motes in a living laboratory and demonstrated the results using the OF, PW, and LOF methods. The research was conducted to find any type of fault in the network. The experiment concluded that LOF is the most favorable of the three.
Martins et al. [7] generated a dataset within a virtual system for online anomaly detection in transient time series using SVM. The authors implemented a hierarchical multi-agent framework with a radial basis function (RBF) kernel to find faults. The methodology is an improvement over a simple RBF kernel implementation.
Wazid [8] conducted an experiment on OPNET Modular, using Weka 3.6.10, to detect black-hole nodes in a WSN setup. The devised setup is well suited to finding black-hole motes in the network using the k-means clustering approach.
Ayadi et al. [9] presented a novel approach for anomaly detection in a real transaction dataset taken from the CRNS Lab using a neural network approach, Bayesian networks, and OCSVM. The main concern of the research is better network efficiency.
Vilenski et al. [10] used a pipeline prototype software system to find malfunctioning dendrometer sensor motes using a statistical approach. The approach was successful and is effective on huge datasets for detecting faulty motes, ensuring the quality of data circulating in the network.
Chen and Li [11] put forward an enhancement over the traditional spatiotemporal and attribute correlations (STASVDD) mechanism by proposing the Novel-STASVDD method for anomaly detection. The research aimed to avoid the heavy computation required by the former approach. The devised approach performs better for anomaly detection.
Taxonomy of varied ML methodologies for detecting the outliers in sensor networks is given in Table 1.
5 Findings from Reviewed Literature and Conclusion
The study presents a brief taxonomy of varied ML-based outlier detection methodologies for the wireless sensor network domain, using methods such as k-NN, statistical approaches, neural networks, and the support vector machine and its variants. The objective of each proposed methodology is to detect any kind of faulty or malfunctioning node in the sensor deployment environment. The taxonomy covers prominent research from 2009 to 2020. Among the various methodologies, the support vector machine (SVM) is considered the most prominent. In the future, more emphasis needs to be given to the detection of faulty and malfunctioning mobile nodes deployed in sensor networks.
Table 1 Comparative analysis of different machine learning-based methods used for outlier detection

| Author | Year | Reference | Platform | Aim of proposed research | Issues in WSN | ML Methodology Used | Experimental Results |
|---|---|---|---|---|---|---|---|
| Yang Zhang, Nirvana Meratnia and Paul Havinga | 2009 | [1] | Sensorscope System, MATLAB Software | Online mode Outlier detection | Accuracy | One-class Support Vector Machine (OCSVM) | Better detection accuracy, low-false alarm rate, low computational cost |
| Y. Chen, Y. Qin, Y. Xiang, J. Zhong, and X. Jiao | 2011 | [2] | AI-SVM Model | Recognizing Intruders in the System | Efficiency | Support Vector Machine (SVM) & Immune Algorithm (IA) | Better reliability and Performance |
| Xu, S., Hu, C., Wang, L., & Zhang, G | 2012 | [3] | MATLAB Software | Removal of anomaly in WSN using SVM | Poly-dimensional Anomaly detection | SVM based on k-Nearest Neighbor algorithm (k-NN) | Resulting in dimensionality reduction, High detection accuracy |
| Y. Zhang, Nirvana Meratnia and Paul Havinga | 2012 | [4] | Real-time Datasets, Matlab | Detection of Outlier based on Hyper-ellipsoid one-class SVM | Accuracy detection | Hyper-ellipsoid one-class SVM | Detecting lower-false alarms or False positive rate |
| Hugo Martins, Palma, Alberto Cardoso and Paulo Gil | 2015 | [5] | Contiki OS, Zolertia Motes, testbed with three tank benchmark system | Accommodation & Online detection of anomalies | Anomaly detection | Least Square SVM, Sliding Window based algorithm | Feasible methodology to monitor systems over WSNs |
| Ayadi, H., Zouinkhi, A., Boussaid, B. and M Naceur Abdelkrim | 2015 | [6] | Real-time data of motes placed in a living lab | Detection & Localization of Faults | Accuracy of the system | OF, LOF & PW | Among the 3 methods, density based anomaly detection LOF is favorable |
| H. Martins, F. Januário, L. Palma, A. Cardoso and P. Gil | 2015 | [7] | Dataset generated within a Virtual System | Online Outlier detection | Anomaly detection and improvement of Sensitivity in transient time series | Radial basis function kernel, LS-SVM, Multi-agent framework (Hierarchical) | Improvement over RBF Kernel approach |
| Mohammad Wazid | 2016 | [8] | OPNET Modular | Hybrid Outlier Detection in WSN using weka 3.6.10 dataset | Detection of black-hole nodes | k-Means Clustering, Using weka 3.6.10 | Suitable for finding Black-hole nodes |
| Aya Ayadi, Oussama Ghorbel, M. S. Bensaleh, Abdelfateh Obeid and Mohamed Abid | 2017 | [9] | CRNS Lab Dataset | Outlier Detection in real transaction dataset | Network Performance | Neural Technique approach, Bayesian Network, OCSVM | Reduction in time complexity, better accuracy & communication overhead |
| Efrat Vilenski, Peter Bak, Jonathan D. Rosenblatt | 2019 | [10] | Pipeline Prototype Software System | Identifying malfunctioning dendrometer sensors | Detection of Faulty nodes | Statistical Approach, Dendrogram | Effective for huge datasets |
| Chen, Y., & Li, S | 2019 | [11] | Real & Synthetic sensor network datasets | Anomaly detection using novel-STASVDD | Reducing energy consumption caused by traditional STASVDD | (STASVDD) Spatiotemporal and attribute correlations | Better outlier Detection correctness |
References
1. Zhang, Y., Meratnia, N., Havinga, P.: Adaptive and online one-class support vector machine-based outlier detection techniques for wireless sensor networks. In: International Conference on Advanced Information Networking and Applications workshops. IEEE, pp. 990–995 (2009)
2. Chen, Y., Qin, Y., Xiang, Y., Zhong, J., Jiao, X.: Intrusion detection system based on immune algorithm and support vector machine in wireless sensor network. In: Information and Automation, ser. Communications in Computer and Information Science, vol. 86, pp. 372–376, Springer, Berlin Heidelberg (2011)
3. Xu, S., Hu, C., Wang, L., Zhang, G.: Support vector machines based on k nearest neighbor algorithm for outlier detection in WSNs. In: 8th International Conference on Wireless Communications, Networking and Mobile Computing (2012)
4. Zhang, Y., Meratnia, N., Havinga, P.: Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine. In: Elsevier Ad Hoc Network, pp. 1–13 (2012)
5. Martins, H., Palma, L.S., Cardoso, A., Gil, P.: A support vector machine based technique for online detection of outliers. In: Transient Time Series, IEEE, pp. 1–6 (2015)
6. Ayadi, H., Zouinkhi, A., Boussaid, B., Abdelkrim, M.N.: A machine learning methods-outlier detection in WSN. In: 16th International conference on Sciences and Techniques of Automatic control & computer engineering STA, December 21–23, pp. 722–727 (2015)
7. Martins, H., Januário, F., Palma, L., Cardoso, A., Gil, P.: A machine learning technique in a multi-agent framework for online outliers detection in wireless sensor networks. In: IECON 41st Annual Conference of the IEEE Industrial Electronics Society, Yokohama, pp. 688–693 (2015)
8. Wazid, M.: Hybrid anomaly detection using K means clustering in wireless sensor networks. In: Centre for Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad 500032, India, pp. 1–17 (2016)
9. Ayadi, A., Ghorbel, O., Bensaleh, M.S., Obeid, A.F., Abid, M.: Performance of outlier detection techniques based classification in Wireless Sensor Networks. In: IEEE, pp. 687–692 (2017)
10. Vilenski, E., Bak, P., Rosenblatt, J.D.: Multivariate anomaly detection for ensuring data quality of dendrometer sensor networks. In: Elsevier Computers and Electronics in Agriculture, pp. 412–421 (2019)
11. Chen, Y., Li, S.: A lightweight anomaly detection method based on SVDD for wireless sensor networks. In: Wireless Personal Communications (2019)
12. Shaw, S., Kadam, S., Joshi, S., Hadsul, D.: Advanced virtual apparel try using augmented reality (AVATAR). In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020)
13. Jha, S., Mehta, A.K., Azad, C.: A fuzzy logic based approach for prediction of squamous cell carcino. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020)
14. Bhanu, K.N., Jasmine, H.J., Mahadevaswamy, H.S.: Machine learning Implementation in IoT based Intelligent System for Agriculture. In: International Conference for Emerging Technology (INCET), Belgaum, India, pp. 1–5 (2020)
15. Ascioglu, G., Senol, Y.: Design of a wearable wireless multi-sensor monitoring system and application for activity recognition using deep learning. IEEE Access 8, 169183–169195 (2020)
A Proposed IoT Security Framework and Analysis of Network Layer Attacks in IoT
Neha Gupta and Umang Garg
Abstract The Internet of things (IoT) is bringing about a renaissance in our daily life by converting existing dumb devices into brand-new technology. To do so, it relies on several integrated modules such as radio frequency identification (RFID), sensors, actuators, cloud services, wireless sensor and actor networks (WSAN), and Android services. It offers a multidimensional service area that ranges from tiny devices to large industrial applications. Although it provides social and economic transformation in the field of the latest technologies, the interconnected objects and devices raise several research challenges, such as heterogeneity, scalability, privacy, trust, interoperability, and energy efficiency, which may arise during interaction with remote applications. The security and privacy of the devices remain a crucial challenge that needs to be addressed. In this paper, we first propose a new security framework that considers security at all levels of data movement. Second, we classify distinct security issues at the network layer of IoT communication. Finally, we discuss some countermeasure techniques that protect against these attacks to a certain extent.
Keywords IoT · Security issues · Network layer attacks · Countermeasure techniques
1 Introduction
IoT is a field that makes human life easier through its smart applications, ranging from small useful items to large smart-city or healthcare appliances [11]. There are distinct types of possible communication in IoT, such as application-to-hardware, hardware-to-hardware, and application-to-application. IoT is the technology that enables device-to-device communication using sensors and actuators [10].
N. Gupta · U. Garg (B) Graphic Era Deemed to be University, Dehradun, India
U. Garg Graphic Era Hill University, Dehradun, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_9
The foundation of IoT involves forming connected devices that can be accessed ubiquitously from anywhere. The term IoT was introduced by Kevin Ashton in 1999, in connection with small integrated devices that can be accessed remotely [3]. According to the authors of [17], IoT is expected to grow up to 1.7 trillion in the next year with 20 million devices. To provide IoT services, several authors have proposed architectures [13], mainly three-layered, five-layered, and seven-layered ones. The basic functionalities of IoT were defined by Gubbi et al. [9] in three layers, which ensure that the basic services of IoT can be provided from any remote system through the Internet. Although the three-layered architecture provides a good view of IoT, it does not include any security or privacy aspect of the system. The five-layer architecture [4] offers a wider and more secure view of the IoT system, comprising edge nodes, abstraction of objects, service administration, service balance, and the application layer. A detailed architecture proposed by CISCO [7] gives a well-defined view of IoT through seven layers: physical devices, local connectivity, edge computing, data storage, data abstraction, application, and collaboration and process. Although several architectures have been proposed by researchers, none of them considers the security perspective of the IoT system. In this paper, we propose a security-oriented architecture of the IoT system. The architecture has seven layers with security problems at each layer, the network layer being the most vulnerable in the system. So, in this paper, we cover the security issues of the network layer and the prevention mechanisms for the attacks. The main contributions of the paper to the research community are summarized as follows:
• We propose a security framework that addresses the security problems at each layer.
• We focus on distinct security problems and attacks that may disturb the performance of the network layer.
• We provide countermeasure techniques to prevent security-related issues at the network layer.
The remainder of the paper is organized as follows: Related work on existing IoT architectures and security issues is covered in Sect. 2. In Sect. 3, we propose a security framework that includes the security perspective of each layer. Section 4 points out the challenges and security issues that exist at the network layer. Section 5 defines the prevention mechanisms that may counter security attacks at the network layer. Finally, Sect. 6 presents the conclusion and future scope.
2 Related Work
Several survey papers have been published that cover different domains of IoT challenges. Al-Fuqaha et al. [2] surveyed various aspects of IoT in general and compared IoT architectures, growing opportunities, protocols, and research challenges in IoT. Gronbaek [8] presented a comparative study of an IoT architectural model that resembles the OSI model of networking. Atzori et al. [4] compared several visions of IoT and discussed the merits and demerits of several architectures. The authors also focused on distinct security issues, privacy, hardware security, etc. Security problems need to be addressed at different levels in IoT and require a lot of attention from researchers. Peng et al. [20] proposed a multilayer security model that can enhance the security techniques in IoT. To strengthen the model, the authors implemented the 3DES encryption algorithm for when the system is linked with the network. The model also ensures that different layers must be authenticated with distinct permissions. Wu et al. [19] suggested that the three-layered architecture cannot define the full features and connotations of IoT. The authors proposed a new five-layer architecture that adds two new layers, the processing layer and the business layer. Matharu et al. [12] proposed a general four-layer architecture, consisting of the perception, network, middleware, and application layers along with their constituent elements. Although no organization has yet proposed a predefined standard architecture, several research efforts are under way to provide a secure and standardized architecture for IoT. There remains scope for research on security attacks at the network layer of the model. Here, we propose a new security framework for IoT and cover security attacks that may occur at the most vulnerable layer of the IoT system, the network layer, which can easily be breached by attackers. We also discuss some countermeasure techniques that may serve as solutions to network layer attacks.
3 Proposed Security Framework
Several architectures have been proposed by authors, ranging from the physical layer to the business or application layer. It has been observed that no standard framework has been defined by any researcher. So, there is a critical need for a standard mechanism for dealing with IoT systems that includes a security mechanism for each layer [16]. We propose an architecture (shown in Fig. 1) that can bridge the gap in the integration of two different technologies. The description of the framework is as follows:
3.1 Perception Layer
The foundational layer of the IoT framework is the perception layer, which is responsible for obtaining data from the real-time environment using sensors. The collected data can be shared over a local network connection and passed on for decision making or processing. Perception layer functionalities can be categorized into three parts.
Fig. 1 Proposed secured framework
Perception Node Layer The first sub-layer is the perception node layer, which contains transponders, actuators, microcontrollers, chips, and sensors that sense data with a unique identity (which can be provided using RFID tags). Sensors are used to sense the real-time environment and share the data with the controller using short-range communication. The major design issues that can occur at this layer are natural disasters, replay of messages, and low-energy issues.
Short-Range Communication This sub-layer provides the connection between several nodes and a local server. These nodes can communicate with each other within a network and collect data on a local server. The major objective of the layer is the transmission of data using Wi-Fi, Bluetooth, WSN, or Zigbee. Several hardware devices or components can provide support at this layer, such as switches, routers, or a local administrative server. Some threats may occur at this sub-layer, such as man-in-the-middle attacks, interception of information, software vulnerabilities, and weak cryptography.
Edge Computing Layer The perception node layer collects ample data, some of which may not be useful for future processing. The edge computing sub-layer filters the data before transmission over the Internet to reduce traffic and give a faster response. This sub-layer collects the data on a local server and abstracts it for filtration and aggregation. Several components such as an edge server, firewall, decision system, or application can be used at this sub-layer. The main design issues include modification of information, logging, and server malfunction.
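The filtration and aggregation performed at the edge computing sub-layer can be sketched as follows. This is a minimal illustration, assuming that filtering means discarding readings outside a plausible sensor range and that aggregation means averaging fixed-size windows; the function name, the range, and the window size are hypothetical.

```python
from statistics import mean

def aggregate_at_edge(readings, valid_range, window=4):
    """Drop implausible readings, then forward one averaged value per window."""
    low, high = valid_range
    filtered = [r for r in readings if low <= r <= high]  # filtration step
    return [
        round(mean(filtered[i:i + window]), 2)            # aggregation step
        for i in range(0, len(filtered), window)
    ]

# Hypothetical raw temperature stream; 150.0 and -60.0 lie outside the assumed sensor range.
raw = [21.0, 21.2, 150.0, 21.1, 21.1, 21.3, -60.0, 21.4, 21.2, 21.3]
summary = aggregate_at_edge(raw, valid_range=(-40.0, 85.0))
print(summary)  # [21.1, 21.3]
```

In this example ten raw readings are reduced to two values before anything is sent over the Internet, which is the traffic reduction the sub-layer is meant to provide.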
3.2 Network or Internet Layer
The major responsibilities of this layer are to route packets, provide IP-based communication, and ensure reliable delivery of packets. This layer collects the data received from distinct nodes and routes it through a gateway to the other nodes in the network. The abstracted data received from the edge computing layer can be shared with the remote system through the Internet. The Internet connects billions of devices using the TCP/IP protocol suite, which has a wide variety of vulnerabilities such as IP address spoofing, route spoofing, traffic analysis attacks, false routing, etc. Some design issues may be associated with this layer, such as session hijacking, DDoS attacks, network reconnaissance, eavesdropping, and communication protocol hijacking. Here, we discuss several security issues and countermeasures (secure route selection, secure trust management, end-to-end encryption, digital certificates, and modified hop count) that may be applied at this layer.
3.3 MiddleWare Layer
IoT devices generate a great amount of data every second using sensors, which puts a tremendous strain on the Internet infrastructure. Moreover, IoT devices come with low-powered sensors, little storage capability, limited battery power, and several network limitations. Therefore, more storage, computation, and analysis capacity is required for IoT data, able to handle the heterogeneity of data and devices. The main functionality of this layer is divided into two categories.
Data Storage or Analytics Layer An IoT system must provide some essential services such as data storage, processing, integration, and preparation for the application layer. Data received from the lower level can be stored at a remote place so that it can be accessed from anywhere. Although databases and back-end servers offer several benefits for storing data remotely, this is the most vulnerable part of the IoT system because it is accessible using user credentials only. There are some design issues at this sub-layer, such as privacy, alteration of information, sensitive data leakage, and information gathering.
Data Abstraction Layer In the cloud, a large amount of data collected by sensors may be present at the data accumulation layer. If this data moves toward the database without filtering, it may put a huge strain on the data generation process. The major functionality of this sub-layer is the analytics of data, which is processed based on network traffic and user priority. Several design issues may exist at this layer, such as sensitive information leakage, software vulnerabilities, and redundant data.
3.4 IoT Service and Application Layer
The application provisioning layer is the front end for IoT applications, which can retrieve data from the cloud using the user’s credentials. This layer provides the end-user with application software, Web sites, or virtual tools for dealing with the IoT system. It is responsible for managing IoT applications at the user end and for secure communication. Several design issues may occur at this layer, such as software bugs, weak authentication, configuration errors, or third-party failure.
4 Security Issues at Network Layer
Although every layer in the IoT architecture has its own merits and demerits, the network layer is the most vulnerable layer in the IoT architecture. Therefore, in this section, we discuss several security threats that may occur at the network layer (shown in Fig. 2).
4.1 Reconnaissance Attacks
Some attacks are carried out by threat actors to gather important information. Before an attack, an attacker needs several pieces of information about the target device, such as basic information about the target, network information, and open-port information. First, the attacker collects some initial information about the target devices. After collecting the basic information, the attacker requires the network information of the target device. To obtain it, the attacker initiates a ping sweep to find out which IP addresses are active. Several tools are available for port scanning and vulnerability scanning, such as SuperScan, Nmap, Nipper, Saint, and OpenVAS.
Fig. 2 Classification of attacks at network layer
Traffic Analysis Attack: One of the major reconnaissance attacks is the traffic analysis attack. Significant information such as network flows or packet payloads can be captured and analyzed through traffic analysis. The traffic analysis functionality is divided into two parts: a sniffer and an analyzer. The sniffer captures a copy of transmitted packets, and the analyzer decodes and analyzes them. The main motive of this attack is to compromise the confidentiality of information. Several traffic analyzer software packages are available, such as Tcpdump, Wireshark, and Scapy.
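To make the analyzer role concrete, the following minimal sketch decodes the fixed 20-byte IPv4 header of an already captured packet. It covers only the decoding half; no sniffing is performed, and the sample header is built by hand. The function name, the small protocol table, and the addresses are assumptions made for the example.

```python
import struct
from ipaddress import IPv4Address

# A few well-known IP protocol numbers (illustrative subset).
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def analyze_ipv4_header(raw):
    """Decode the first 20 bytes of an IPv4 packet into readable fields."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                 # high nibble of the first byte
        "header_bytes": (ver_ihl & 0x0F) * 4,    # IHL is counted in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "src": str(IPv4Address(src)),
        "dst": str(IPv4Address(dst)),
    }

# Hand-built sample header (checksum left as 0 for simplicity).
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 1, 0, 64, 6, 0,
    bytes([192, 168, 1, 10]), bytes([10, 0, 0, 5]),
)
print(analyze_ipv4_header(sample))
```

Even without reading any payload, the decoded source, destination, and protocol fields reveal who talks to whom, which is why traffic analysis threatens confidentiality when payloads are encrypted but headers are not.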
4.2 Access Attacks
These kinds of attacks exploit known vulnerabilities in authentication services such as FTP and web services. The main objective of these attacks is to access confidential databases, change account accessibility, and obtain sensitive information. Attackers use access attacks on network devices to retrieve data, gain access, or escalate privileges to administrative status. Access attacks can be categorized into two types as follows:
Password Attacks
In this kind of attack, the attacker attempts to find out a critical system password using different methods or password cracking tools. The sinkhole attack [18] is one of the most destructive attacks and is used to collapse network communication. By applying a power analysis attack, some IoT devices can be physically captured and sensitive information can be extracted by an attacker. The attacker then installs a malicious script on the device to launch a sinkhole attack and spreads these malicious nodes in the destination area. The main motive of these attacks is to leak, lose, or delay confidential information, and they serve as a stepping stone for launching additional attacks.
Spoofing Attacks
In spoofing attacks, the attacker's device attempts to pose as another device by falsifying its address so that it appears to be a legitimate user device. In the IoT system, an attacker can perform MAC spoofing, IP spoofing, RFID spoofing, and DHCP spoofing.
Man-in-the-Middle attack: This is also one of the most dangerous attacks in the IoT system. In this attack, an adversary intercepts the communication between two parties through a fake ID and uses that information to steal login credentials or personal information, to spy, or to corrupt data. In the MITM attack, the adversary sits on the connection between the two parties and either modifies or observes the traffic. The MITM attack can violate several security properties, such as integrity, confidentiality, and privacy. These attacks can be mounted in the IoT network by exploiting its communication protocols.
False Routing or Alteration: In the false routing attack, an unauthorized node sends a data packet to the wrong destination. This can be achieved either by altering the destination address or simply by forwarding data packets to the wrong next node in the route. As a result, several virtual nodes may appear in the route discovery phase. The traffic is diverted to other routes, which may appear short but are malicious.
Sybil Attack: In the sybil attack, a malicious IoT device creates multiple fake identities, which may be used to mislead other nodes in the network. The data sent by the sybil node is easily accepted by other nodes because its identities appear legitimate. In this attack, network traffic can be forced to pass through the sybil node, which makes jamming or DoS attacks possible. A sybil attack can reach the full network without using a physical device.
4.3 Denial-of-Service Attacks
Most IoT devices are built from cheap generic hardware components. These components have several security vulnerabilities, for reasons such as the always-on policy of IoT devices or rushed product launches. A DoS attack happens when the targeted servers or devices are unable to serve requests generated by clients because an enormous flood of data saturates the communication bandwidth. If the attack is launched from distributed sources, it is known as a distributed DoS (DDoS) attack. Mirai, Wirex, Reaper, and Torii are well-known IoT botnets associated with DDoS attacks.
5 Prevention Mechanisms
The major responsibility of the network layer in the IoT architecture is to provide an interface between the perception layer and the cloud layer. It is the most vulnerable layer because of its global accessibility. If no security mechanism is applied at the network layer, the result may be information loss, unauthenticated access, data leakage, or a lack of confidentiality and authentication. Prevention mechanisms are therefore required to secure data transmission among the IoT layers and to restrict access to legitimate users. Several prevention mechanisms can be applied at this layer for secure transmission.
5.1 End-to-End Encryption
End-to-end encryption of information through a key management policy [15] has been used in traditional security to guarantee information security. To send encrypted data over the Internet, transport layer security (TLS) provides a secure communication channel, paired with the AES encryption technique. Encrypting the packet, including the packet header and routing information, can provide confidentiality and prevent data leakage. Key management is the core of this security and can be organized in a centralized or distributed manner. This can provide full end-to-end security and alleviate the security burden on hardware manufacturers.
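As a minimal sketch of the TLS part of this mechanism, the snippet below builds a client-side TLS configuration with Python's standard `ssl` module. The function name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions made for illustration; the text does not prescribe a particular library or protocol version.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context (hypothetical helper, not from the paper)."""
    # The default context requires a valid server certificate and checks the hostname.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
# List the AES-based cipher suites this context is willing to negotiate.
aes_suites = [c["name"] for c in ctx.get_ciphers() if "AES" in c["name"]]
```

A device would then wrap its TCP socket with `ctx.wrap_socket(sock, server_hostname=host)`, where `sock` and `host` stand for the device's own connection and gateway name.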
5.2 Digital Certificate
To prevent the man-in-the-middle attack, a strong encryption method is required between the client and the server. The server can authenticate a client's request using a valid digital certificate [6]. To make this possible, IoT manufacturers should consider identity and authentication factors when manufacturing IoT devices. Although digital certificates secure the device, it is very difficult to manage the issuance, revocation, and lifecycle of certificates for billions of devices. One solution to this problem is to use a certificate authority service such as GlobalSign for high-volume deployments, which saves the manufacturer time and money.
5.3 Modified Hop Count
A packet traversal algorithm generally counts the number of hops during the transmission of information. The procedure first maps each IP address to its hop count and stores the mapping in a table called IP2HPC. Once a packet is received, its hop count is matched against the table entry. If the hop count matches the stored value, the packet is considered to come from a legitimate user; otherwise, it can be discarded. The hop count filtering (HCF) algorithm [14] is designed for deployment at the end host and at the entry point of the organization. It can protect traffic that passes through the periphery router and then the firewall.
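The table lookup described above can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of [14]: the hop count is assumed to be inferred from the TTL field of the received packet, the list of initial TTL values is an assumed set of common defaults, and the names `infer_hop_count`, `is_legitimate`, and `ip2hpc` are hypothetical.

```python
from typing import Dict

# Assumed set of common initial TTL values; a real deployment may need others.
INITIAL_TTLS = (32, 64, 128, 255)


def infer_hop_count(observed_ttl: int) -> int:
    """Estimate hops travelled: smallest assumed initial TTL >= observed TTL, minus observed TTL."""
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def is_legitimate(ip2hpc: Dict[str, int], src_ip: str, observed_ttl: int) -> bool:
    """Accept a packet only if its inferred hop count matches the stored IP2HPC entry."""
    expected = ip2hpc.get(src_ip)
    if expected is None:
        # Policy choice for this sketch: sources missing from the table are rejected.
        return False
    return infer_hop_count(observed_ttl) == expected


# Hypothetical table entry: this source is normally 12 hops away.
ip2hpc = {"192.0.2.10": 12}
```

A spoofed packet that claims the source address `192.0.2.10` but arrives with a TTL implying a different hop count fails the check and can be discarded.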
5.4 Secure Route Selection Process
A secure route selection protocol [1] is the first step toward secure route establishment. It must include a self-stabilization feature, meaning that it is able to recover automatically without human intervention. Using the secure routing protocol, a malicious node can be identified and isolated from the network. Lightweight computation is the primary objective of the secure routing protocol because of the energy constraints of IoT devices. Location privacy is a crucial requirement of the secure routing protocol. Such a protocol can provide confidentiality and integrity for IoT routing.
5.5 Secure Authentication and Identification Mechanism
Secure data transmission and authenticated access permissions are two major factors for IoT applications designed by manufacturers. The Trust-Extended Authentication Mechanism (TEAM) [5] satisfies the location privacy, mutual authentication, MITM attack resistance, and identification requirements for IoT devices. TEAM involves eight schemes: initial registration, authentication, trust authentication, key revocation, secure communication, password modification, login, and key update. It offers a lightweight authentication scheme that validates the user and protects against malicious attacks.
6 Conclusion
In this paper, we discussed several security requirements of the IoT at the network layer. We also reviewed several architectures proposed by other authors, along with their limitations. To provide well-organized and secure behavior of IoT devices, we proposed a security framework that is able to deal with several design issues and components at each level of data movement. We also examined reconnaissance attacks in detail, along with two other major threats, DoS and access attacks. Some countermeasures, or prevention mechanisms, are also discussed to provide secure communication at the network layer of the IoT.
References
1. Airehrour, D., Gutierrez, J.: An analysis of secure MANET routing features to maintain confidentiality and integrity in IoT routing. In: International Conference on Information Resources Management (CONF-IRM) (2015)
2. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M.: Internet of Things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor. 17(4), 2347–2376 (2015)
3. Ashton, K., et al.: That 'Internet of Things' thing. RFID J. 22(7), 97–114 (2009)
4. Atzori, L., Iera, A., Morabito, G.: The Internet of Things: a survey. Comput. Netw. (2010)
5. Chuang, M.-C., Lee, J.-F.: TEAM: trust-extended authentication mechanism for vehicular ad hoc networks. IEEE Syst. J. 8(3), 749–758 (2013)
6. Forsby, F., Furuhed, M., Papadimitratos, P., Raza, S.: Lightweight X.509 digital certificates for the Internet of Things. In: Interoperability, Safety and Security in IoT, pp. 123–133. Springer (2017)
7. Green, J.: The Internet of Things reference model. In: Internet of Things World Forum, pp. 1–12 (2014)
8. Grønbæk, I.: Architecture for the Internet of Things (IoT): API and interconnect. In: 2008 Second International Conference on Sensor Technologies and Applications (SENSORCOMM 2008), pp. 802–807. IEEE (2008)
9. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision, architectural elements, and future directions. Future Gener. Comput. Syst. 29(7), 1645–1660 (2013)
10. Khoueiry, B.W., Soleymani, M.R.: A novel machine-to-machine communication strategy using rateless coding for the Internet of Things. IEEE Internet of Things J. 3(6), 937–950 (2016)
11. Mao, Y., You, C., Zhang, J., Huang, K., Letaief, K.B.: A survey on mobile edge computing: the communication perspective. IEEE Commun. Surv. Tutor. 19(4), 2322–2358 (2017)
12. Matharu, G.S., Upadhyay, P., Chaudhary, L.: The Internet of Things: challenges & security issues. In: 2014 International Conference on Emerging Technologies (ICET), pp. 54–59. IEEE (2014)
13. Mosenia, A., Jha, N.K.: A comprehensive study of security of Internet-of-Things. IEEE Trans. Emerg. Top. Comput. 5(4), 586–602 (2016)
14. Mukaddam, A., Elhajj, I., Kayssi, A., Chehab, A.: IP spoofing detection using modified hop count. In: 2014 IEEE 28th International Conference on Advanced Information Networking and Applications, pp. 512–516. IEEE (2014)
15. Peng, S., Shen, H.: Security technology analysis of IoT. In: Internet of Things, pp. 401–408. Springer (2012)
16. Poonia, A., Banerjee, D., Banerjee, A., Sharma, S.: Security Issues in Internet of Things (IoT)-Enabled Systems: Problem and Prospects, pp. 1419–1423 (2020)
17. Tsai, C.-W., Lai, C.-F., Chiang, M.-C., Yang, L.T.: Data mining for Internet of Things: a survey. IEEE Commun. Surv. Tutor. 16(1), 77–97 (2013)
18. Wang, D., Ming, J., Chen, T., Zhang, X., Wang, C.: Cracking IoT device user account via brute-force attack to SMS authentication code. In: Proceedings of the First Workshop on Radical and Experiential Security, pp. 57–60 (2018)
19. Wu, M., Lu, T.-J., Ling, F.-Y., Sun, J., Du, H.-Y.: Research on the architecture of Internet of Things. In: 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), vol. 5, pp. V5–484. IEEE (2010)
20. Yang, X., Li, Z., Geng, Z., Zhang, H.: A multi-layer security model for Internet of Things. In: Internet of Things, pp. 388–393. Springer (2012)
Cloud Data Storage Security: The Challenges and a Countermeasure
Kamlesh Chandra Purohit, Mahesh Manchanda, and Anuj Singh
Abstract Cloud computing for utility-based services has emerged rapidly in recent years. Characteristics of cloud computing such as on-demand network access, ubiquitous access to a shared pool of computing resources, the pay-as-you-go model, and scalability have made it more popular than traditional computing technologies. By moving to the cloud environment, organizations can focus on quality of service while saving the cost and time of infrastructure setup. Although the features of cloud computing empower users, a few of them, such as scalability and multitenancy, pose threats to confidentiality and integrity. Because a cloud user does not have administrative control over physical resources and client data resides on the vendor's premises, threats to confidentiality, integrity, and privacy may lead customers to think twice before switching to a cloud environment. In this paper, we investigate a flexible distributed storage integrity auditing mechanism that utilizes the homomorphic token and distributed erasure-coded data.
Keywords Cloud computing · Utility-based services · Ubiquitous access · Confidentiality · Integrity
K. C. Purohit (B) · A. Singh, Graphic Era (Deemed To Be University), Dehradun, Uttarakhand, India
M. Manchanda, Graphic Era Hill University, Dehradun, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_10

1 Introduction
The first cloud came into existence and became possible because of several innovative technologies in infrastructure, such as the emergence of broadband, hardware platforms, and virtualization. The National Institute of Standards and Technology defines cloud computing as a model for enabling convenient, on-demand network access to a shared pool of computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction [1]. Cloud data storage service is popular among large enterprises that want to back up their data; because their storage requirements are not known in advance, a cloud storage service best suits their needs. Data storage and maintenance is a tedious task; when data is stored in the cloud, it is the cloud service provider's responsibility to maintain the data. By saving time and money on data storage and maintenance, organizations can focus on new constructive work.
Although cloud computing offers many benefits, many organizations hesitate to switch to a cloud environment because of security concerns associated with shared infrastructure. A 2010 survey by Fujitsu reports that security is the topmost concern among users. In a public cloud, achieving network security is very difficult because multiple cloud users share the same physical resources, which prevents individual users from deploying their own security policies. A private cloud addresses security, since the client owns the premises and resources are managed internally or by a trusted third party, but private clouds lack scalability; for an organization whose requirements vary frequently, a private cloud is not an economical solution, so a public or hybrid cloud is the solution. Here again, because cloud services are provided through the Internet, data in transit is prone to various kinds of network-level attacks such as eavesdropping and session hijacking.
In the cloud environment, data can be accessible to the cloud provider or to any employee of the service provider. A storage-as-a-service provider may outsource infrastructure from an infrastructure-as-a-service provider, and in this way the owner or an employee of the infrastructure-as-a-service provider can access the data. Because searching, sorting, and finding patterns in the data are the responsibility of the storage service provider, encryption alone is not a solution for data security, and therefore finding a solution for storage security in a cloud environment is very difficult.
This paper describes various security issues in data storage in a cloud environment. It is organized as follows: Sect. 2 describes the system model. Section 3 describes a threat model for the cloud environment by classifying attackers and the attacks that they can generate. Sections 4 and 6 review existing distributed storage integrity auditing mechanisms, and Sect. 5 describes security issues in cloud data storage. In Sects. 7 and 8, we propose solutions for improving third-party auditing of data storage security. Section 9 gives a conclusion derived from this survey.
2 System Model
In this section, we describe the entire ecosystem, including data owners, the cloud service provider, storage servers, and the auditor. The storage servers are prone to attack from external or internal enemies, and sometimes external and internal enemies together can mount an attack. The data owner places data in the cloud, processes it there, and can access, modify, or delete the managed data. The storage server stores data and provides access for modifying or deleting information. Storage servers are remote, and data is stored on many servers for security reasons. There may be cases where the service provider itself is the attacker; in such cases, the third-party auditor is responsible for the final audit.
3 Threat Model
An organization using cloud services faces different security challenges than it does with traditional IT solutions. Security in a cloud environment can be viewed from either the cloud service consumer's or the service provider's point of view. For the consumer, security means the confidentiality, integrity, and availability of data stored on the service provider's premises, while for cloud service providers it means protecting their infrastructure from internal and external attacks. These attacks can be network-based or virtualization-based. On the basis of the origin of the attack, the enemy can be categorized as an internal enemy or an external enemy.
(A) Internal enemy: When a consumer of cloud services behaves in a way that harms the physical resources of the service provider, such as processors, storage devices, and networking interfaces, for example by running malicious code, we call that consumer an internal enemy.
Internal attackers have access to virtualized resources, so an attacker can damage the infrastructure by executing malicious code or scripts. Some of the security issues originating from a malicious VM are as follows.
(1) ARP spoofing: ARP spoofing is also known as ARP poisoning. An attacker sends spoofed addresses with the intention of redirecting traffic to the attacker's virtual machine instead of the authenticated receiver.
(2) Traffic analysis: The host machine communicates with a virtual machine through virtual resources. The attacker monitors network traffic coming from the virtual network at the virtual network interface of the virtual machine. This attack is a threat to confidentiality [3].
(3) VM escape: Virtual machines are difficult to secure because virtualization software contains certain vulnerabilities. VM escape attacks are designed to exploit the hypervisor. If a virtual machine is improperly configured, it could allow code to bypass the virtual environment completely and obtain full root access to the physical host. A VM escape can then be used to attack other virtual machines that reside on the same host machine.
(4) VM hopping: In this kind of attack, the attacker's virtual machine gains access to the victim's virtual machine, and the attacker then has read and write access to the storage space of the victim's machine. This attack is therefore a threat to all three security goals: confidentiality, integrity, and availability. For this kind of attack, the attacker needs to know the IP address of the victim's virtual machine.
(5) VM rootkits: These hypervisor-aware rootkits gain access to a virtual machine by starting a thin hypervisor and virtualizing the rest of the machine under it. In this way, VM rootkits can gain control over the virtual machine and host applications.
(B) External enemy: External enemies are not part of the cloud infrastructure and can be anywhere; because cloud services are provided through the Internet, any Internet user can be an attacker of the infrastructure. An external enemy mainly targets the third pillar of the security goals, availability. An attacker may launch distributed denial-of-service (DDoS) attacks. In such attacks, the goal of the enemy is to make resources unavailable to their intended consumers. Bots are used to bombard the service provider with so many requests that its processors are exhausted.
(1) DDoS attacks: This attack is a threat to the availability of cloud services. The attacker bombards a specific network element with a high volume of IP packets so that the corresponding hardware goes out of service, resulting in failure of the communication channel. Stopping a DDoS attack is difficult because network countermeasures cannot easily distinguish good traffic from bad traffic [3].
(2) TCP session hijacking: In this attack, the attacker steals the session between a legitimate client and the cloud server. To hijack a session, the attacker sniffs network traffic and obtains TCP session information. Session fixation, session sidejacking, and cross-site scripting are a few common techniques used by computer criminals to hijack a session [5].
In this paper, we do not focus on how to avoid these attacks, as a lot of work has already been done on developing attack prevention techniques. We have explained these attack techniques here to give a better understanding of the system's threat environment. In [6], these techniques are compared. Because attackers are becoming stronger, we cannot rely on prevention techniques alone. Our main objective here is therefore to review existing techniques, identify their shortcomings, and determine what improvements can be made to them. A flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data, has been studied in [7]. Ensuring the integrity of data storage in cloud computing has been carried out in [8] using a third-party auditor. The study in [9] is also based on public auditability for cloud storage, so that reliability can be assured to the user through third-party auditing.
4 Existing System
The cloud storage auditing scheme consists of three entities: a client, a cloud server, and a third-party auditor. The client owns the data to be stored in the cloud, the cloud server provides data storage services, and the third-party auditor is an approved system for checking the integrity of the data. Data flow between any two of these entities occurs in an embedded way. We know that the system is prone to internal and external attacks; other issues such as hardware failures, software bugs, and network connectivity problems may also affect the system. We believe that a third-party auditing scheme, as proposed by most researchers, can with some modifications make the entire cloud storage environment more reliable and secure. A survey of security requirements, threats, vulnerabilities, and countermeasures used for cloud security has been carried out in [14] for end-to-end mapping. A third-party auditor that records data property information using a two-dimensional data structure for dynamic auditing has been proposed in [15]. A mechanism that integrates data deduplication with dynamic data operations to attain privacy preservation using public auditing for cloud storage security has been proposed in [16]. Ensuring the integrity of data in a fog-to-cloud-based IoT scenario using public auditing for cloud storage has been proposed in [17, 18]. All these studies are based on either public auditing or third-party auditing, which could provide reliability to the end user of the cloud.
5 Security Issues in Storage Service
In a cloud environment, data crosses the boundary of the owner's premises and resides at the service provider's end. Additional security features are therefore expected from the service provider to protect data from external or internal enemies. This requires strong encryption algorithms, powerful access control schemes, and independent auditing schemes. While implementing cloud storage strategies, the following issues must be carefully considered:
Data locality.
Data integrity.
Data segregation.
Data access.
Authentication and authorization.
6 Related Work
(A) Present scenario
In the field of network security, a lot of research has been done, proposed, or deployed. A secured network architecture therefore keeps an external enemy away from the client's data, but what if there is an internal enemy, such as an employee of the service provider, or the service provider is itself an attacker? To check integrity, privacy, and correctness, some researchers introduced a third-party auditor, but what if the third-party auditor itself becomes an attacker? Table 1 gives a brief description of the research work done in the field of data storage security in the cloud environment. The table includes the problems addressed, a brief description of each scheme, and the drawbacks identified in them.
Table 1 Brief description of existing most relevant schemes

| Method proposed | Problem addressed | Limitations | Performance analysis |
|---|---|---|---|
| Publicly auditable secure cloud data storage [10] | To gain trust on service provider, introduced third party for auditing | TPA can get access to client's private data | Supports public auditability |
| Secure and dependable storage services [11] | Data availability against byzantine failure; fast localization of error | TPA together with storage server provider may fool data owner for storage correctness | Support data dynamic operation |
| Public auditability and data dynamics for storage security [12] | Blockless verification, dynamic data operation | May leak data content to auditor | Support data dynamic operation |
| Privacy preserving public auditing [13] | Privacy of client's data against third-party auditor | Heavy storage overhead on the server | Supports batch auditing |

(B) Research gap identified
(1) The existing scheme relies fully on the third-party auditor, while there may be a situation in which the TPA supports an attacker by sending the correct token to the client even if the data on the server is not in correct form, because in the scheme the user sends the correct token set for every data block in the file distribution phase.
(2) The proposed scheme stores data blocks in a meaningful manner; therefore, if an attacker succeeds in snooping the security keys, the privacy of the data blocks can be violated. This is possible because, despite many advances in network security techniques, the network still cannot be assumed to be safe.
7 Proposed Work
Based on the literature survey, we find that there is an immediate need for improvement in the file distribution phase. We propose an algorithm in which the client never fully trusts the service provider or the third-party auditor. Furthermore, the file is distributed in such a way that the data blocks become meaningless before going to the cloud environment and do not convey any message individually. The key by which the original data is recovered must be kept on a machine in the client's premises that is isolated from the Internet.
File Distribution: Let F be the private file that is to be uploaded to the cloud environment. We want to store the bytes of this file F in nine components, $f_1, f_2, f_3, \ldots, f_9$. The number of components should be an optimum value, as there is a trade-off between storage overhead and security; we assume this value to be 9 here (Fig. 1).
Fig. 1 Display of file F, which will be uploaded to the cloud environment (byte positions 1, 2, 3, ..., r, ..., FLength, with the rth byte of file F marked)
Fig. 2 Display of the key K (positions 1, 2, 3, ..., i, ..., 9, with position i holding the digit j)
To define the selection of the component for a particular byte of the file F, we first define a key K as follows: key K is an array of 9 digits. We represent the ith digit of key K as $K[i]$, where $K[i] = j$ $(1 \le j \le 9)$ (Fig. 2). Now, we define the selection procedure of the file component for storing a particular byte. The rth byte of file F is stored in component $f_j$, where j is obtained as:
$$i = r \bmod 9, \qquad j = K[i]$$
In this way, file F is divided into nine small files which individually carry no useful meaning. Therefore, recovering file F from the components without key K is difficult for the attacker. Furthermore, we show how the bytes of file F are stored in a component $f_j$. The rth byte of file F is the qth byte of the file component $f_j$, where the relation between r and q is (Fig. 3):
$$r = i + 9q$$
The client keeps a database that stores a token for each file component $f_j$. A token has a structure like Domain_File_randomnumber: 'Domain' categorizes the main file, 'File' is the name of the main file, and each component of the file is identified by a random number, which the third-party auditor uses to fetch the data component.
Fig. 3 Display of the component file $f_j$ (byte positions 1, 2, ..., q, ...)
Now we can use an erasure-correcting code to tolerate multiple failures, as described in [7], for each file component $f_j$ ($j \in \{1, 2, \ldots, 9\}$). Encrypt each component $f_j$ with key k, and calculate a hash of each encrypted component $(f_j)_k$ as $h[(f_j)_k] = x_j$. Now encrypt $(f_j)_k$ with a key s shared between the client and the TPA, and send it to the TPA along with the random number.
Third-party auditing scheme. The third-party auditor keeps a database with the client's identity, the random number, and $x_j$. The TPA fetches file components by the client's identity and random number and calculates the hash function. The fetched component is in correct form if the outcome of the hash function matches the corresponding value stored in the database; otherwise, the data is assumed to be corrupted and a warning message needs to be sent to the client.
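The hash comparison performed by the third-party auditor can be sketched as below. The text does not name a hash function, so SHA-256 is an assumption here, and the class `Auditor` and its methods `register` and `audit` are hypothetical names introduced for illustration. Encryption of the components is outside this sketch; the byte strings are taken to be already encrypted.

```python
import hashlib
import hmac
from typing import Dict, Tuple


def digest(component: bytes) -> str:
    """Hash of an (already encrypted) file component; SHA-256 is an assumed choice."""
    return hashlib.sha256(component).hexdigest()


class Auditor:
    """Hypothetical third-party auditor keeping (client identity, random number) -> hash."""

    def __init__(self) -> None:
        self._db: Dict[Tuple[str, int], str] = {}

    def register(self, client_id: str, rand: int, x: str) -> None:
        # Store the hash value x that the client computed before upload.
        self._db[(client_id, rand)] = x

    def audit(self, client_id: str, rand: int, fetched: bytes) -> bool:
        # The fetched component is accepted only if its hash matches the stored value.
        expected = self._db[(client_id, rand)]
        return hmac.compare_digest(digest(fetched), expected)


tpa = Auditor()
component = b"encrypted bytes of one component"
tpa.register("client-1", 48213, digest(component))
ok = tpa.audit("client-1", 48213, component)                 # unchanged component
tampered = tpa.audit("client-1", 48213, component + b"\x00")  # modified component
```

When `audit` returns `False`, the auditor would send the warning message to the client that the text describes.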
8 Data Obfuscation Algorithm
Step 1. Let F be a private file, which will be uploaded to the cloud environment. (We can view the file F as a sequence of bytes indexed from 0 to $F_{length}$.)
Step 2. Select key K, a list of 10 digits from 0 to 9 without duplicate digits.
Step 3. Create 10 files $f_0, f_1, f_2, f_3, \ldots, f_9$. We call them 'file segments'.
Step 4. For each byte F[i] of the file F, where i = 0 to $F_{length}$, calculate j = i % 10.
Step 5. For each j, find K[j] and insert the byte F[i] into $f_{K[j]}$.
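The five steps can be sketched in Python as follows. The function names `split_file` and `recover_file` are hypothetical, and the recovery routine is an assumption: the text does not spell out the inverse, so the version below simply replays the same key order.

```python
from typing import List, Sequence


def split_file(data: bytes, key: Sequence[int]) -> List[bytearray]:
    """Steps 3 to 5: byte i of the file goes into segment key[i % len(key)]."""
    n = len(key)
    if sorted(key) != list(range(n)):
        raise ValueError("key must contain each digit 0..n-1 exactly once")
    segments = [bytearray() for _ in range(n)]
    for i, byte in enumerate(data):
        segments[key[i % n]].append(byte)
    return segments


def recover_file(segments: Sequence[bytes], key: Sequence[int]) -> bytes:
    """Assumed inverse: walk the key in the same order and pull the next byte of each segment."""
    n = len(key)
    total = sum(len(s) for s in segments)
    cursors = [0] * n
    out = bytearray()
    for i in range(total):
        seg = key[i % n]
        out.append(segments[seg][cursors[seg]])
        cursors[seg] += 1
    return bytes(out)


key = [3, 1, 4, 0, 9, 2, 6, 5, 8, 7]   # example key: digits 0..9 without duplicates
data = bytes(range(37))                # stand-in for the private file F
segments = split_file(data, key)
restored = recover_file(segments, key)
```

Each segment holds every tenth byte of the file, so a single segment does not reveal contiguous content, and reassembling the file in the right order requires the key.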
9 Conclusion
To ensure cloud storage security, it is essential to hide the meaning of data effectively at the client's end. The method we proposed introduces a barrier that prevents the third-party auditor from accessing the data. Using a random number to access file components hides the relation among the file components. In this way, we can make the cloud storage service more reliable for clients.
References
1. Yandong, Z., Yongsheng, Z.: Cloud computing and cloud security challenges. In: International Symposium on Information Technologies in Medicine and Education, vol. 2, pp. 1084–1088. IEEE (2012)
2. Zhu, Y., Ma, D., Hu, C.J., Huang, D.: How to use attribute-based encryption to implement role-based access control in the cloud. In: Proceedings of the 2013 International Workshop on Security in Cloud Computing, pp. 33–40 (2013)
3. Danezis, G.: Introducing traffic analysis, attacks, defenses and public policy issues. In: Proceedings of the Santa's Crypto Get-together (2005)
4. Institute, I.S.: DDoS attack categorization. University of Southern California
5. Lam, L.K., Smith, B.: Theft on the web: prevent session hijacking. In: TechNet Magazine: Microsoft (2005)
6. Wang, C., Ren, K., Lou, W., Li, J.: Towards publicly auditable secure cloud data storage services. IEEE Netw. Mag. 24(4), 19–24 (2010)
7. Wang, C., Wang, Q., Ren, K.: Toward secure and dependable storage services in cloud computing. IEEE Trans. Serv. Comput. 5(2), April–June (2012)
8. Wang, Q., Wang, C., Ren, K., Lou, W., Li, J.: Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Trans. Parallel Distrib. Syst. 22(5), 847–859 (2011)
9. Wang, C., Chow, S.S.M., Wang, Q., Ren, K., Lou, W.: Privacy-preserving public auditing for secure cloud storage. IEEE Trans. Comput., preprint, https://doi.org/10.1109/TC.2011.245 (2012)
10. Wang, C., Ren, K., Lou, W., Li, L.: Toward publicly auditable secure cloud data storage services. IEEE Netw. 24(4), 19–24 (2010)
11. Qian, W., et al.: Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Trans. Parallel Distrib. Syst. 22(5), 847–859 (2011)
12. Wang, C., Wang, Q., Ren, K.: Toward secure and dependable storage services in cloud computing. IEEE Trans. Serv. Comput. 5(2), April–June (2012)
13. Yang, K., Jia, X.: An efficient and secure dynamic auditing protocol for data storage in cloud computing. IEEE Trans. Parallel Distrib. Syst. 24(9), 1717–1726 (2013)
14. Kumar, R., Goyal, R.: On cloud security requirements, threats, vulnerabilities and countermeasures: a survey. Comput. Sci. Rev. 33, 1–48 (2019)
15. Tian, H., Chen, Y., Chang, C.C., Jiang, H., Huang, Y., Chen, Y., Liu, J.: Dynamic-hash-table based public auditing for secure cloud storage. IEEE Trans. Serv. Comput. 10(5), 701–714 (2015)
16. Wu, Y., Jiang, Z.L., Wang, X., Yiu, S.M., Zhang, P.: Dynamic data operations with deduplication in privacy-preserving public auditing for secure cloud storage. In: IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), vol. 1, pp. 562–567. IEEE (2017)
17. Tian, H., Nan, F., Chang, C.C., Huang, Y., Lu, J., Du, Y.: Privacy-preserving public auditing for secure data storage in fog-to-cloud computing. J. Netw. Comput. Appl. 127, 59–69 (2019)
18. Mishra, N., Sharma, T.K., Sharma, V., Vimal, V.: Secure framework for data security in cloud computing. In: Soft Computing: Theories and Applications, J. Adv. Intell. Syst. Comput., pp. 61–71 (2017)
Comparative Analysis of Numerous Approaches in Machine Learning to Predict Financial Fraud in Big Data Framework Amit Gupta and M. C. Lohani
Abstract Financial frauds taking place across the globe pose growing threats and have a serious impact on the financial sector. As a result, financial institutions are forced to improve their fraud detection mechanisms so that the various kinds of financial fraud can be detected at an early stage. Studies from the past few years show that the use of machine learning (ML) and big data analytics has improved the efficiency of these mechanisms. This paper presents the state of the art on the different kinds of financial fraud and on the detection and prevention techniques used against them. The key purpose of the work is to discuss in detail the various fraud detection methodologies and technologies, their comparison and their performance. It compares the machine learning techniques used to detect and prevent different kinds of financial fraud, and it reviews the methods used in the recent past together with their efficiency and capability for detecting various kinds of fraud in the financial sector.
Keywords FFD · Big data · Big data analysis · Machine learning · Deep learning · CNN · KNN · Random forest
A. Gupta (B) · M. C. Lohani, Department of CSE, Graphic Era Hill University, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_11

1 Introduction
With the latest technologies becoming ever easier to access, the amount of digitally generated data is increasing exponentially and at a very high speed. This data is not only very large in size but also comes in different formats (structured, unstructured and semi-structured) and can be incomplete, as it may have missing or uncertain values. Data characterized by the five Vs, namely volume, variety, velocity, veracity and value, is defined as big data. Domains that normally generate big data include social networking sites, shopping sites, sensor-related data, banking data, etc. Unless this data is analysed efficiently and effectively, it cannot give good-quality results. Thus, big data analytics techniques are required to analyse these big data sets and extract meaningful information that can be used in application domains such as business intelligence.
ML is a concept under the umbrella of AI which gives a system the capability to learn and improve. It learns from its previous experiences or results and performs better over time. Unlike conventional programming, it learns from past results and examples and then builds a model to perform a specified task. Nowadays, machine learning is used in almost every area, such as recommendation systems, friend suggestion in Facebook and Gmail spam filtering, and it greatly improves the identification of any kind of financial fraud.
The rest of the paper is organized as follows. Section 2 describes financial frauds and their types together with the ML techniques for fraud detection. Section 3 presents the various methods and results proposed by many researchers. Section 4 consists of the analysis and discussion for the selection of the best method for fraud detection. Finally, Sect. 5 gives the conclusion and some future scope related to financial fraud detection.
2 Financial Frauds and ML Techniques for Fraud Detection
There are several types of fraud; in this paper, the following types of financial fraud are discussed:
• Tax fraud: This is the type of financial fraud in which unpaid taxes are to be detected. The unsupervised anomaly detection approach is mostly used to detect this type of fraud.
• Accounting fraud: This is the most common type of financial fraud and needs to be detected and prevented on a timely basis. If it is not detected properly, it can cause huge losses to the stakeholders of a financial institution.
• Bank fraud or credit card fraud: Fraud in the banking system, mainly the various kinds of fraud committed through credit cards, is the most frequently occurring form of fraud; here a credit card is used without the knowledge of the authorized person.
• Insurance fraud: Insurance fraud is mostly committed by misrepresenting eligibility, claims or age, or by hiding health conditions.
• Financial statement fraud: Commonly also known as corporate fraud, in which the official documents of a company contain misleading values and statements so that the company can show a large profit.
• Other related financial frauds: Nowadays, online transaction frauds also occur at a very high rate, as most users make online payments and bank transactions.
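The unsupervised anomaly detection mentioned for tax fraud can be illustrated with a minimal sketch. It assumes a hypothetical list of tax-to-turnover ratios for firms in one sector and flags values that lie far from the median, using the median absolute deviation. The function name, the data and the threshold of 3.5 are illustrative assumptions, not the method of any cited paper.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical tax-to-turnover ratios; the sixth firm declares far less tax.
ratios = [0.20, 0.21, 0.19, 0.22, 0.20, 0.02, 0.21]
suspicious = flag_anomalies(ratios)
print(suspicious)
```

No labels are needed here: a firm is flagged only because it deviates from its peers, which is what makes the approach unsupervised.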
Fig. 1 Various ML techniques
Past studies show that data mining techniques and machine learning help financial institutions detect these kinds of fraud in the financial sector. Generally, ML techniques can be classified as supervised learning (SL), unsupervised learning (UL) and reinforcement learning (RL) (Fig. 1).
2.1 Supervised Machine Learning
Supervised ML is a collection of algorithms that learn a function mapping input values to desired outputs. The simplest example of supervised learning is a classification problem: the learning system must learn a function that assigns a vector to one of several classes just by looking at a number of input–output examples of that function. The training data set consists of such training examples, so the model is trained on labelled data. Supervised machine learning techniques can be categorized into classification and regression.
Classification: The output takes discrete values from a defined set of labels. It can be binary classification (0, 1) or multiclass classification (multiple values), and the model predicts one of these labels. Algorithms used are: linear classifier, K-nearest neighbour, SVM, decision trees [34] and random forest.
Regression: The output is a continuous value rather than a discrete label, and the model predicts a value as close as possible to the actual value. Algorithms used are: linear regression and polynomial regression. Logistic regression is often listed here as well, although despite its name it is normally used for classification.
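As an illustration of supervised classification, the following minimal sketch implements K-nearest neighbour with the standard library only. The toy transactions, their two features and the function name `knn_predict` are assumptions made for the example and do not come from any of the reviewed papers.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled transactions: (amount in thousands, hour of day / 24).
# Label 1 = fraud, label 0 = legitimate.
train_x = [(0.1, 0.5), (0.2, 0.6), (0.15, 0.4), (5.0, 0.1), (4.5, 0.05), (6.0, 0.15)]
train_y = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_x, train_y, (4.8, 0.1))
print(prediction)
```

The model is trained on labelled input–output pairs and returns one of the discrete labels, which is what distinguishes classification from regression.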
2.2 Unsupervised Machine Learning
This is the type of technique in which no labels are provided to the learning algorithm. Unsupervised learning discovers hidden patterns within the data; it is like learning without a teacher. Unsupervised learning techniques can be categorized into the following types.
Clustering: Clustering is used to find a pattern or structure within a collection of unstructured or uncategorized data. It processes the data and tries to construct natural clusters (groups) if they are present. Common clustering algorithms include K-means and hierarchical clustering. Related unsupervised techniques such as principal component analysis (PCA), independent component analysis and singular value decomposition (SVD) reduce dimensionality rather than form clusters, and K-nearest neighbours (KNN), although sometimes listed with them, is a supervised method.
Association: Association is a technique for creating association rules, with which associations among various kinds of objects inside huge databases or data sets can be established. It is mainly the discovery of interesting associations among different variables in huge databases.
Algorithms and techniques commonly used in unsupervised machine learning are K-means, principal component analysis, independent component analysis, distribution models, hierarchical clustering, mixture models, dimensionality reduction, singular value decomposition, deep learning/neural networks [35] and fuzzy systems [36].
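Clustering can be sketched with a small K-means implementation (Lloyd's algorithm). The points and the choice of starting centroids below are illustrative assumptions. Real implementations usually pick starting centroids at random and run several restarts.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # The new centroid is the mean of its cluster; an empty cluster keeps its centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled points forming two natural groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are used at any step; the two groups emerge from the distances between the points alone.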
2.3 Semi-Supervised Learning
Supervised learning requires data labelled by data scientists, which becomes costly when dealing with large volumes of data. Unsupervised learning, on the other hand, has a limited application spectrum. To overcome the limitations of both, the concept of semi-supervised learning was developed. Here the algorithm is trained on a combination of labelled and unlabelled data, with a very small amount of labelled data and a huge amount of unlabelled data. First, similar data points are clustered using an unsupervised learning technique, and then the existing labelled data is used to label the unlabelled data.
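The cluster-then-label procedure described above can be sketched as follows. The sketch assumes that cluster centroids are already available (for example from K-means) and that unlabelled points are marked with `None`. The names and data are hypothetical.

```python
from collections import Counter
from math import dist

def cluster_then_label(points, labels, centroids):
    """Give each unlabelled point (label None) the majority label of its cluster."""
    # Step 1 (unsupervised): assign every point to its nearest centroid.
    assign = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
              for p in points]
    # Step 2: find the majority label among the labelled points of each cluster.
    majority = {}
    for c in set(assign):
        known = [lab for a, lab in zip(assign, labels) if a == c and lab is not None]
        majority[c] = Counter(known).most_common(1)[0][0] if known else None
    # Step 3: keep existing labels and fill in the missing ones.
    return [lab if lab is not None else majority[a] for a, lab in zip(assign, labels)]

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels = ["legit", None, None, "fraud", None, None]   # only two points are labelled
completed = cluster_then_label(points, labels, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(completed)
```

A cluster without any labelled member stays unlabelled (`None`) in this sketch, since there is nothing to propagate.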
Table 1 Different algorithms used to detect different types of fraud
S. no. | Fraud type | Machine learning algorithm
1 | VAT fraud | Anomaly detection technique
2 | Accounting fraud | Ensemble machine learning
3 | Bank fraud | ANN and harmony search algorithm
4 | Credit card fraud | Optimized light gradient boosting machine (OLightGBM), random forest algorithm, supervised machine learning (stacking classifier), AdaBoost, K-nearest neighbour technique, Naïve Bayes, local outlier factor, isolation forest, logistic regression, decision tree, decision tree Bayes minimum risk threshold (BMR), logistic regression Bayes minimum risk threshold (BMR), random forest Bayes minimum risk threshold (BMR), cost-sensitive decision tree (CSDT), cost-sensitive logistic regression (CSLR), SVM, neural network, convolutional neural network (CNN), balanced bagging ensemble (BBE), Gaussian Naïve Bayes
5 | Online credit card fraud | HMM and K-means clustering, predictive analytics and an API module (SVM)
6 | Tax fraud | Unsupervised machine learning
7 | Financial fraud detection | CoDetect, Bayesian belief network, genetic algorithm, neural network, decision tree, self-organizing map, K-means clustering and probabilistic neural network (PNN)
8 | Shill bidding fraud detection | SVM
9 | Automobile insurance fraud | Iterative assessment algorithm (IAA)
2.4 Reinforcement Learning
This is the kind of machine learning technique that specifies how an agent should take actions in a specific environment so as to maximize some notion of cumulative reward. It is often combined with deep learning, where a neural network helps the agent learn how to attain a complex objective or maximize a specific quantity over many steps. Table 1 lists some ML techniques that have been applied specifically to financial frauds.
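The idea of maximizing cumulative reward can be made concrete with a minimal tabular Q-learning sketch. The environment is a hypothetical four-state chain, not a fraud detection task: action 0 moves left, action 1 moves right, and reaching the last state pays a reward of 1. All parameter values are illustrative assumptions.

```python
import random

def q_learning(n_states=4, episodes=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a chain where only the rightmost state is rewarded."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]   # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)                  # explore, or break ties at random
            else:
                a = 0 if q[s][0] > q[s][1] else 1     # exploit the current estimate
            nxt = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if nxt == goal else 0.0
            # Move the estimate towards reward plus discounted best future value.
            target = reward + gamma * max(q[nxt])
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
    return q

q = q_learning()
# Greedy action in each non-terminal state (1 = move right, towards the reward).
policy = [0 if left > right else 1 for left, right in q[:-1]]
print(policy)
```

The agent is never told which action is correct. It only observes rewards, and the discount factor `gamma` makes rewards that are reached sooner count for more.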
3 Literature Review
This research is based purely on a comparative analysis of the work done in the field of fraud detection. This section gives a brief description of earlier work in the same domain. For this purpose, Table 2 presents the work carried out so far in a structured format. The table gives the technique used to detect the fraud and its accuracy in detecting the fraud.

Table 2 Contribution of various ML techniques applied to financial fraud
S. no. | Paper | Year | Type of fraud | Technique used | Accuracy (%) | Evaluation technique used
1 | [1] | 2019 | VAT fraud detection | Unsupervised anomaly detection technique | - | -
2 | [2] | 2019 | Accounting fraud | Ensemble machine learning | 94.6 | K-fold cross-validation
3 | [3] | 2019 | Accounting fraud | Ensemble machine learning | - | -
4 | [4] | 2020 | Bank fraud | Artificial neural network technique and harmony search algorithm | 86.00 | Comparison
5 | [5] | 2020 | Credit card fraud | Optimized light gradient boosting machine (OLightGBM) | 98.40 | Fivefold CV procedure
6 | [6] | 2016 | Online credit card fraud | HMM and K-means clustering | - | -
7 | [7] | 2019 | CCFD | Random forest algorithm | - | -
8 | [8] | 2018 | Tax fraud | Unsupervised machine learning | - | -
9 | [9] | 2018 | CCFD | Supervised machine learning (stacking classifier) | 95.27 | -
10 | [10] | 2018 | CCFD | AdaBoost | 99.90 | Matthews correlation coefficient (MCC)
11 | [11] | 2019 | Real-time credit card fraud | Predictive analytics and an API module (SVM) | 91.00 | Appropriate performance matrix
12 | [12] | 2017 | CCFD | K-nearest neighbour technique | 97.70 | Matthews correlation coefficient (MCC)
13 | [13] | 2018 | CCFD | K-nearest neighbour technique/multilayer perceptron | 99.13 | -
14 | [14] | 2019 | CCFD | Unsupervised learning algorithms | 99.93 | Performance evaluation metric, comparative study
15 | [15] | 2018 | Real-time synthetic financial fraud detection | Ensemble of decision tree (EDT) | 90.49 | ROC curve
16 | [15] | 2018 | Real-time synthetic financial fraud detection | Stacked auto-encoder (SAE) | 80.52 | ROC curve
17 | [15] | 2018 | Real-time synthetic financial fraud detection | Restricted Boltzmann machine (RBM) | 91.53 | ROC curve
18 | [16] | 2018 | Financial fraud detection | Random forest | 69.17 | Comparison method
19 | [17] | 2018 | Financial fraud detection | CoDetect | 95 | Feature matrix
20 | [18] | 2010 | Automobile insurance fraud | Iterative assessment algorithm (IAA) | 87.2 | AUC
21 | [19] | 2006 | Fraudulent financial statements (FFS) | Bayesian belief network | 90.3 | Tenfold cross-validation
22 | [19] | 2006 | Fraudulent financial statements (FFS) | Neural network | 80 | Tenfold cross-validation
23 | [19] | 2006 | Fraudulent financial statements (FFS) | Decision tree | 73.6 | Tenfold cross-validation
24 | [20] | 2009 | Fraudulent financial statements (FFS) | Self-organizing map and K-means clustering | 89 | Silhouette index
25 | [21] | 2019 | CCFD | AdaBoost | 69.6 | Comparison of Matthews correlation coefficient
26 | [21] | 2019 | CCFD | Random forest | 76.4 | Comparison of Matthews correlation coefficient
27 | [21] | 2019 | CCFD | Naïve Bayes | 75.4 | Comparison of Matthews correlation coefficient
28 | [22] | 2019 | CCFD | Local outlier factor | 45.82 | Matthews correlation coefficient
29 | [22] | 2019 | CCFD | Isolation forest | 58.83 | Matthews correlation coefficient
30 | [22] | 2019 | CCFD | Logistic regression | 97.18 | Matthews correlation coefficient
31 | [22] | 2019 | CCFD | Decision tree | 97.08 | Matthews correlation coefficient
32 | [22] | 2019 | CCFD | Random forest | 99.98 | Matthews correlation coefficient
33 | [23] | 2016 | CCFD | Decision tree | 40 | Bar graph
34 | [23] | 2016 | CCFD | Random forest | 38 | Bar graph
35 | [23] | 2016 | CCFD | Logistic regression | 10 | Bar graph
36 | [23] | 2016 | CCFD | Decision tree Bayes minimum risk threshold (BMR) | 50 | Bar graph
37 | [23] | 2016 | CCFD | Logistic regression Bayes minimum risk threshold (BMR) | 35 | Bar graph
38 | [23] | 2016 | CCFD | Random forest Bayes minimum risk threshold (BMR) | 50 | Bar graph
39 | [23] | 2016 | CCFD | Cost-sensitive decision tree (CSDT) | 55 | Bar graph
40 | [23] | 2016 | CCFD | Cost-sensitive logistic regression (CSLR) | 55 | Bar graph
41 | [24] | 2016 | CCFD | SVM | 62.5 | Comparison
42 | [24] | 2016 | CCFD | Random forest | 75 | Comparison
43 | [24] | 2016 | CCFD | Neural network | 70 | Comparison
44 | [24] | 2016 | CCFD | Convolutional neural network (CNN) | 87 | Comparison
45 | [25] | 2018 | Shill bidding fraud detection | SVM | 86 | Tenfold cross-validation
46 | [25] | 2018 | CCFD | Balanced bagging ensemble (BBE) | 94 | AUC
47 | [26] | 2018 | CCFD | Random forest (RF) | 90 | AUC
48 | [26] | 2018 | CCFD | Gaussian Naïve Bayes | 86 | AUC
49 | [27] | 2018 | CCFD | Logistic regression | 72 | ROC curve
50 | [27] | 2018 | CCFD | Decision tree | 72 | ROC curve
51 | [27] | 2018 | CCFD | Random forest decision tree | 76 | ROC curve
52 | [28] | 2019 | Financial statement fraud | Probabilistic neural network (PNN) | 98.09 | Comparative analysis
53 | [28] | 2019 | Financial statement fraud | Genetic algorithm | 95 | Comparative analysis
54 | [28] | 2019 | CCFD | Naïve Bayes | 99.02 | Comparative analysis
55 | [28] | 2019 | CCFD | SVM | 98.8 | Comparative analysis
56 | [29] | 2020 | CCFD | Logistic regression | 95.0 | Comparative analysis
57 | [30] | 2019 | CCFD | Hybridized classification model | 83.83 | Matthews correlation coefficient
58 | [31] | 2006 | Fraudulent financial statement (FFS) | Stacking algorithm | 74.1 | Comparative analysis
59 | [32] | 2015 | CCFD | Bayesian belief network | 68 | ROC
60 | [33] | 2018 | FFS | Gaussian anomaly detection | 71.68 | AUC and ROC
* CCFD means Credit Card Fraud Detection. A hyphen (-) marks a cell that is empty in the table.

[1] In this paper, the methods are applied to organizations within the same sector, each represented by a set of tax ratios built on domain-specific knowledge. The paper specifies an efficient auditing strategy that can be used to decrease fraud losses. The model does not depend on inefficient audits and can be used by tax administrations across the world.
[2] This study was done to help auditors. A model was built that helps to predict a fraudulent firm on the basis of historical and present risk factors. It uses the K-fold cross-validation method and a classifier-based web application that supports various kinds of decision-making. The framework is based on an ensemble classifier, and the performance of the classifier can be optimized using the TOPSIS algorithm. The classifier produces an accuracy of 94%, a sensitivity of 1, a specificity of 0.92, an F-measure of 0.94, an MCC of 0.83 and an AUC of 0.98. As future scope, the study could handle the last 10 years of data using big data frameworks such as Hadoop and Spark.
[3] This study developed a fraud prediction model using a machine learning approach. The authors used ensemble learning, a powerful machine learning approach, and introduced a new performance evaluation method that is better suited to fraud detection tasks. The model works better than that of Dechow et al. [2011], which uses a logistic regression model, and that of Cecchini et al. [2010], which uses SVM.
[4] This study uses a fusion system built on the ANN and the harmony search algorithm (HSA). It uses HSA to optimize the parameters of the ANN. The study focuses on finding the hidden patterns that separate fraudulent customers from normal customers.
[5] This research paper specifies an intelligent method for detecting fraud in credit card transactions. It uses OLightGBM, in which some parameters of the light gradient boosting machine have been tuned. The model provides the highest performance, with the following details: accuracy (98.40%), AUC (92.88%), precision (97.34%) and F1 score (56.95%). The conclusion of the study is that the specified technique gave better results than the other classifiers; it also highlights the use of optimization to enhance prediction.
[6] This paper deals with streaming analytics, which facilitates the detection and analysis of credit card fraud. The model is built from historical transactional data and is then used to analyse transactions in real time. With this model, a fraudulent transaction is detected and prevented while it is being performed, so that the user can be notified. The authors describe the use of HMM for detecting fraudulent transactions. With the use of streaming analytics, the model not only prevents fraudulent transactions but also reduces the false alarm rate.
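Several of the papers above report accuracy together with sensitivity, specificity, F-measure and MCC. The following minimal sketch shows how these follow from a confusion matrix, using the standard textbook definitions. The counts in the example are hypothetical and are not taken from any cited paper.

```python
from math import sqrt

def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0      # recall on the fraud class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0   # Matthews correlation coefficient
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "mcc": mcc}

# Hypothetical imbalanced case: 10 frauds among 1000 transactions and a
# classifier that labels everything as legitimate.
naive = classification_metrics(tp=0, fp=0, tn=990, fn=10)
print(naive)
```

In this example, the accuracy is 0.99 although not a single fraud is caught, while sensitivity and MCC are both 0. This shows why accuracy alone can mislead on imbalanced fraud data and why several papers in Table 2 evaluate with MCC, AUC or ROC curves.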
[7] Due to the rapid development of technology, it has become very difficult to track the behaviour and pattern of fraudulent transactions, and the use of machine learning, artificial intelligence, etc., has given good results in detecting them. This study focuses on the use of random forest and decision trees. The results indicate that the best accuracy, approximately 98.6%, was achieved with random forest. Random forest is used to classify the credit card data sets, and decision trees are then used to predict the fraudulent transactions.
[8] This study provides a method for detecting fraudulent taxpayers using unsupervised ML techniques. The most common type of tax fraud is filing a tax return with a lower amount. The model uses ML algorithms that work on unlabelled data. The model is able to recognize under-reporting taxpayers, and the results obtained demonstrate that it does not fail to flag suspicious statements.
[9] This paper presents several supervised ML methodologies for detecting fraudulent credit card transactions. These are used to implement a super-classifier built with an ensemble learning method.
[10] In this paper, various ML algorithms were used for the detection of credit card fraud. First, some standard techniques are used, and then hybrid methods that use AdaBoost and majority voting are applied. The results clearly show that majority voting methods give better accuracy in the detection of credit card fraud. The MCC metric was used as the performance measure. In conclusion, a perfect MCC score of one was achieved using AdaBoost and majority voting methods.
[11] This paper mainly focuses on four basic types of fraudulent transaction. Each category of fraud is evaluated using ML techniques. The proposed model for detecting credit card fraud is based on detecting four different patterns of fraudulent transactions, and the research identifies the best and most efficient algorithm for each specific type of fraud. The model achieves 74, 83, 72 and 91% accuracy with LR, NB, LR and SVM, respectively, for detecting the four fraud patterns, namely risky MCC, unknown web addresses, ISO response code and transactions above 100$.
[12] This paper compares techniques such as Naïve Bayes, KNN and LR on highly skewed credit card fraud data. The results conclude that KNN gives the best performance among the techniques used on all metrics, except for the 10:90 distribution of data.
[13] This paper describes a model in which fraud is detected by observing the usage pattern in past and present transactions. The techniques described are KNN, Naïve Bayes, LR, CFLANN, multilayer perceptron (MLP) and decision tree. The evaluation results indicate that MLP performs better than CFLANN, as it requires less time to detect a fraudulent transaction.
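The majority voting used in [10] combines the outputs of several classifiers into one decision. The following minimal sketch assumes that each classifier has already produced a list of labels. The three prediction lists are hypothetical and only illustrate the mechanism.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical outputs of three classifiers on five transactions (1 = fraud).
naive_bayes = [0, 1, 0, 0, 1]
knn         = [0, 1, 1, 0, 0]
log_reg     = [1, 1, 0, 0, 1]

combined = majority_vote([naive_bayes, knn, log_reg])
print(combined)
```

Using an odd number of voters avoids ties in the binary case. An error made by a single classifier is outvoted as long as the other two agree.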
[14] In this paper, different ML techniques were evaluated for detecting credit card fraud. The study concluded that unsupervised algorithms are able to handle skewed data better and more efficiently.
[15] This study describes the use of ML and deep learning algorithms for the detection of fraud. The best results were given by the ensemble of decision trees (EDT), stacked auto-encoders (SAE) and restricted Boltzmann machines (RBM).
[16] The researchers in this study proposed an efficient model for the detection of financial fraud. They also used several feature selection techniques, among which XGBoost gave the best results. They compared the performance and effectiveness of five ML algorithms and concluded that random forest performs best.
[17] In this paper, a method called CoDetect is introduced which performs efficiently. The model makes use of network and feature information to find fraudulent financial transactions. CoDetect uses a graph-based similarity matrix and a feature matrix. It is also capable of tracing the source of the fraud.
[18] This paper specifies the methods used for the detection of fraudulent transactions in the field of automobile insurance. The algorithm used in the model is the iterative assessment algorithm (IAA). It provides an expert approach to uncovering scams in automobile insurance. The empirical evaluation shows that fraud can be detected efficiently.
[19] This paper describes data-mining-based classification techniques that can be used for the detection of financial fraud. The model is built so that it can detect the firms that issue fraudulent statements. The methods used are decision trees, neural networks and Bayesian belief networks. Of these three techniques, the Bayesian belief network performs best and gives an accuracy of 90.3%.
[20] This paper describes the use of clustering, an unsupervised learning method. A model, V-KSOM, is described that takes advantage of the SOM, an unsupervised self-learning process, and applies K-means clustering to the outputs of the SOM.
[21] This paper describes two types of credit card fraud. Application-level credit card fraud is detected using various techniques, and their performance is compared and analysed based on sensitivity, specificity, precision, recall and MCC. The work concludes that the precision of J48, AdaBoost and random forest improved to 78.2, 77.6 and 78.7, respectively, using the information gain method.
[22] The model described in this paper was developed to detect fraud in streaming transaction data. It works on the historical details of customers and tries to extract behavioural patterns. The model groups the customers according to their transactions and applies different classifiers, and finally a rating score is generated for each classifier. The results conclude that the Matthews correlation coefficient was the better measure for imbalanced data sets, and that with a one-class classifier, LR, decision tree and random forest gave better results.
[23] In this paper, the importance of the features is described. The paper analyses customer behaviour for each individual card holder. The results show an increase in performance and predictive capability when the right features are selected.
[24] This paper proposes a CNN-based fraud detection framework, which is applied to the real transactions of a commercial bank to demonstrate its performance compared with other methods.
[25] This research proposes a model that is capable of differentiating between legitimate and shill bidders. The framework uses an SVM classifier which works efficiently and provides good accuracy for the detection and misclassification rate of shill bidders.
[26] This paper presents a comparative study of different ML algorithms that can be used for fraud detection and that work efficiently and quickly on large data sets. Three algorithms, RF, balanced bagging ensemble and Gaussian Naïve Bayes, are compared on highly imbalanced and big data sets.
[27] In this paper, a detailed discussion is given of an efficient way to detect various types of fraud in the banking sector using a big data analytical framework that can process large amounts of data. The model uses machine learning algorithms which enhance its efficiency so that it can work on a real-time basis. Of the techniques used, Random Forest Decision Tree performs best on accuracy, precision and recall, although random forest could not perform well when the amount of data increased, due to overfitting. Of the three analytical techniques compared, random forest gave the best accuracy.
[28] In this paper, the researchers identified various techniques that give better results for fraud detection. They found that hybrid fraud detection methods were mostly used, because these methods combine the advantages and features of various traditional methods and techniques. The proposed model can be extended so that it also works efficiently for real and imbalanced data.
[29] In this research work, the imbalanced data is resampled to give better results. Three different techniques were used for sampling the skewed data sets. Three machine learning algorithms (Naïve Bayes, KNN and logistic regression) were compared, and logistic regression gave the best results: LR showed the maximum accuracy of 95%, NB showed 91% and KNN 75%.
[30] In this study, various algorithms such as KNN, ELM, RF and MLP were compared. The researchers proposed a predictive classification model using the specified techniques. An ensemble of ML algorithms proved to be a novel approach for detecting credit card fraud.
[31] This paper describes the use of ML for the detection of firms that issue fraudulent financial statements. The authors also specify the factors associated with FFS. They proposed a hybrid decision support system which combines algorithms using a stacking technique and gives improved performance.
[32] This paper compares two important ML methods, namely ANN and Bayesian belief networks (BBN). According to the results shown in the study, BBN yields better results than ANN.
[33] In this work, the researchers constructed an ensemble model for detecting accrual manipulation using theory from the work of Beneish. They also proposed a novel simulation-based sampling method which can handle unbalanced data effectively. The two main contributions of the paper are an easy method for detecting companies with high risk and a simulation-based sampling technique that can be used on imbalanced data.
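Several of the reviewed papers, for example [29] and [33], resample imbalanced data before training. The summaries above do not name the sampling techniques, so the following sketch uses plain random oversampling as a stand-in. This choice, the function name and the toy data are assumptions.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for lab, group in by_class.items():
        # Sample with replacement to fill the gap to the largest class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(lab)
    return out_rows, out_labels

# Hypothetical skewed data set: 8 legitimate rows (label 0) and 2 fraud rows (label 1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling should be applied to the training split only. Oversampling before the train/test split leaks copies of the same fraud rows into the test data and inflates the reported accuracy.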
4 Analysis and Discussion For credit card fraud detection, many supervised and unsupervised machine learning algorithms are applied but with the help of Table 3, it is clearly visible that random forest gives the accuracy of 99.98% which is very high as it is specifically used text processing techniques. For real-time credit card fraud detection algorithms like HMM, KNN and support vector machine (SVM) are applied because it is hard to perceive the fraud online so very less techniques are applied among them SVM performs better with 91.0% accuracy as well as in Shill Billing Fraud it gives best result with 86.0% accuracy. This is because SVM with high dimension space and a Table. 3 Comparative analysis of different techniques
Fraud type | Technique | Accuracy (%)
Credit card fraud | Random forest | 99.98
Real-time credit card fraud | SVM | 91.0
Financial statement fraud | Probabilistic neural network (PNN) | 98.09
Accounting fraud | Ensemble machine learning | 94.6
Bank fraud | Artificial neural network (ANN) | 86.0
Shill bidding fraud | SVM | 86.0
Automobile insurance fraud | Iterative assessment algorithm (IAA) | 87.2
Comparative Analysis of Numerous Approaches in Machine Learning …
[Bar chart: accuracy (%) for random forest, SVM, probabilistic neural network (PNN), ensemble machine learning, artificial neural network (ANN), SVM and iterative assessment algorithm (IAA)]
Fig. 2 Comparison of accuracy of various techniques used for fraud detection
smaller number of samples is effective; it needs clean, error-free data and is also memory efficient. Financial statement frauds are committed very often, so a number of ML algorithms are applied to detect them; the probabilistic neural network (PNN) works better than the other algorithms, giving 98.09% accuracy. In the detection of accounting fraud, the ensemble machine learning algorithm performs best, with 94.6% accuracy. In bank fraud, the artificial neural network (ANN) gives the best result, with 86.0% accuracy. Automobile insurance fraud can be detected well with the iterative assessment algorithm (IAA), whose accuracy is 87.2%. It is thus observed that real-time frauds can be detected easily with SVM, while complex neural networks can be applied to other types of fraud: the accuracy of a deep neural network is much higher than that of other networks, but it has a high computation cost and needs good hardware to build and train the model. The comparative analysis is also given in Fig. 2, which shows the accuracy of the ML techniques graphically.
5 Conclusion
Auditing these days needs to cope with an increasing number of fraud cases. Supervised ML techniques can help auditors in the task of detecting management fraud. The aim of this study has been to investigate the usefulness and compare the performance of ML techniques in detecting fraudulent financial statements using published financial data. Moreover, a relatively small list of financial ratios largely determines the classification results. This knowledge, coupled with ML techniques, can provide models capable of achieving considerable classification accuracy. In terms of performance, the proposed stacking variant method achieves better performance than any examined simple or ensemble method. Tracking progress is a time-consuming job that can be handled automatically by a learning tool. While the experts will still have a
basic role in monitoring and evaluating progress, the tool can collect the data needed for reasonable and efficient monitoring. In future, new deep learning techniques can be applied to fraud detection so that accuracy can be improved with less computation overhead and CPU time.
References 1. Vanhoeyveld, J., et al.: Value-added tax fraud detection with scalable anomaly detection techniques. Appl. Soft Comput. J. (2019), https://doi.org/10.1016/j.asoc.2019.105895 2. Hooda, N., et al: Optimizing fraudulent firm prediction using ensemble machine learning: a case study of an external audit. Appl. Artif. Intell. DOI: https://doi.org/10.1080/08839514. 2019.1680182 3. Bao, Y., et al.: Detecting accounting fraud in publicly traded U.S. Firms Using a Machine Learning Approach. https://doi.org/10.1111/1475-679X.12292 4. Daliri, S.: Department of computer engineering, science and research branch, Islamic Azad University, Tehran, Iran: Using Harmony Search Algorithm in Neural Networks to Improve Fraud Detection in Banking System: Hindawi Computational Intelligence and Neuroscience Volume 2020, Article ID 6503459, 5 pages https://doi.org/10.1155/2020/6503459 5. Altaher Taha, A., et al.: An Intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. Digital Object Identifier https://doi.org/10.1109/ACCESS. 2020.2971354 6. Rajeshwari, U., et al.: Real-time credit card fraud detection using Streaming Analytics. 978– 1–5090–2399–8/16/$31.00 c 2016 IEEE 7. Jonnalagadda, V., et al.: Credit card fraud detection using Random Forest algorithm. Int. J. Adv. Res. Ideas Innovations Technol., ISSN: 2454–132X Impact factor: 4.295 (Volume 5, Issue 2). 8. de Roux, D., et. al.: Tax fraud detection for under-reporting declarations using an unsupervised machine learning approach. https://doi.org/10.1145/3219819 9. Dhankhad, S., et al.: Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study. 978–1–5386–2659–7/18/$31.00 ©2018 IEEE DOI https:// doi.org/10.1109/IRI.2018.00025 10. Randhawa, K., et al.: Credit card fraud detection using adaboost and majority voting. Digital Object Identifier https://doi.org/10.1109/ACCESS.2018.2806420 11. 
Thennakoon, A., et al.: Real-time credit card fraud detection using machine learning. 978–1– 5386–5933–5/19/$31.00 c 2019 IEEE 12. Awoyemi, J.O., et al.: Credit card fraud detection using machine learning techniques. 978–1– 5090–4642–3/17/$31.00 ©2017 IEEE 13. Dighe, D., et al.: Detection of credit card fraud transactions using machine learning algorithms and neural networks. 978–1–5386–5257–2/18/$31.00 ©2018 IEEE 14. Mittal, S., et al. Performance evaluation of machine learning algorithms for credit card fraud detection 15. Mubarek, A., et al.: Deep learning approach for intelligent financial fraud detection system. 978–1–5386–78930/18/IEEE 16. Yao, J., et al.: A financial statement fraud detection model based on hybrid data mining methods. 978–1–5386–6987–7/18/$31.00 ©2018 IEEE 17. Huang, D., et al.: CoDetect: financial fraud detection with anomaly feature detection. 2169– 3536 2018 IEEE 18. Subelj, L., et al.: An expert system for detecting automobile insurance fraud using social network analysis. Expert Syst. Appl. 38, 1039–1052 (2011) 19. Kirkos, E.: Data mining techniques for the detection of fraudulent financial statements. 0957– 4174/$—see front matter 2006 Elsevier Ltd. All rights reserved. doi:https://doi.org/10.1016/j. eswa.2006.02.016
20. Deng, Q., et al.: Combining self-organizing map and k-means clustering for detecting fraudulent financial statements 21. Singh, A., et al.: Adaptive credit card fraud detection techniques based on feature selection method. © Springer Nature Singapore Pte Ltd., Bhatia, S.K., et al. (eds.), Advances in Computer Communication and Computational Sciences, Advances in Intelligent Systems and Computing, vol. 924 (2019), https://doi.org/10.1007/978-981-13-6861-5_15 22. Nath, V., et al.: Credit card fraud detection using machine learning algorithms. 1877–0509 © 2019 The Authors. Published by Elsevier B.V 23. Correa, A., et al.: Feature engineering strategies for credit card fraud detection. https://doi.org/ 10.1016/j.eswa.2015.12.030, 0957–4174/© 2016 Elsevier Ltd 24. Fu, K., et al.: Credit card fraud detection using convolutional neural networks. c Springer International Publishing AG 2016, Hirose, A., et al. (eds.) ICONIP 2016, Part III, LNCS 9949, pp. 483–490 (2016). DOI: https://doi.org/10.1007/978-3-319-46675-053 25. Ganguly, S., et al.: Online detection of shill bidding fraud based on machine learning techniques. © Springer International Publishing AG, part of Springer Nature 2018, Mouhoub, M., et al. (eds.) IEA/AIE 2018, LNAI 10868, pp. 303–314, (2018). https://doi.org/10.1007/978-3-31992058-0_29 26. Mohammed, R.A., et al.: Scalable machine learning techniques for highly imbalanced credit card fraud detection: a comparative study. LNAI 11013, pp. 237–246 (2018). https://doi.org/ 10.1007/978-3-319-97310-4_27 27. Patil, S., et al.: Predictive modelling for credit card fraud detection using data analytics. Proc. Comput. Sci. 132, 385–395 (2018) 28. Sadgali, I., et al.: Performance of machine learning techniques in the detection of financial frauds. Proc. Comput. Sci. 148, 45–54 (2019) 29. Itoo, F., et al.: Comparison and analysis of logistic regression, Na¨ıve Bayes and KNN machine learning algorithms for credit card fraud detection. Int. J. Inf. Tecnol. 
https://doi.org/10.1007/ s41870-020-00430-y 30. Debachudamani, et al.: Fraudulent transaction detection in credit card by applying ensemble machine learning techniques. IEEE—45670, 10th ICCCNT 2019 July 6–8, 2019, IIT, Kanpur 31. Kotsiantis, S., et al.: Forecasting fraudulent financial statements using data mining. Int. J. Comput. Intell. 3(2) (2006) ISSN 1304–2386 32. Maes, S., et al.: Credit card fraud detection using bayesian and neural networks. Researchgate 33. Rahul, K., et al. Spotting earnings manipulation: using machine learning for financial fraud detection. https://doi.org/10.1007/978-3-030-04191-5_29 34. Sabharwal, M.: The use of soft computing technique of decision tree in selection of appropriate statistical test for hypothesis testing (2018). https://doi.org/10.1007/978-981-10-5687-1_15 35. Giri, J.P., et al.: Neural network-based prediction of productivity parameters. https://doi.org/ 10.1007/978-981-10-5687-1_8 36. Rajkumar, A., et al.: New arithmetic operations of triskaidecagonal fuzzy number using alpha cut. https://doi.org/10.1007/978-981-10-5687-1_12
Low-Cost Automated Navigation System for Visually Impaired People Chetan Bulla , Sourabh Zutti, Sneha Potadar, Swati Kulkarni, and Akshay Chavan
Abstract Blindness is one of the world's most feared afflictions. Blind people have trouble getting to their desired destination. The smart jacket for visually impaired people, or visually impaired system (VIS), supports this process by providing key facilities: a short-range system for detecting obstacles, a short-range system for identifying obstacles, a signboard recognition system, and a shortest-path guidance system from source to destination. Obstacle detection, distress calling, global location tracking, voice command functionality, and shortest route guidance are all features of this system in real time. The aim is to build a program that directs visually impaired people to the destination they want and helps them understand the natural world around them. The blind or visually impaired rely primarily on other senses such as sound, touch, and smell to perceive their surroundings. They find it very daunting to go out alone, not to mention finding toilets, subway stations, restaurants, and so on. The visually impaired program seeks to make blind people fully aware of their surroundings.
Keywords IoT · Machine learning · Navigation system · Visually impaired
1 Introduction
Blindness is one of the world's most feared afflictions. Blind people have trouble getting to their desired destination. The visually impaired system (VIS) supports this process by providing key facilities: a short-range system for detecting obstacles, a short-range system for identifying obstacles, a signboard recognition system, and a shortest-path guidance system from source to destination. Obstacle detection, distress calling, global location tracking, voice command functionality, and shortest route guidance are all features of this system in real time. The aim is to build a program that directs visually impaired people to their desired destination and helps them understand the natural world around them. The blind or visually impaired
C. Bulla (B) · S. Zutti · S. Potadar · S. Kulkarni · A. Chavan
Dept of CSE, KLE College of Engineering and Technology, Chikodi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_12
C. Bulla et al.
primarily rely on their other senses such as sound, touch, and smell to understand their environment. Going out alone is very hard for them, not to mention finding subway stations, toilets, restaurants, etc. The visually impaired program seeks to make the surroundings accessible to blind people. Visual disability is characterized as a restriction of the behavior and functions of the visual system. Visual impairment, also known as visual deficiency or loss of vision, is a diminished ability to see to a degree that creates problems not fixable by usual means such as glasses. Visual impairment is also described as a best-corrected visual acuity worse than either 20/40 or 20/60. The word blindness is used for total or near-total loss of vision. Visual disability can cause people difficulties with regular everyday activities such as driving, reading, socializing, and walking. A system is a collection of objects that function together as parts of a process or a network of interconnections, forming a complex whole. A network is an interacting or interrelated community of entities that form a single whole. A system is delineated by its spatial and temporal boundaries, surrounded and influenced by its surroundings, defined by its structure and purpose, and expressed in its workings. The device processes the given input and generates suitable output.
1.1 Objectives
The main objective of our proposed work is to provide a reliable, cost-effective, low-power solution for blind people that helps them move almost like any other pedestrian. The cost of the system makes it affordable for the majority of society: an effective device that they pay for just once and that serves as a dependable travel guide.
1. The camera of the proposed system, which is integrated on the jacket, captures images. The program recognizes the images and determines each object in them.
2. Calculate the distance from each target to the user and identify the target.
3. Convert the information to voice using a text-to-speech program and play it through headphones.
4. Navigate using the GPS system, which locates the user and directs them to the destination.
The rest of the paper is organized as follows: Sect. 2 discusses existing work, Sect. 3 presents the design and implementation, Sect. 4 discusses the experimental evaluation, and Sect. 5 concludes the work.
2 Related Work and Problem Definition
In a rapidly developing country, innumerable attempts have been made for the welfare of the disabled people of our society.
One such attempt is “Project Prakash”, an empathetic attempt to help blind children gain knowledge of the obstacles around them by using their brains. Many have worked on how blind people can detect pits, potholes, and uneven ground by using a smart white cane fitted with ultrasonic sensors. In this device, a multilingual audio feedback system cannot be used because it can record only 680 s of audio. The major limitation of this device is that it is not at all suitable for completely blind people; it is recommended only for people with low vision or night blindness. Another attempt at assisting blind people is HALO, or Haptic Assisted Location of Obstacles. It consists of rangefinders that take input from ultrasonic sensors and feed the output to pulse vibration motors mounted on the blind person's head. As the person gets closer to the target, the vibration strength and frequency increase. The main limitation is the use of vibration motors: vibration as output feedback is far too irritating for a blind person (Table 1).
3 Proposed Model
A system architecture is a conceptual model defining a system's structure, behaviour and other views. An architecture description is a formal description and representation of a system, organized in a way that supports reasoning about the system's structures and behaviours. It focuses on the components or elements of a structure, gives a detailed representation of the components of the system, and can be understood by both technical and non-technical audiences. The implementation of the visually impaired system includes two parts: hardware and software. The system architecture diagram for the unit is shown in Fig. 1. In the block diagram, the on-board camera worn by the blind user records the video stream and transfers it to the Raspberry Pi. When the system matches a public sign, the kernel gives the correct instructions for that sign. The system can call the BlueZ (Bluetooth) module, if the headphones are not wired, and the audio module to forward the voice instructions through the headphones. Ultrasonic sensors are well suited to detecting objects because ultrasound is strongly directional and loses energy slowly, so it propagates over relatively long distances in a medium. It is thus often used for measuring distances. At the same time, ultrasound adapts well to darkness, dust, smoke, electromagnetic interference, toxic and other harsh conditions, which gives it a wide variety of applications. The GPS module works in the same way as the other modules mentioned above, but it also refers to the external Google Map API for navigation/monitoring [11]. The system is interconnected as shown in Fig. 1. A flowchart is a visual representation of the sequence of steps and decisions needed to perform a process. Each step in the sequence is noted within a diagram shape. Steps are linked by connecting lines and directional arrows. The flowchart of each module in our system is given in the following subsections. The
Table 1 Existing work with its limitations
Ref no.: [1]–[10]
Problems found in referred paper | Solution for respective problem
Positioning can be significantly degraded due to the blockage of signals from the satellites | Using DGPS, improve both the accuracy and availability of the positioning
Separate tools for obstacle detection | All modules are integrated into one device
Cannot detect transparent objects | Can detect transparent objects
Face/object detection is unavailable | Object detection is implemented
Does not help to identify what type of obstacle | Detects the obstacle and identifies it
Performance decreases with complexity of images | More number of features are considered
Radiations may harm the brain | Changed the position of sensor
Less space for continuous power supply | Provided with more space
Only text recognition module is implemented | Text and sign recognition module is available
High cost | Cost-effective
Feature 1: ultrasonic sensor module, Feature 2: object detection/recognition module, Feature 3: navigation module (GPS), Feature 4: audio and buzzer module and Feature 5: signboard recognition module.
Fig. 1 System architecture
flowchart displays the steps as different types of boxes and shows their order by connecting the boxes with arrows. This diagrammatic representation shows a model of a solution to a given problem. Flowcharts are used in different fields for the research, design, documentation or management of a method or system.
3.1 Object Detection Module
The principle of obstacle detection with an ultrasonic sensor is used here, as shown in Fig. 2. As soon as the sensor senses an obstacle, the measurement is forwarded to the processing step. We convert the echo time into a distance in centimetres and test whether the obstacle is closer than 3 m; if so, we send the output through the headphones. Ultrasonic sensors work by transmitting sound waves at frequencies too high for humans to hear. We then wait for the echo and compute the distance from the elapsed time.
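The echo-timing principle described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `echo_duration_to_cm`, `measure_echo_duration` and `should_alert` are hypothetical, the echo-pin reader and the clock are passed in as plain callables so the logic runs without sensor hardware, and the 300 cm default threshold is taken from the 3 m figure in the text.

```python
import time

# Speed of sound in air, in cm/s (approximate, room temperature).
SPEED_OF_SOUND_CM_S = 34300


def echo_duration_to_cm(duration_s):
    """Convert a round-trip echo time in seconds to a one-way distance in cm."""
    # The pulse travels to the obstacle and back, hence the division by 2
    # (34300 / 2 = 17150, the factor used for the distance computation).
    return duration_s * SPEED_OF_SOUND_CM_S / 2


def measure_echo_duration(read_echo, clock=time.monotonic, timeout_s=0.05):
    """Time how long the echo line stays high.

    read_echo: hypothetical callable returning 0 or 1 for the echo pin.
    Returns the pulse width in seconds, or None if the timeout expires.
    """
    deadline = clock() + timeout_s
    start = clock()
    # Wait for the echo line to go high; keep refreshing the start time.
    while read_echo() == 0:
        start = clock()
        if start > deadline:
            return None
    end = start
    # Wait for the echo line to go low again; keep refreshing the end time.
    while read_echo() == 1:
        end = clock()
        if end > deadline:
            return None
    return end - start


def should_alert(distance_cm, threshold_cm=300.0):
    """Return True when an obstacle is closer than the alert threshold."""
    return distance_cm is not None and distance_cm < threshold_cm
```

On a Raspberry Pi, `read_echo` would typically wrap a GPIO input read; passing it in as a callable is a design choice that keeps the timing logic testable on any machine.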
3.2 Object Recognition Module
Objects and faces are recognized through the camera. Face or object recognition, as shown in Fig. 2b, is a technology capable of recognizing or verifying a person or object from a digital image or a video frame from a source [12, 13]. Facial recognition systems operate in many ways, but in general they match selected facial characteristics from a given image against faces within a database, much as object identification systems do. It is also defined as a biometric
Algorithm: Object Detection
Input: Echo pulse output by the ultrasonic sensor
Output: Audio output indicating from which side the object is detected
Step 1: Initialise ultrasonic sensors
Step 2: Trigger pulse input (set TRIG = High)
Step 3: Set TRIG = Low
Step 4: WHILE (ECHO_PIN == 0) Start_time = current time END WHILE
Step 5: WHILE (ECHO_PIN == 1) End_time = current time END WHILE
Step 6: Find the difference between end time and start time and assign it to duration
Step 7: Find the distance in centimetres (distance = 17150 * duration)
Step 8: IF distance is less than 15 cm THEN audio output ELSE do nothing
(a)
Algorithm: Object and Sign-Board Recognition
Input: Image from video
Output: Recognized object's name in the form of voice
Step 1: Initialise Pi-Camera
Step 2: Grab reference to the raw capture (captured image)
Step 3: Acquire frame dimensions and expand frame dimensions to the form [1, None, None, 3]
Step 4: Perform detection by running the model with the image as input
Step 5: Draw the detected results
Step 6: Display the result and give audio output
(b)
Fig. 2 a Object detection module flowchart b Algorithm for object detection module
artificial intelligence-based application which can uniquely identify a person by analyzing patterns based on the individual's facial textures and shape [14]. Here, we use TensorFlow for the object/face recognition system to help the blind identify real-world human faces. The identification result is transmitted to the blind person as an audio voice through headphones. The camera is installed and attached to a multiprocessor, which is connected to the computer along with the headset.
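The last stage of this module, turning recognition results into a voice message, can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function name `describe_detections`, the representation of results as `(label, score)` pairs and the 0.5 confidence threshold are all hypothetical and are not taken from the paper; the detection model itself is not shown.

```python
def describe_detections(detections, min_score=0.5):
    """Build the sentence to be spoken from a list of (label, score) pairs.

    detections: hypothetical model output, e.g. [("person", 0.97), ...].
    min_score: assumed confidence threshold below which results are ignored.
    """
    labels = []
    # Visit the most confident detections first.
    for label, score in sorted(detections, key=lambda item: item[1], reverse=True):
        # Skip weak detections and avoid announcing the same label twice.
        if score >= min_score and label not in labels:
            labels.append(label)
    if not labels:
        return "No object recognized"
    return "Detected " + ", ".join(labels)
```

The returned string would then be handed to the audio module for text-to-speech output; keeping it as a pure function makes the announcement wording easy to check without a camera or a model.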
3.3 Navigation Module
GPS navigation: As shown in Fig. 3, one of the most pressing limitations that blind people face in their lives is independent outdoor pedestrian navigation. Visually handicapped people use the global positioning system because their movement and travel otherwise rely solely on information and local knowledge; GPS-based systems help them reach the destination and gain accurate information about a specific destination.
Algorithm: GPS Location Tracking
Input: Data received from satellites
Output: Text message to guardian and audio output regarding current location of user
Step 1: Initialise GPS receiver device
Step 2: Read NMEA string received from satellite
Step 3: Check for NMEA GPGGA string
Step 4: IF data available THEN store data coming after the "$GPGGA," string; store comma-separated data in buffer; get time, latitude, longitude
Step 5: Pass latitude and longitude values in degrees to Google Map or HERE API
Step 6: Scrape current position information from Google Map or HERE API
Step 7: Using Twilio API, send user's current location details to guardian (in text form)
Step 8: Audio output of current location to user
Fig. 3 Algorithm for navigation module
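Steps 3 and 4 of the algorithm above (picking out the GPGGA sentence and extracting time, latitude and longitude) can be sketched in Python as follows. This is a hedged sketch, not the authors' code: the function names `nmea_to_degrees` and `parse_gpgga` are hypothetical, the checksum after `*` is ignored rather than validated, and the calls to the map and messaging services are left out.

```python
def nmea_to_degrees(value, hemisphere):
    """Convert an NMEA ddmm.mmmm (or dddmm.mmmm) field to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)        # leading digits are whole degrees
    minutes = raw - degrees * 100    # remainder is minutes
    decimal = degrees + minutes / 60
    # South and west are negative by convention.
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gpgga(sentence):
    """Return time, latitude and longitude from a $GPGGA sentence, or None.

    None is returned for other sentence types and for sentences without a fix.
    The trailing checksum is not verified in this sketch.
    """
    body = sentence.strip().split("*")[0]
    fields = body.split(",")
    if fields[0] != "$GPGGA" or len(fields) < 7:
        return None
    # Field 6 is the fix quality; 0 or empty means no valid position.
    if not fields[2] or not fields[4] or fields[6] in ("", "0"):
        return None
    return {
        "time_utc": fields[1],
        "latitude": nmea_to_degrees(fields[2], fields[3]),
        "longitude": nmea_to_degrees(fields[4], fields[5]),
    }
```

For example, the commonly quoted sample sentence `$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47` yields a latitude of about 48.1173 and a longitude of about 11.5167 degrees, which are the values that would be passed on in Step 5.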
Algorithm: Audio Module
Input: Text data/instruction
Output: Audio output (text to speech)
Step 1: Install and import the gtts module and the pygame module
Step 2: Pass the text data to gtts.gTTS(text)
Step 3: Initialise the mixer from pygame
Step 4: Save the mp3 file in the proper directory
Step 5: Play the mp3 file
Fig. 4 a Algorithm for audio to speech
3.4 Audio Alert with Speech Module
Audio voice alert: Our method is inspired by auditory replacement tools, which encode visual scenes from a video camera and generate sounds as an acoustic representation called a “soundscape”. Images are converted into sound by scanning the visual scene from left to right. Learning this kind of image-to-sound conversion enables the localization and identification of everyday artifacts, as well as the identification of signboards (Fig. 4).
3.5 Signboard Recognition Module
Signboard recognition: As shown in Fig. 6, the program will identify and understand public signs in cities and provide the blind person with the correct voice hints. It gives
voice clues through the headphones [15]. These voice hints are generated using features.
4 Experiment Evaluation
To test our proposed model, we developed a prototype jacket in which all the hardware resources, such as the Arduino, the Raspberry Pi, and various sensors, are integrated. We used the Python language with computer vision and other predefined libraries to implement our model.
4.1 Result Analysis
In this section, we evaluate our proposed model using various parameters. These parameters are user-friendliness, response time, transparency, and cost (Fig. 5).
[Bar chart: user-friendliness (in percentage) for smart glass, stick for blind and smart jacket]
Fig. 5 Comparison of the various existing model for user-friendly parameter
[Bar chart: number of features for smart glass, stick for blind and smart jacket]
Fig. 6 Feature analysis compared with proposed system and existing system
[Bar chart: response time (in seconds) for smart glass, stick for blind and smart jacket]
Fig. 7 Response time analysis compared with proposed system and existing system
In Fig. 5, the user-friendliness chart compares the proposed system with the existing systems. The proposed system requires no extra knowledge to use, whereas the smart glass and the stick are difficult to use. The user just needs to switch on the device. The device interacts with the user through audio instructions, and the user interacts with the device through buttons embedded in the jacket. The smart stick and smart glass are also difficult to handle, whereas the smart jacket is easy to handle in any situation. The user just needs to wear the jacket in which the device is embedded and need not worry about misplacing it, whereas glasses and a stick need attention to carry. Sticks and glasses can fall, but the jacket cannot.
In Fig. 6, the feature analysis chart compares the proposed system with the existing systems. The proposed system offers five types of features: object detection, object recognition, signboard recognition, GPS navigation, and the audio module, whereas the smart stick and smart glasses provide limited features.
In Fig. 7, the response time analysis chart compares the proposed system with the existing systems. The existing systems require more response time, whereas the proposed system responds quickly when generating messages, so the user can easily get information about the surrounding environment from the device.
In Fig. 8, the cost analysis chart compares the proposed system with the existing systems. The proposed system costs less than the smart glasses and more than the smart stick; however, the existing systems have fewer features than the proposed system, so the proposed system is cost-effective.
5 Conclusion
Smart guide devices are useful for visually impaired people and can help them travel or navigate safely and easily in complicated indoor and outdoor environments. In outdoor environments, the GPS navigation module helps the user navigate by giving directions to the destination. The computation is quick enough for the
[Bar chart: cost (in thousands) for smart glass, stick for blind and smart jacket]
Fig. 8 Cost analysis compared with proposed system and existing system
identification and display of obstacles. The sensors used in this system are simple and low cost, which makes wide use in the consumer market possible. We have implemented several image processing and object recognition algorithms on the new lightweight jacket system. The system can detect and recognize objects in real time and give audio output. Those with a visual disability need not guess objects or signboards; the system recognizes them when the camera records their video or image frame. Our experimental results show that the proposed model is economical, responds faster and provides more features in one product.
References
1. Agarwal, R., et al.: “Low cost ultrasonic smart glasses for blind,” 2017 8th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, pp. 210–213, IEEE (2017)
2. Balachandran, W., Cecelja, F., Ptasinski, P.: “A GPS based navigation aid for the blind,” 17th International Conference on Applied Electromagnetics and Communications, pp. 34–36, IEEE (2013)
3. Akbar, I., Misman, A.F.: “Research on semantics used in GPS based mobile phone applications for blind pedestrian navigation in an outdoor environment,” 2018 International Conference on Information and Communication Technology for the Muslim World (ICT4M), Kuala Lumpur, pp. 196–201, IEEE (2018)
4. Bai, J., Lian, S., Liu, Z., Wang, K., Liu, D.: “Smart guiding glasses for visually impaired people in indoor environment,” IEEE Trans. Consum. Electron. 63(3), 258–266, IEEE (Aug 2017)
5. Lan, F., Zhai, G., Lin, W.: “Lightweight smart glass system with audio aid for visually impaired people,” TENCON 2015, 2015 IEEE Region 10 Conference, Macao, pp. 1–4, IEEE (2015)
6. James, N.B., Harsola, A.: “Navigation aiding stick for the visually impaired,” 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), Noida, pp. 1254–1257, IEEE (2015)
7. Krishnan, K.G., Porkodi, C.M., Kanimozhi, K.: “Image recognition for visually impaired people by sound,” 2015 International Conference on Communication and Signal Processing, Melmaruvathur, pp. 943–946, IEEE (2015)
8. Mala, N.S., Thushara, S.S., Subbiah, S.: “Navigation gadget for visually impaired based on IoT,” 2017 2nd International Conference on Computing and Communications Technologies (ICCCT), Chennai, pp. 334–338, IEEE (2017)
9. Rajesh, M., et al.: “Text recognition and face detection aid for visually impaired person using Raspberry PI,” 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT), Kollam, pp. 1–5, IEEE (2017)
10. Subramoniam, S., Osman, K.: “Smart phone Assisted Blind Stick”, The Turkish Online Journal of Design, Art and Communication, ISSN: 2146–5193, September 2018 Special Edition, pp. 2613–2621, TOJDAC (2018)
11. Jain, N.K., Saini, R.K., Mittal, P.: A Review on Traffic Monitoring System Techniques. In: Ray, K., Sharma, T.K., Rawat, S., Saini, R.K., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications, vol. 742, pp. 569–577. Springer Singapore, Singapore (2019)
12. Shah, S., Rathod, N., Saini, P.K., Patel, V., Rajput, H., Sheth, P.: Automated Indian Vehicle Number Plate Detection. In: Ray, K., Sharma, T.K., Rawat, S., Saini, R.K., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications, vol. 742, pp. 453–461. Springer Singapore, Singapore (2019)
13. Birje, M.N., Bulla, C.: Cloud monitoring system: basics, phases and challenges. Int. J. Recent Technol. Eng. Regul. Issue 8(3), 4732–4746 (2019)
14. Rahul, M., Mamoria, P., Kohli, N., Agrawal, R.: An Efficient Technique for Facial Expression Recognition Using Multistage Hidden Markov Model. In: Ray, K., Sharma, T.K., Rawat, S., Saini, R.K., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications, vol. 742, pp. 33–43. Springer Singapore, Singapore (2019)
15. Gunale, K., Mukherji, P.: An Intelligent Video Surveillance System for Anomaly Detection in Home Environment Using a Depth Camera. In: Ray, K., Sharma, T.K., Rawat, S., Saini, R.K., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications, vol. 742, pp. 473–481. Springer Singapore, Singapore (2019)
Blockchain Platforms and Interpreting the Effects of Bitcoin Pricing on Cryptocurrencies Nitima Malsa, Vaibhav Vyas, and Jyoti Gautam
Abstract Blockchain technology is one of the most rapidly emerging technologies and gives users a basis for trust. It is a revolution in meeting the demands of users and controlling supply. Various platforms are available for implementing blockchain technology. This paper focuses on the five most prevalent blockchain platforms: Ethereum, NEO, Cardano-Ada, EOS and TRON. It compares the five platforms on features such as founder, year founded, purpose, kind of data stored, implementation language, native currency, block release time, transaction rate and consensus mechanism. Cryptocurrency is the most widespread use case of blockchain technology. The Karl Pearson correlation coefficient has been calculated between Bitcoin and the five cryptocurrencies of the given platforms (Ether, NEO, Cardano-Ada, EOS and TRX), which helps in interpreting the effects of Bitcoin pricing on these cryptocurrencies and can be highly significant for making investments in cryptocurrency. Keywords Bitcoin · Cardano-Ada · Cryptocurrency · Demand supply · EOS · Ethereum · Neo · Tron
N. Malsa (B) · V. Vyas
Banasthali Vidyapith, Banasthali, India
J. Gautam
JSS Academy of Technical Education, C-20/1, Sector-62, Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_13

1 Introduction
Commerce on the Internet makes it possible to do business on a global scale. E-commerce organizations rely almost exclusively on financial institutions as trusted intermediaries to process electronic payments. This reliance has weaknesses: it increases transaction costs, restricts the minimum practical transaction size and removes the ability to make non-reversible payments. According to the Experian report “The 2018 Global Fraud and Identity Report”, 75% of businesses want stronger authentication and more security measures that have limited or no impact on the digital consumer experience [1]. A certain percentage of fraud is accepted as unavoidable. These costs can be avoided only when two willing parties transact directly without a trusted intermediary. Transactions that are practically impossible to reverse would protect sellers from fraud, and escrow mechanisms can easily be applied to protect buyers. Satoshi Nakamoto proposed a peer-to-peer (P2P) distributed network as a solution to the double-spending problem [2]. This gave rise to the Bitcoin cryptocurrency and, with it, blockchain technology.
The blockchain, a revolutionary technology, is a “chain of blocks” in which each block contains a number of transactions, a timestamp, a hash and a nonce. The technology is becoming more popular by the day and has various real-world uses such as medical record keeping, supply chains, digital IDs, digital voting, data sharing, gaming, immutable data backup and tax regulation [3]. It was originally developed for the Bitcoin cryptocurrency but has the potential to be used in many fields. According to CoinMarketCap, there are more than 5000 cryptocurrencies in the world. The total market capitalization is around 200 billion dollars (recorded on January 18, 2019) and could increase to 80 trillion dollars in the next 15 years. Bitcoin (BTC) accounts for around 52.4% of the total market capitalization [4–6]. The market capitalization listing shows each cryptocurrency with features such as name, symbol, market cap, price, volume, supply, change and price graph, and ranks the cryptocurrencies by market cap.
This paper takes a deeper look at five blockchain platforms, namely Ethereum, NEO, Cardano, EOS and TRON. Its purpose is to compare the selected platforms, which are expected to be used at large scale in the near future. The Karl Pearson correlation coefficient has been calculated between Bitcoin and the five cryptocurrencies of the given platforms (Ether, NEO, Cardano-Ada, EOS and TRX). The paper will help professionals select the appropriate platform for solving complex business problems.
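The “chain of blocks” structure described in the introduction can be illustrated with a minimal sketch. It is not the implementation of Bitcoin or of any platform compared in this paper: the `Block` class, the `mine` and `is_valid_chain` helpers, the JSON serialization and the two-zero difficulty prefix are all assumptions made for illustration. The sketch only shows how a timestamp, transactions, a hash and a nonce fit together, and why altering an earlier block breaks the links that follow it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical block: transactions, timestamp, link to the previous hash, nonce."""
    index: int
    timestamp: float
    transactions: list
    previous_hash: str
    nonce: int = 0

    def compute_hash(self) -> str:
        # Serialize the block deterministically, then hash it with SHA-256.
        payload = json.dumps(
            {
                "index": self.index,
                "timestamp": self.timestamp,
                "transactions": self.transactions,
                "previous_hash": self.previous_hash,
                "nonce": self.nonce,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine(block: Block, difficulty: int = 2) -> Block:
    """Toy proof of work: increase the nonce until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    while not block.compute_hash().startswith(prefix):
        block.nonce += 1
    return block


def is_valid_chain(chain: list, difficulty: int = 2) -> bool:
    """Check every block's proof of work and its link to the previous block."""
    prefix = "0" * difficulty
    for i, block in enumerate(chain):
        if not block.compute_hash().startswith(prefix):
            return False
        if i > 0 and block.previous_hash != chain[i - 1].compute_hash():
            return False
    return True


# Illustrative data only: fixed timestamps and made-up transactions.
genesis = mine(Block(0, 1579305600.0, ["genesis"], "0" * 64))
second = mine(Block(1, 1579305660.0, ["alice pays bob 1 BTC"], genesis.compute_hash()))
chain = [genesis, second]
```

Because the second block stores the hash of the first, changing any transaction in the first block invalidates the chain unless every later block is mined again.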
1.1 Ethereum: A Next Generation Smart Contract and Decentralized Application Platform
The Ethereum blockchain is not specialized only for its cryptocurrency (ether); it is also a decentralized platform for running smart contracts. This is the main reason Ethereum has more use cases than Bitcoin. In general, three kinds of applications can be implemented on top of Ethereum: financial applications, partially financial applications and non-financial applications. Online voting and decentralized governance fall in the category of non-financial applications. Ethereum smart contracts are executed on the Ethereum virtual machine (EVM), the run-time environment of Ethereum. All the nodes in the network run the EVM and interact with each other. Solidity is an object-oriented language that can be used to create smart contracts in Ethereum. The Ethereum code is open source and freely available on GitHub. Anyone can run a full Ethereum node and can also use a wallet application. A light version of Ethereum (LES) is under development. A public testnet blockchain is also available for testing applications. The proof-of-work algorithm used in Ethereum is Ethash. Transaction latency on Ethereum is lower than on Bitcoin.
1.2 Neo: A Distributed Network for the Smart Economy
NEO, previously known as Antshares, is often described as China’s Ethereum. The goal of NEO is to create a network that bridges the gap between digital and traditional assets. The NEO blockchain is used to digitize assets. It also provides an application platform for smart contracts. Each smart contract deployed on the NEO network (NeoContract) has a private storage area of its own that only the contract itself can access. The NEO platform has multiple functional capabilities, such as NeoAsset and NeoID, which open opportunities for users to engage in digital businesses [7]. GAS (NeoGas) tokens are generated through a proof-of-stake process. Some exchanges also distribute GAS tokens to NEO holders. This is similar to a dividend paid to shareholders. The smallest unit of NEO is always 1 share, which cannot be divided into fractions. GAS tokens are used as fuel for the NEO chain. GAS tokens are generated over time, and after around 22 years, there will be 100 million GAS in circulation. GAS is divisible down to a smallest unit of 0.00000001.
NEO Versus GAS NEO has two native tokens: NEO and NeoGas (GAS). Each token has its own use. NEO tokens are used to create blocks and manage the network. A user who holds NEO is rewarded with GAS tokens. GAS tokens act as the fuel of the NEO blockchain platform and power the transactions in the system. Users can estimate the GAS generated by their NEO holdings using a NEO to GAS calculator, as shown in Fig. 1.
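The divisibility rules stated above (NEO is indivisible, GAS is divisible down to 0.00000001) can be sketched with integer arithmetic. This is an illustration only and not NEO's API: the helper names `gas_to_units`, `units_to_gas` and `is_valid_neo_amount` are hypothetical, and storing GAS as an integer count of its smallest unit is an assumed design choice that avoids floating-point rounding.

```python
from decimal import Decimal

# Assumption for this sketch: GAS is stored as an integer count of its
# smallest unit, 0.00000001 GAS, so one GAS equals 10**8 units.
GAS_UNITS_PER_TOKEN = 10 ** 8


def gas_to_units(amount: str) -> int:
    """Convert a decimal GAS amount (given as a string) to smallest units."""
    scaled = Decimal(amount) * GAS_UNITS_PER_TOKEN
    if scaled != scaled.to_integral_value():
        raise ValueError("amount is finer than the smallest GAS unit")
    return int(scaled)


def units_to_gas(units: int) -> Decimal:
    """Convert smallest units back to a decimal GAS amount."""
    return Decimal(units) / GAS_UNITS_PER_TOKEN


def is_valid_neo_amount(amount: str) -> bool:
    """NEO cannot be divided into fractions, so only whole amounts are valid."""
    value = Decimal(amount)
    return value >= 0 and value == value.to_integral_value()


# The eventual supply of 100 million GAS, expressed in smallest units.
total_gas_units = gas_to_units("100000000")
```

Under these assumptions the full supply of 100 million GAS corresponds to 10^16 smallest units, which fits comfortably in a 64-bit integer.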
1.3 Cardano-Ada
Cardano is a blockchain platform based on a proof-of-stake mechanism, Ouroboros, which its authors describe as provably secure [8]. It is called a third-generation blockchain technology, and its design rests on scientific philosophy and peer-reviewed academic research. Cardano differs from other platforms in that its goal is to achieve “High Assurance Code”. The reason behind this goal is to prevent further splits of the blockchain (e.g. ETH-ETC). The token of Cardano is named Ada [8].
Fig. 1 NEO to GAS converter
The Cardano team (the Cardano Foundation, IOHK and Emurgo) set out to address three major issues: scalability, interoperability and sustainability. Scalability depends on transaction speed, throughput, network capacity and data scaling. Earlier technologies lack throughput, with about seven transactions per second in Bitcoin and 10–15 transactions per second in Ethereum, and Cardano works to improve on this. The demand for network resources increases as transactions are added. To overcome this problem, Cardano uses the recursive inter-network architecture (RINA) technique, which divides the network into smaller networks to handle the communication traffic. Data scaling is handled by combining pruning, subscription, compression, partitioning, sidechains and sharding to reduce the amount of data [9]. The second pillar of Cardano is interoperability. Crypto exchanges provide an interface between cryptocurrencies and banks, but the exchanges are not decentralized and are hence open to attack. A risk-free solution was needed to remove the miscommunication between the legacy financial world and the crypto world. The third and final pillar is sustainability. According to the literature, it was the toughest problem to solve. Changes to the ecosystem require grants and funds, and Cardano handles this through its treasury feature.
1.4 EOS
In today’s business environment, only technology that can handle a large and diverse user base survives in the market. Blockchain technology is one such technology. EOS is a platform designed to allow both horizontal and vertical scaling of DApps. The platform scales to a large number of transactions per second, eliminates transaction fees and allows trouble-free deployment of DApps. EOS uses a delegated proof-of-stake (DPOS) algorithm as its consensus mechanism. Under this algorithm, only those who hold tokens on the blockchain have the right to select block producers, which they do through a voting system. Moreover, no producer can add blocks on two forks with the same timestamp. Traditional DPOS is combined with Byzantine fault tolerance.
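The two DPOS rules mentioned above, token holders electing block producers and a producer not signing two blocks with the same timestamp, can be sketched as follows. This is a simplified illustration and not the EOS implementation: the function names, the stake-weighted approval tally, the tie-breaking by name and the number of seats are all assumptions made for the example.

```python
from collections import defaultdict


def elect_producers(stakes: dict, votes: dict, seats: int) -> list:
    """Return the `seats` candidates with the most stake-weighted votes.

    stakes: holder -> token balance; votes: holder -> list of approved candidates.
    Holders without tokens have no voting weight (hypothetical rule for this sketch).
    """
    tally = defaultdict(int)
    for holder, candidates in votes.items():
        weight = stakes.get(holder, 0)
        if weight <= 0:
            continue  # only token holders may select block producers
        for candidate in set(candidates):
            tally[candidate] += weight
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:seats]]


def accepts_block(seen: set, producer: str, timestamp: int) -> bool:
    """Reject a second block from the same producer with the same timestamp."""
    key = (producer, timestamp)
    if key in seen:
        return False
    seen.add(key)
    return True


# Illustrative data only.
stakes = {"alice": 50, "bob": 30, "carol": 0}
votes = {"alice": ["p1", "p2"], "bob": ["p2", "p3"], "carol": ["p4"]}
producers = elect_producers(stakes, votes, seats=2)
```

In this example `carol` holds no tokens, so her vote for `p4` carries no weight, and the two seats go to the candidates backed by the most stake.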
1.5 TRON
The TRON blockchain-based platform was developed mainly to establish a purely decentralized network. TRON is also based on a P2P network. Any user can register on TRON without central authorization. It is one of the largest blockchain-based protocols and provides high scalability, high availability and high throughput for all DApps. The platform uses a proof of work (PoW) consensus algorithm to make applications cryptographically secure, thereby preventing double spending. Because existing platforms such as Bitcoin and Ethereum have low transaction rates and high transaction fees, TRON has been widely adopted for creating DApps.
2 Literature Review
The blockchain is a revolutionary technology. A chain consists of blocks, each of which contains several transactions, a timestamp, a hash and a nonce. The technology has various real-world use cases such as medical record keeping, supply chains, digital IDs, data sharing, gaming, digital voting, immutable data storage and tax regulation. As per the studies [10, 11], blockchain-enabled applications are classified as follows: financial applications, integrity verification, governance, Internet of things, data management, education, energy sector, supply chain management, health care management, etc. Various blockchain platforms are available for implementation. The five most prevalent platforms, Ethereum, NEO, Cardano, EOS and TRON, have been compared in this study. The paper by Vitalik Buterin shows that on the Ethereum platform one can develop DApps and create smart contracts for an application [12]. The NEO white paper [7] states that the main aim of NEO is to develop and automate the management of digital assets. According to the EOS whitepaper [13], the EOS blockchain platform can scale DApps. Hence, in EOS, non-centralized applications can be easily developed and deployed. The Cardano-Ada paper [8] concluded that Cardano is a platform with good security features. The TRON paper [14] concluded that applications can be built on the platform using smart contracts. The paper [15] presents a SWOT analysis of Bitcoin that illustrates the influence of Bitcoin pricing on the current economy. The paper presents the association of the two cryptocurrencies, Ethereum and Bitcoin. Paper [16] presents the calculated correlation between volatility and return of stocks. Paper [17] uses the Karl Pearson correlation coefficient method to calculate the correlation between global warming and El-Nino, and a performance analysis is done on the basis of the calculated correlation coefficient.
3 Methodology
Data were collected from various cryptocurrency sites [18, 19]. The correlation coefficient was calculated using Python (through Google Colab) [20].
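The calculation named in the methodology can be sketched in plain Python. The paper does not list its code or data, so the function below is only one standard way to compute the Karl Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²), and the two short price lists are made-up values used purely for illustration, not the data collected from [18, 19].

```python
import math


def pearson(x: list, y: list) -> float:
    """Karl Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical opening prices, for illustration only.
btc_open = [7200.0, 7350.0, 7100.0, 7500.0, 7650.0]
ether_open = [130.0, 134.0, 128.0, 139.0, 141.0]
r_btc_ether = pearson(btc_open, ether_open)
```

With price series loaded into a pandas DataFrame, `DataFrame.corr()` computes the same coefficient for every pair of columns, which yields a correlation matrix of the kind shown in Figs. 2 to 6.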
4 Result and Discussion
The calculated correlation coefficient between BTC_Open and Ether_Open is 0.882659 (refer Fig. 2), the maximum among the five cryptocurrencies. Hence, the Ether price is the one most affected by a change in the Bitcoin price. The calculated correlation coefficient for BTC_Open and NEO_Open is 0.759258, the highest value in Fig. 3, which indicates that BTC_Open affects NEO_Open most. The coefficient for BTC_Open and Cardano_Open is 0.753005, the highest value in Fig. 4, which indicates that BTC_Open affects Cardano_Open most. The coefficient for BTC_Open and EOS_Open is 0.629902, the highest value in Fig. 5, which indicates that BTC_Open affects EOS_Open most. The coefficient for BTC_Open and TRON_Open is 0.529358, the highest value in Fig. 6, which indicates that BTC_Open affects TRON_Open most. All the calculated coefficient values mentioned above are positive (ranging from 0.5 to 1.0), which indicates that the two variables in each pair are highly correlated.
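The comparison of the five reported coefficients can be restated as a short sketch. The values are the ones reported in this paper for BTC_Open against each opening price; the dictionary layout and variable names are assumptions made for illustration.

```python
# Correlation of BTC_Open with each cryptocurrency's opening price,
# as reported in this paper.
reported = {
    "Ether": 0.882659,
    "NEO": 0.759258,
    "Cardano-Ada": 0.753005,
    "EOS": 0.629902,
    "TRX": 0.529358,
}

# Rank the cryptocurrencies from the strongest to the weakest correlation.
ranked = sorted(reported.items(), key=lambda item: item[1], reverse=True)
strongest = ranked[0][0]

# Check the statement that every coefficient lies between 0.5 and 1.0.
all_in_band = all(0.5 <= r <= 1.0 for r in reported.values())

for name, r in ranked:
    print(f"{name}: r = {r:.6f}")
```

The ranking only orders the strength of the linear association with Bitcoin's opening price; it does not by itself establish which price drives the other.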
4.1 Ethereum
The calculated correlation coefficients show a high correlation between each of BTC-open, BTC-high, BTC-low and BTC-close and Ethereum_Open, which signifies that Ethereum_Open is highly affected by increases in their prices (refer Fig. 2).
Table 1 Comparison amongst most prevalent platforms: Ethereum, NEO, Cardano-ADA, EOS, TRON
S. no. | Blockchain platform | Founder | Founded | Purpose | Kind of data stored | Languages used for implementation | Native currency | Block release time (in s) | Transaction rate | Consensus mechanism
1 | Ethereum | Vitalik Buterin | 2014 | Run smart contracts | Cryptocurrency, digital assets, smart contracts | Solidity, Serpent, LLL | Ether | | 15 TPS | Proof of Work (POW)
2 | NEO | Da Hong Fei | 2014 | NEO aims to convert traditional assets into digital ones using smart contracts | Cryptocurrency, digital assets, smart contracts | C#, VB.Net, F#, Java, Kotlin, Python | NEO | 15–20 | 10,000 TPS | Delegated Byzantine Fault Tolerance (DBFT)
3 | Cardano-Ada | Charles Hoskinson | 2017 | To improve scalability, interoperability and sustainability | Cryptocurrency, smart contracts, governance | Haskell and Plutus | Cardano-Ada | 20 | 25 GTPS | Proof of Stake (Ouroboros)
4 | EOS | Daniel Larimer | | | Cryptocurrency, smart contracts | C++ | EOS | | 3000 TPS | Delegated Proof of Stake (DPOS)
5 | TRON | Justin | | | Smart contracts, cryptocurrency | Solidity, Java Script | TRONIX | 3 | 2000 TPS | Proof of Work (POW)
Values that the source layout does not tie to a single row: Founded 2017 and 2019 (EOS, TRON); Purpose “Smart contracts, dApps, customized wallet” (EOS or TRON); Block release time 12 and 15 (Ethereum, EOS).
Fig. 2 Correlation amongst BTC-open, BTC-close, BTC-low, BTC-high and Ethereum-Open
Fig. 3 Correlation amongst BTC-open, BTC-close, BTC-low, BTC-high and NEO-Open
Fig. 4 Correlation amongst BTC-open, BTC-close, BTC-low, BTC-high and Cardano-Open
4.2 Neo
The calculated correlation coefficients show a high correlation between each of BTC-open, BTC-high, BTC-low and BTC-close and NEO_Open, which signifies that NEO_Open is highly affected by increases in their prices (refer Fig. 3).
Fig. 5 Correlation amongst BTC-open, BTC-close, BTC-low, BTC-high and EOS-Open
Fig. 6 Correlation amongst BTC-open, BTC-close, BTC-low, BTC-high and TRON-Open
4.3 Cardano-Ada
The calculated correlation coefficients show a high correlation between each of BTC-open, BTC-high, BTC-low and BTC-close and Cardano_Open, which signifies that Cardano_Open is highly affected by increases in their prices (refer Fig. 4).
4.4 EOS
The calculated correlation coefficients show a high correlation between each of BTC-open, BTC-high, BTC-low and BTC-close and EOS_Open, which signifies that EOS_Open is highly affected by increases in their prices (refer Fig. 5).
4.5 TRON
The calculated correlation coefficients show a high correlation between each of BTC-open, BTC-high, BTC-low and BTC-close and TRON_Open, which signifies that TRON_Open is highly affected by increases in their prices (refer Fig. 6).
5 Conclusion
The calculated correlation coefficient for btc_open and ether_open is 0.882659, which is the highest one and indicates that btc_open affects ether_open most. The coefficient for btc_open and neo_open is 0.759258, for btc_open and cardano_open 0.753005, for btc_open and eos_open 0.629902 and for btc_open and tron_open 0.529358. All the calculated coefficient values mentioned above are positive (ranging from 0.5 to 1.0), which indicates that the two variables in each pair are highly correlated. Because the coefficient for btc_open and ether_open is the maximum, a change in the price of Bitcoin affects the price of ether most. The calculated coefficients can be highly significant for making investments in cryptocurrency. Hence, investors can consult the results and invest accordingly for higher returns.
References
1. Experian Report. https://www.experian.com/assets/decision-analytics/reports/global-fraudreport-2018.pdf
2. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008) [Whitepaper]. https://bitcoin.org/bitcoin.pdf
3. Williams, S.: 20 real-world uses for blockchain technology. Motley Fool (2018) [Online]. Available at: https://www.fool.com/investing/2018/04/11/20-real-world-uses-for-blockchaintechnology.aspx
4. CoinMarketCap website. Available at: https://coinmarketcap.com
5. CoinGecko website. Available at: https://www.coingecko.com/en. Last accessed 12 Nov 2019
6. Coinbase website. Available at: https://www.coinbase.com/. Last accessed 17 May 2020
7. Hongfei, D., Zhang, E.: NEO: a distributed network for the smart economy (2015) [Whitepaper]. Available at: https://docs.neo.org/docs/en-us/basic/whitepaper.html
8. Kiayias, A., Russell, A., David, B., Oliynykov, R.: Ouroboros: a provably secure proof-of-stake blockchain protocol. In: Katz, J., Shacham, H. (eds.) Advances in Cryptology – CRYPTO 2017, LNCS, vol. 10401, pp. 357–388. Springer, Cham (2017)
9. https://www.circle.com/marketing/pdfs/research/circle-research-cardano.pdf
10. Casino, F., Dasaklis, T.K., Patsakis, C.: A systematic literature review of blockchain based applications: current status, classification and open issues. Telematics Inform. 36, 55–81 (2019)
11. Chen, G., Xu, B., Lu, M., Chen, N.: Exploring blockchain technology and its potential applications for education. Smart Learn. Environ. 5, 1–10 (2018)
12. Buterin, V.: A next generation smart contract & decentralized application platform (2014) [Whitepaper]. Available at: https://cryptorating.eu/whitepapers/Ethereum/Ethereum_white_paper.pdf
13. Hintzman, Z.: Comparing Blockchain Implementations-NCTA Technical Papers. In: Fall Technical Forum SCTE-ISBE, NCTA, CABLELABS (2017)
14. TRON: Advanced decentralized blockchain platform (2018) [Whitepaper]. Available at: https://888tron.com/wp/
15. DeVries, P.D.: An analysis of cryptocurrency, bitcoin, and the future. Int. J. Bus. Manage. Commer. 1(2), 1–9 (2016)
16. Bakar, N.A., Rosbi, S.: Pearson product moment correlation diagnostics between two types of crypto-currencies: a case study of Bitcoin and Ethereum. Int. J. Adv. Sci. Res. Eng. 4(12), 40–51 (2018)
17. Nitima, M., Jyoti, G., Nisha, B.: Prediction of El-Nino Year and performance analysis on the calculated correlation coefficients. In: Kapur, P., Klochkov, Y., Verma, A., Singh, G. (eds.) System Performance and Management Analytics. Asset Analytics (Performance and Safety Management), pp. 167–178. Springer, Singapore (2019)
18. https://coinmarketcap.com/. Last accessed 24 May 2020
19. https://www.coingecko.com/en. Last accessed 24 May 2020
20. https://docs.python.org/3/tutorial/. Last accessed 24 May 2020
A Design of a Secured E-voting System Framework for Poll-Site Voting in Ghana Samuel Agbesi
Abstract There is a lack of trust and transparency in the current manual voting system in Ghana, and there is a belief that e-voting technology can address the trust issues. However, e-voting technology is perceived to have security vulnerabilities that can also affect the integrity of an election. This study aimed to design an e-voting system framework that can bring trust and transparency to the electoral process. In this paper, the researcher reviewed existing e-voting designs and their weaknesses. The study further examined the weaknesses in the manual voting system and identified the requirements a secure e-voting system must satisfy. The proposed framework was based on a two-tier architecture integrated with blockchain technology to provide a transparent means of processing and storing election results. The design framework was made up of an authentication system, a voting system, a tallying system, and a vote recording system that run on a local intranet at polling stations. The design used two-factor authentication, which includes a voter ID and a one-time code. The design also allowed voters to verify how their votes were recorded using a blockchain system. The proposed e-voting system was able to address the security vulnerabilities that impact the integrity and trustworthiness of the electoral process. Keywords Elections · Secure e-voting · Poll site · Blockchain · Trust · Ghana
S. Agbesi (B)
Electronic Systems Department, Aalborg University, Copenhagen, Denmark
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_14

1 Introduction
Elections in Ghana and Africa are entangled with various challenges that impact the integrity and trustworthiness of the electoral process [1–3]. The lack of trust and transparency in the manual paper voting system used in elections has led to a clamor for the application of technology, such as an e-voting system, in Ghana’s electoral process and in Africa in general [4]. The belief is that the introduction of e-voting in the electoral process will bring integrity and transparency to the current process and inspire confidence and trust among stakeholders [1, 4–6]. Countries such as Estonia, the UK, and Switzerland have used e-voting in elections on a trial basis [4, 7], and although some of these countries have suspended its use, Estonia still uses it [4, 7]. In developing African countries, however, attempts to adopt such technology have met several challenges [8–10], which creates a gap in the application of technology in the electoral process. Opinions differ on the security and reliability of e-voting in elections [7, 11]. In this study, the researcher supports the argument of Wiseman [7], who argues that the benefits that can be derived from the introduction of an e-voting system far outweigh the perceived risk. Given the recurring challenges of the electoral process in Ghana and Africa, which have resulted in election violence in many African countries [12, 13], the introduction of e-voting will help curb some of these challenges. Hence, this study examined the current electoral system and its associated challenges and conceptualized a secure and transparent e-voting system architecture that can bring integrity and trust to Ghana’s electoral system. The contribution of this study was an e-voting framework that allows citizens to vote electronically at a poll site and stores votes transparently and securely.
2 Related Works
2.1 Introduction
This section reviewed previous work on e-voting system design and evaluated its performance in satisfying the requirements for a secure and transparent system.
2.2 Evaluation of Existing E-voting System Designs
Several e-voting system architectures based on different design approaches have been proposed by different authors. Osho et al. [5] proposed a cloud-based e-voting system that handles both “online and offline voting” to allow for “online poll-site” and “remote e-voting” [5]. Another e-voting system, based on a two-tier architecture, was proposed by Arthur et al. [1]. The system is based on two interconnected sub-systems: a front-end interface that is used to capture votes and a backend server that processes and stores votes [1]. In this design, the front-end interface has to establish a connection with the server before any transaction can be initiated [1]. The e-voting system design of Kurbatov et al. [14] was based on a ring signature [14]. The ring signature algorithm provides voter anonymity during vote casting, and the signature is generated using the public keys of all the voters or users in the group [14]. The proposed architecture is made up of “validators”, which are nodes in charge of transaction processing; a “user identity system”, responsible for validating users and granting them access to the system; and “end users”, who initiate transactions in the form of casting votes [14].
Furthermore, recent studies [15–17] have introduced blockchain technology to the design of e-voting architecture. For example, Agbesi and Asante [15] proposed an e-voting architecture for recording election results on a blockchain to address the issue of vote transmission during elections [15]. The authors were of the view that once the results are stored in the blockchain and are available to all stakeholders, this will bring transparency and trust to the way election results are stored and transmitted [15]. Williams and Agbesi [17] presented three scenarios for designing a blockchain-based e-voting technology suitable for Ghana’s elections [17]. Scenario one (1) involves full-fledged online voting, which is suitable for remote voting [17]. In scenario two (2), voters can only vote at a designated polling station, where they are biometrically verified before they cast their votes on a blockchain-based e-voting system [17]. Scenario three (3) allows only voting results to be stored on a blockchain system, while the main voting and counting are done manually [17].
Some of the proposed designs reviewed are promising but have limitations that may make them difficult to adopt and use in the Ghanaian context. Most of the studies [16, 18] support only online voting, which may be difficult to use in Ghana given the state of its Internet infrastructure. Similarly, the design of Arthur et al. [1] needs a constant and stable Internet connection for every single transaction, which will be a challenge in Ghana because several polling areas cannot guarantee stable Internet connectivity. Furthermore, the designs of Osho et al. and Kurbatov et al. [5, 14] did not consider the transparency of the storage of the voting results. In this study, the aim was to design an e-voting architecture that provides transparency, auditability, confidentiality, and integrity in the voting process. The proposed design was a poll-site e-voting technology integrated with a blockchain system to record voting results at the end of polls. In Sect. 3, the researcher reviewed the existing manual voting system and provided a summary of its challenges.
3 The Current Architecture of Voting System in Ghana
3.1 Paper Ballot
The current voting system used in Ghana’s elections consists of a verification system and a paper ballot system. The verification system is biometric and uses fingerprints as the means of authentication and authorization [17]. During each election year, citizens who have attained the age of 18 and those who for some reason are not in the voter register are captured, and their bio-data are added to the biometric verification system (BVS) [17]. On the day of elections, voters are verified using a biometric verification device (BVD) and go through all the prescribed formalities before they receive ballot papers (presidential and parliamentary ballots) for casting their votes [1, 17]. Once voters have completed the voting process, they must vacate the voting area. The official opening and closing times are 7:00 am and 5:00 pm. When voting has ended and all necessary checks have been made, the ballots are sorted and counted by the polling station officer in the full glare of voters and party agents [17]. Once the counting is completed, the results are recorded on the polling station declaration sheet and signed by all party agents present. The results declaration sheet from the polling station is sent to the constituency collation center by physical transportation [17]. After a constituency center has received the results of all the polling stations under its jurisdiction, the results are collated and reentered onto a constituency result declaration sheet, which is then faxed to the National Collation Center [15].
3.2 Challenges in the Current System
The current system gives room for multiple voting. It is also saddled with voting errors in the form of spoilt ballot papers due to wrong thumb-printing [3]. Even though the percentage of spoilt ballot papers decreased in Ghana’s December 2016 elections, the percentage was still significant [19]. Paper ballot counts are also prone to errors [19]. Polling station officers often make errors in counting, either accidentally or deliberately, and several recount requests by losing candidates have yielded different results. The current system has several points of failure and attack that can impact the integrity of the election outcome, which can in turn lead to violence and chaos. In this study, the aim is to produce a design framework that can address the challenges enumerated in Sect. 3.2. The proposed e-voting system design, discussed in Sect. 4, was a poll-site e-voting technology that used a client–server architecture integrated with blockchain technology.
4 Proposed E-voting System Architecture
4.1 Introduction
This section discussed the proposed e-voting system framework. To arrive at a framework that can address the challenges in the voting system, the section examined the main requirements of the proposed system and provided an overview of the underlying blockchain technology that supports it.
4.2 Requirements of the Proposed System
There are basic requirements any voting system must satisfy, which include but are not limited to “auditability”, “integrity”, “transparency”, “verifiability”, “accuracy”, “accessibility”, and “secrecy” [20–22]. In this design architecture, the researcher addressed transparency by ensuring that results collated at a polling station are stored and transmitted with the approval of all stakeholders. The design also addressed voter authentication by identifying the most appropriate way to authenticate and verify users so as to prevent multiple voting. The system was also designed to allow users to verify how their votes were stored while preventing others from tracing who voted for which candidate.
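The authentication requirement can be illustrated with a minimal sketch of the two factors named in the abstract, a voter ID and a one-time code, together with a check that blocks a second login by the same voter. This is not the system built in the study: the `PollSiteAuthenticator` class, its in-memory dictionaries and the sample identifiers are hypothetical, and a real deployment would read from the election database and protect stored codes.

```python
import hmac


class PollSiteAuthenticator:
    """Hypothetical two-factor check: voter ID plus a one-time code."""

    def __init__(self, issued_codes: dict):
        # voter ID -> one-time code issued to that voter (assumed data layout)
        self._codes = dict(issued_codes)
        self._used = set()

    def authenticate(self, voter_id: str, code: str) -> bool:
        expected = self._codes.get(voter_id)
        if expected is None:
            return False  # voter ID is not in the register
        if voter_id in self._used:
            return False  # code already consumed: blocks multiple voting
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected.encode("utf-8"), code.encode("utf-8")):
            return False
        self._used.add(voter_id)
        return True


# Illustrative register with made-up identifiers and codes.
authenticator = PollSiteAuthenticator({"V001": "a1b2c3d4", "V002": "9f8e7d6c"})
first_attempt = authenticator.authenticate("V001", "a1b2c3d4")
second_attempt = authenticator.authenticate("V001", "a1b2c3d4")
```

Marking the code as used at the moment of a successful login is what enforces one vote per voter in this sketch; to preserve secrecy, the ballot itself would be recorded without the voter ID.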
4.3 Overview of Blockchain Technologies
As discussed in the introduction section, the proposed design will use blockchain technology to store the results collated from the various polling stations transparently and securely. A blockchain is a series of blocks that are linked together using a cryptographic hash algorithm, which makes it difficult to alter the data in the blocks [15, 23, 24]. A block contains a series of transactions [24]; in the context of elections, a transaction corresponds to an individual vote cast. One way to ensure transparency, integrity, and trust in a blockchain network is to distribute the blockchain data among the nodes of the network, such that for any data change to be accepted, more than 50% of the nodes must agree before the change is committed [15, 17]. Blockchain technology uses public-key cryptography for the secure transmission of data on the blockchain network and digital signatures for message authentication [15].
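The hash linking and majority rule described above can be illustrated with a minimal sketch. It is not the implementation used in [15]; the block layout and the names `block_hash`, `make_block`, `chain_is_valid`, and `majority_accepts` are hypothetical and chosen only for illustration. SHA-256 is assumed as the hash algorithm.

```python
import hashlib
import json


def block_hash(index, prev_hash, votes):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "votes": votes},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(index, prev_hash, votes):
    """Create a block (hypothetical layout) that stores its own hash."""
    return {
        "index": index,
        "prev_hash": prev_hash,
        "votes": votes,
        "hash": block_hash(index, prev_hash, votes),
    }


def chain_is_valid(chain):
    """A chain is valid if every stored hash matches the block contents
    and every block points at the hash of its predecessor."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["prev_hash"], block["votes"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


def majority_accepts(approvals):
    """Accept a change only if more than 50% of the nodes approve it."""
    return sum(approvals) * 2 > len(approvals)


genesis = make_block(0, "0" * 64, [])
block1 = make_block(1, genesis["hash"], ["candidate A", "candidate B"])
chain = [genesis, block1]
```

Altering any vote in `block1` after it has been created changes the recomputed hash, so `chain_is_valid` returns `False`; this is the property the text refers to as making the data difficult to alter.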
4.4 Proposed Poll-Site E-voting System
The proposed design, as shown in Fig. 1, is based on a "2-tier architecture" with a front end that handles user interactions and a back end that processes and responds to user requests. The e-voting system will run on an intranet local to a specific polling station with a wired connection. The basis for the intranet and the wired connection is to secure the system from outside attack: the only way such an attack can succeed is for the attacker to have a physical connection to the local intranet. The client–server system, as shown in Fig. 1, can establish an Internet connection to a public blockchain to store polling station results. The main components of the system are the network architecture, a database system, and the blockchain infrastructure. The functions of each of these components are discussed below.
Fig. 1 Proposed poll-site e-voting system framework
Network Architecture. Each polling station will have its own local area network using a star topology. The local network will have a dedicated server and at least two (2) client computers. The server will hold the election database, and the data in the database will be restricted to a specific constituency. The e-voting application will be stored on the client computers, which voters will use for voting.
Database System. The election database will consist of a voter register that will hold all voters within a constituency, a presidential table that will store information about the presidential candidates, a parliamentary table that will store information on the parliamentary candidates, and a voting ID table that will store all the one-time codes to be used by voters to log into the voting interface. The database will also include transaction tables such as the vote details table. The voter register table will pull data from the national register, which will be used to authenticate voters for a polling area.
Blockchain Network. The proposed design will also have a blockchain vote recording system; this study adopts the design proposed in [15]. As shown in Fig. 2, the design consists of light nodes and full nodes. Polling station officials will interact with the light nodes to record the results of a particular polling station after voting has closed, while the full nodes will be political party and EC nodes that validate incoming transactions [15].
Fig. 2 "Blockchain-based vote recording" system [15]
4.5 E-voting Process
The voting process consists of three phases: voter authentication, voting, and vote recording. These phases are described below.
Authentication Phase. In this phase, the voter presents his or her voter ID and is verified against the "Voter register" table. Once the voter has been verified, the system will generate a one-time code that will be linked with the voter's ID; this will be encrypted and stored in the database. If the voter cannot be verified using the biometric device, but the voter's name is in the voter reference list, a manual verification will be performed, and the electoral officer will manually generate the one-time code for the voter.
Voting Phase. After the voter has been verified and has received his or her one-time code, the voter goes to the client computer (voting screen) to cast the vote. The voter must enter the voter ID and the one-time code for authentication and authorization. When the voter ID and the one-time code are submitted to the server, it will perform the following checks: it will check whether the voter ID exists; if the voter ID exists, it will check whether the voter ID matches the one-time code; and if there is a match, it will check whether the status is "UNUSED". If all these conditions are satisfied, the voter will be presented with the voting screen showing the candidates and can submit his or her votes. Once the votes are submitted, the back-end system handles all the processing and update requests. The system will also display a flash message on the screen for 30 s showing how the votes were recorded for both the presidential and parliamentary elections. This will serve as confirmation that the votes have been recorded accurately in the database.
Vote Recording Phase. In the vote recording phase, the polling station results are transferred to the blockchain e-voting recording system. The data that will constitute the transaction to be stored on the blockchain are the polling station ID, the presidential results, the parliamentary results, and the party agent verification codes [15]. Before the transaction is committed, all party agents present have to validate the entered results and confirm them by appending their verification codes [15].
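The three server-side checks in the voting phase can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system's actual code: the in-memory dictionary stands in for the voting ID table, the names `issue_code` and `authorize` are hypothetical, the code is stored unencrypted for brevity (the design stores it encrypted), and marking the code as "USED" at login time is an assumption about when the status changes.

```python
import secrets

UNUSED, USED = "UNUSED", "USED"


def issue_code(voting_ids, voter_id):
    """Authentication phase: generate a one-time code linked to the voter ID.

    voting_ids is a plain dict standing in for the voting ID table.
    """
    code = secrets.token_hex(4)
    voting_ids[voter_id] = {"code": code, "status": UNUSED}
    return code


def authorize(voting_ids, voter_id, code):
    """Voting phase: apply the three checks in the order given in the text."""
    record = voting_ids.get(voter_id)
    if record is None:                      # 1. does the voter ID exist?
        return False
    if not secrets.compare_digest(record["code"], code):
        return False                        # 2. does the code match the ID?
    if record["status"] != UNUSED:          # 3. is the code still unused?
        return False
    record["status"] = USED                 # assumption: consumed on login
    return True
```

For example, after `code = issue_code(table, "V001")`, the first call to `authorize(table, "V001", code)` succeeds and every later call with the same pair is rejected, which is how the one-time code blocks multiple voting.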
5 Discussion
5.1 Introduction
The proposed framework attempts to address the requirements of authentication, accuracy, integrity, security, and transparency. In this section, the researcher evaluates the proposed system against these requirements.
5.2 Authentication
The proposed design addresses user authentication requirements. Users go through a two-step verification process before they can access the voting interface. The first step is biometric verification. The second step is the use of the voter ID and the one-time code to log in to the voting application. Without going through both verification steps, it will not be possible to access the voting application to cast a vote.
5.3 Accuracy
The proposed system has inbuilt capabilities to prevent modification of results and to record votes accurately in the database. With the use of blockchain technology to store polling station results, it will be difficult for the stored results to be altered in future, owing to the immutability properties of blockchain technology [15, 24]. When the results from the polling stations are stored in a new block, the block is secured using the inbuilt cryptographic algorithm, which makes it difficult and complex to modify the content of the block [15, 23].
5.4 Security and Integrity
The proposed system also addresses voter anonymity. The system is designed so that the voter's identity is stripped from each transaction before it is stored in the database, so that a vote cannot be traced to a voter ID. Similarly, the design of the system makes it hard for a voter to vote more than once because of the one-time code. To secure the voting system from hacking and attack, the design framework is based on a poll-site e-voting system running on a local intranet, which makes an outside attack difficult. Also, since the polling station results are stored on a blockchain network, it is difficult for a hacker to alter the results stored on all the nodes that form the blockchain.
5.5 Transparency
The design framework brings transparency to the election process. First, voters can verify whether the candidate they voted for received the vote through the "flash message" display function. Second, polling station results will be stored on a public blockchain [25, 26], and all stakeholders can log into the blockchain system to verify and follow the votes coming from the different polling stations.
6 Conclusion
The main aim of this study was to address the challenges of the manual voting system. This study conceptualized a secure e-voting framework that satisfies the basic requirements of a secure system. The design framework provides a secure way of casting votes that brings transparency, integrity, and verifiability into the electoral process. One limitation of this study is that people cannot vote remotely, owing to the unavailability of Internet resources in various parts of the country and the lack of voting devices among voters. Future studies must look at ways to address the digital divide and remote Internet voting systems.
References
1. Arthur, J.K., Adu-manu, K.S.: A trustworthy architectural framework for the administration of E-voting: the case of Ghana. Int. J. Comput. Sci. Issues 11, 97–102 (2014)
2. Ahmad, S., Abdullah, S.A.J., Arshad, R.B.: Issues and challenges of transition to E-voting technology in Nigeria. Public Policy Adm. Res. 5, 95–102 (2015)
3. Agbesi, S.: Adoption of E-voting system to enhance the electoral process in developing countries. In: Evaluating Media Richness in Organizational Learning, pp. 262–273. IGI Global (2018)
4. Agbesi, S.: Institutional drivers of internet voting adoption in Ghana: a qualitative exploratory studies. Nord. Balt. J. Inf. Commun. Technol. 1, 53–76 (2020). https://doi.org/10.13052/nbjict1902-097X.2020.003
5. Osho, O.L., Abdullahi, M.B., Osho, O.: Framework for an E-voting system applicable in developing economies. Int. J. Inf. Eng. Electron. Bus. 8, 9–21 (2016). https://doi.org/10.5815/ijieeb.2016.06.02
6. Agbesi, S.: Political parties and internet voting system adoption in Ghana. In: International Conference on Electronic Government and the Information Systems Perspective EGOVIS 2020, pp. 174–186. Springer, Cham (2020)
7. Wiseman, R.: Internet voting: If not now, when? J. Japan Soc. Fuzzy Theory Intell. Inf. 29, 100–100 (2017). https://doi.org/10.3156/jsoft.29.3_100_1
8. Balise, J.: BCP EVMs court challenge withdrawn. https://www.sundaystandard.info/bcp-evmscourt-challenge-withdrawn/
9. Petesch, C.: Voting machines raise worries in Congo ahead of elections. https://apnews.com/1764856db1b74c7790a05a65d7a9c5b0/Voting-machines-raise-worries-in-Congo-ahead-ofelections
10. Ross, A., Lewis, D.: In Congo, voting machines raise suspicions among president's foes. https://www.reuters.com/article/us-congo-election/in-congo-voting-machines-raise-suspicions-among-presidents-foes-idUSKCN1GL13W
11. Simons, B., Jones, D.W.: Internet voting in the U.S. Commun. ACM 55, 68–77 (2012). https://doi.org/10.1145/2347736.2347754
12. Atuobi, S.: Election-related violence in Africa. Confl. Trends 1, 10–15 (2008)
13. Isma'ila, Y., Othman, Z.: Challenges of electoral malpractices on democratic consolidation in Nigeria's fourth republic. Int. Rev. Manage. Mark. (2016). https://doi.org/10.15405/epsbs.2016.08.42
14. Kurbatov, O., Kravchenko, P., Poluyanenko, N., Shapoval, O., Kuznetsova, T.: Using ring signatures for an anonymous E-voting system. In: 2019 IEEE International Conference on Advance Trends in Information Theory, ATIT 2019, pp. 187–190 (2019). https://doi.org/10.1109/ATIT49449.2019.9030447
15. Agbesi, S., Asante, G.: Electronic voting recording system based on blockchain technology. In: 2019 12th CMI Conference on Cybersecurity and Privacy, CMI 2019, pp. 1–8. IEEE (2019)
16. Chaieb, M., Yousfi, S., Lafourcade, P., Robbana, R.: Verify-your-vote: a verifiable blockchain-based online voting protocol. In: Lecture Notes in Business Information Processing (2019)
17. Williams, I., Agbesi, S.: Blockchain, trust and elections: a proof of concept for the Ghanaian National Elections. In: Handbook on ICT in Developing Countries, vol. 2. River Publishers (2019)
18. Yi, H.: Securing E-voting based on blockchain in P2P network. Eurasip J. Wirel. Commun. Netw. (2019). https://doi.org/10.1186/s13638-019-1473-6
19. EU EOM: EU EOM Ghana Presidential and Parliamentary Elections 2016 Final Report (2016)
20. Alaguvel, R., Gnanavel, G., Jagadhambal, K.: Biometrics using electronic voting system with embedded security. Int. J. Adv. Res. Comput. Eng. Technol. (2013)
21. Jacobs, B., Pieters, W.: Electronic voting in the Netherlands: from early adoption to early abolishment. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009)
22. Anane, R., Freeland, R., Theodoropoulos, G.: E-voting requirements and implementation. In: Proceedings - The 9th IEEE International Conference on E-Commerce Technology; The 4th IEEE International Conference on Enterprise Computing, E-Commerce and E-Services, CEC/EEE 2007 (2007)
23. Hanifatunnisa, R., Rahardjo, B.: Blockchain based e-voting recording system design. In: Proceeding 2017 11th International Conferences on Telecommunication Systems Services and Applications, TSSA 2017, 2018-Janua, pp. 1–6 (2018). https://doi.org/10.1109/TSSA.2017.8272896
24. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system, pp. 1–9 (2008)
25. Hiran, K.K., Doshi, R., Rathi, R.: Security & privacy issues of cloud & grid computing networks. Int. J. Comput. Sci. Appl. 4, 83–91 (2014). https://doi.org/10.5121/ijcsa.2014.4108
26. Hiran, K.K., Doshi, R., Fagbola, T., Mahrishi, M.: Cloud computing: master the concepts, architecture and applications with real-world examples and case studies. Bpb Publications (2019)
Pattern Matching Using Face Recognition System
Sandeep Kumar Srivastava, Sandhya Katiyar, and Sanjay Kumar
Abstract In the last few years, face recognition systems have been used on a very large scale to identify users. Face recognition supports a large number of machine-based visual tasks, such as accident avoidance systems, and can use 3D models that capture appearance from different angles using edge detection. In this paper, a technique is used to reduce the computational cost and increase the accuracy of the facial recognition system by integrating it with iris recognition. Iris and face images are processed using OpenCV and Python. The algorithm compares all the histograms and produces the best label and confidence, which gives better face recognition.
Keywords Face recognition system · Image recognition system · Iris reader
S. K. Srivastava · S. Katiyar (B) Department of Information Technology, Galgotias College of Engineering & Technology, Greater Noida, India
S. Kumar School of Computing Science & Engineering, Galgotias University, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_15
1 Introduction
Image recognition is widely used for autonomous security purposes. It covers a wide range of tasks, such as analyzing image content and guiding autonomous robots, self-driving cars, and accident avoidance systems. The human brain performs these tasks easily, whereas computers find them difficult. They therefore require software for deep machine learning based on biometric applications. Three classifiers are used to identify the pattern: hue histogram, mouth detection, and eye detection [1]. Image recognition algorithms compare 3D models and appearance from different angles using edge detection, and are trained on prelabeled pictures through guided computer learning. Facial recognition is a technique used to identify an authenticated person [2]. There are many methods of facial recognition. They basically work by using a biometric artificial intelligence system, which identifies a person uniquely by analyzing a pattern based on facial texture or shape (Fig. 1). There are three weak classifiers based on skin hue histogram matching, iris detection, and mouth detection [3].
The biggest advantage of our facial recognizer is its accuracy and speed, since it uses both face and iris features for identification. Iris identification alone is estimated to be more accurate than fingerprinting. Fingerprints have some drawbacks, as they may be damaged, whereas the iris is more secure because it is naturally protected by the cornea and its pattern seems to remain unchanged for decades [4]. A fingerprint scanner comes into direct contact with the finger and must therefore always be kept clean, while an iris scan can be performed safely at some distance from the eye. With this augmentation of iris and face, the identification of a subject becomes quicker than when either is used alone. It is also more reliable, since iris or facial recognition alone can be easily beaten.
Fig. 1 Process cycle of the program
2 Literature Survey
Dung et al. [5] note that face detection plays a very important role in applications such as video surveillance and human–computer interfaces, and describe it as feature-based image recognition. Pandey et al. [6] identify two methods of facial recognition: the eigenface method and the Fisherface method. Pawar et al. [7] observe that the human face varies in many ways, for example through eyeglasses, a mustache, or a beard. Fisher linear discriminant (FLD) is a class-specific method that separates facial images into classes and considers the distances between and within classes so as to produce better classification. Yadav et al. [8] discuss multiple face tracking systems. Face processing consists of two phases: the first is face detection, which takes place very rapidly in humans except when the object is located a short distance away, and the second is recognition, which identifies a face as an individual. According to Weng et al. [9], both text-based and graphical password security features can include face recognition to detect the face, but only as a second step; they apply robust point set matching to partial face recognition. Shanmugavadivu et al. [16] state that faces are easily detected using face geometry. Sharifara et al. [3] review the basic structure of human face detection, covering neural networks and Haar feature-based cascade classifiers. Quanyou et al. [10] describe face pattern matching by corner verifying, a structured way of matching face patterns. Zhai et al. [11] describe face recognition by face geometry patterns, based on quantization of the geometry pattern.
3 Related Work
Kesäniemi [12]: Automatic biometric systems have become very important in the last few years. Biometrics refers to the science of analyzing physiological and behavioral characteristics for security purposes. Biometric applications are used in many places, including the government, forensic, and commercial sectors, for example biometric attendance, retina recognition, and iris recognition.
Wang [13]: The iris is a thin, colored, circular diaphragm located between the cornea and the lens of the human eye. It is bounded by the pupil and the sclera, and it stores texture information. The iris pattern starts to form in the third month of gestation, its structure is complete about five months later, and its pigmentation can continue until the age of two.
1. Hough Circle Transform
The circle Hough transform (CHT) is the main technique for detecting circles in the iris image. It is used to find circles in imperfect image inputs. Candidate circles are produced by voting in the Hough parameter space, and the local maxima of the resulting accumulator matrix are then selected. A circle in two dimensions is defined as
$(x - a)^2 + (y - b)^2 = r^2$
where (a, b) is the center of the circle and r is the radius. If a two-dimensional point (x, y) is fixed, the parameters can be found from the circle equation above. The parameter space is three-dimensional, (a, b, r), and all parameters that satisfy the equation for the given (x, y) lie on the surface of an inverted right-angled cone whose apex is at (x, y, 0). In the three-dimensional space, the circle parameters can be identified by the intersection of the many conic surfaces defined by points on the two-dimensional circle. The process can be divided into two stages: the first stage fixes the radius and finds the optimal center in the two-dimensional parameter space, and the second stage finds the optimal radius in a one-dimensional parameter space.
2. Binary Image
A binary image is a digital image that has only two possible values for each pixel. In other words, a binary image uses only black and white: black represents the off state, and white represents the on state. Typically, the object belongs to the foreground color and the rest of the image belongs to the background color. In the document-scanning industry, this is referred to as a bi-tonal or two-level image. Each pixel stores a single bit of information, 0 or 1; such images are also called black and white, B&W, or monochrome.
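The two ideas above, thresholding to a binary image and voting for circle centers at a fixed radius, can be sketched in plain Python. This is a simplified illustration, not the OpenCV routine the paper relies on: the function names, the threshold of 128, and the 72 angular steps are assumptions, and only the first CHT stage (fixed radius, two-dimensional center search) is shown.

```python
import math


def to_binary(gray, threshold=128):
    """Map each gray level to 1 (on, white) or 0 (off, black)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def hough_circle_centres(binary, radius, steps=72):
    """First CHT stage: with the radius fixed, every 'on' pixel (x, y)
    votes for all centres (a, b) satisfying (x-a)^2 + (y-b)^2 = r^2."""
    h, w = len(binary), len(binary[0])
    acc = [[0] * w for _ in range(h)]          # accumulator matrix
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            seen = set()                       # one vote per cell per pixel
            for k in range(steps):
                t = 2 * math.pi * k / steps
                a = int(round(x - radius * math.cos(t)))
                b = int(round(y - radius * math.sin(t)))
                if 0 <= a < w and 0 <= b < h and (a, b) not in seen:
                    seen.add((a, b))
                    acc[b][a] += 1
    return acc


def best_centre(acc):
    """Return (a, b) of the accumulator maximum."""
    votes, a, b = max(
        (v, x, y) for y, row in enumerate(acc) for x, v in enumerate(row)
    )
    return a, b


# Synthetic test image: a circle of radius 6 centred at (10, 10).
gray = [[0] * 21 for _ in range(21)]
for k in range(72):
    t = 2 * math.pi * k / 72
    gray[int(round(10 + 6 * math.sin(t)))][int(round(10 + 6 * math.cos(t)))] = 255

centre = best_centre(hough_circle_centres(to_binary(gray), radius=6))
```

The second stage would repeat the vote over a range of radii and keep the radius with the strongest peak; it is omitted here to keep the sketch short.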
4 Proposed Methodology
The purpose of this paper is to build a reliable face recognition system with reduced computational cost and higher accuracy by integrating it with iris recognition.
(a) Iris Recognition
Iris technologies capture the unique features of the iris in the human eye for identification. The idea of using the iris pattern to recognize an individual appeared in James Bond films, but at that time it remained science fiction and conjecture.
(b) Haar Cascade Classifier
The Haar cascade classifier is used for comparison with the LBPH face model. It is a pretrained machine learning model that has been trained on a number of faces. Training initially requires positive images and negative images. Each feature is a single value obtained by subtracting the sum of the pixel values under the white rectangle from the sum of the pixel values under the black rectangle of the image.
(c) Local Binary Pattern Histogram
An image is a high-dimensional object with many features; the 24 × 24 case cited here is said to span almost 147,456 input dimensions. High dimensionality is known to be a problem, so a lower-dimensional subspace is identified. The Eigenfaces approach maximizes the total scatter, which can be a problem because the components with maximum variance are not necessarily useful for classification (Figs. 2 and 3).
Fig. 2 Haar cascade image classifier
Fig. 3 Example to show working of algorithm
The local binary pattern (LBP) operator is given as:
$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} 2^p \, s(i_p - i_c)$
where $(x_c, y_c)$ is the central pixel, $i_c$ is its intensity, and $i_p$ is the intensity of the neighbor pixel. The function $s$ is defined as:
$s(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{else} \end{cases}$ (1)
The above equation captures fine-grained detail in images and gives very accurate results in texture classification. After the operator was applied, it was found that a fixed neighborhood does not encode details efficiently; therefore, the operator was extended to use a variable neighborhood. The idea is to align an arbitrary number of neighbors on a circle with a variable radius, which makes it possible to capture neighborhoods of different sizes. For a given point $(x_c, y_c)$, the position of the neighbor $(x_p, y_p)$, $p \in P$, can be calculated by the following formula (Fig. 4):
$x_p = x_c + R \cos\left(\frac{2\pi p}{P}\right)$
$y_p = y_c - R \sin\left(\frac{2\pi p}{P}\right)$
where R is the radius of the circle and P is the number of sample points.
Fig. 4 Different contours being captured
If a sample point on the circle does not correspond to pixel coordinates, its value is interpolated. There are many clever interpolation schemes in computer science; the OpenCV implementation uses bilinear interpolation. By definition, the local binary pattern operator is robust against monotonic gray scale transformations. This can be verified by looking at the LBP image of a modified image, as shown below (Fig. 5).
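The basic operator can be sketched in a few lines of plain Python. This is a minimal illustration assuming the original fixed 3 × 3 neighborhood (P = 8, radius 1, no interpolation); the neighbor ordering and the function names `lbp_code` and `lbp_histogram` are choices made for this sketch, not part of the OpenCV API.

```python
# Offsets (dx, dy) of the 8 neighbours, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def lbp_code(image, xc, yc):
    """LBP(xc, yc) = sum over p of 2^p * s(i_p - i_c), with s(x) = 1 if x >= 0."""
    ic = image[yc][xc]
    code = 0
    for p, (dx, dy) in enumerate(OFFSETS):
        ip = image[yc + dy][xc + dx]
        s = 1 if ip - ic >= 0 else 0
        code += (2 ** p) * s
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (256 possible codes)."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, x, y)] += 1
    return hist


patch = [
    [5, 9, 1],
    [4, 4, 6],
    [7, 2, 3],
]
code = lbp_code(patch, 1, 1)

# A monotonic gray scale change (here 2*v + 10) leaves the code unchanged.
brighter = [[2 * v + 10 for v in row] for row in patch]
same_code = lbp_code(brighter, 1, 1)
```

Because only the sign of each difference $i_p - i_c$ enters the code, any strictly increasing remapping of the gray levels produces the same pattern, which is the robustness property stated above. In LBPH, such histograms are computed per image region and compared between faces.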
Fig. 5 Transformed image
5 Experiment and Result
1. Histograms
An image histogram is a graphical representation of the tonal distribution of a digital image. It plots the number of pixels for each tonal value, so that a viewer can judge the entire tonal distribution at a glance. Image histograms are a feature of most modern digital cameras. Photographers can use them as an aid to show the distribution of tones captured and whether image detail has been lost to blown-out highlights or blacked-out shadows. This is less useful for raw image formats, because the dynamic range of the displayed image may differ from that of the raw file. More generally, a histogram is a graph that shows the underlying frequency distribution (shape) of a set of data. It allows the data to be inspected for its underlying distribution (e.g., normal distribution).
2. Choosing the Correct Bin Width
There is no right or wrong answer as to how wide a bin should be, but there are rules of thumb. The bins should be neither too large nor too small. Consider the histogram produced earlier: the following histograms use the same data but have either smaller or larger bins, as shown below (Fig. 6).
When the bin width is too small, the histogram shows too much individual data and does not allow the underlying pattern, that is, the frequency distribution of the data, to be easily seen. At the other end of the scale, where the bins are too large, we are again unable to find the underlying trend in the data (Figs. 7, 8 and 9; Tables 1 and 2).
The bilinear interpolation used for LBP sample points that fall between pixels is given by:
$f(x, y) \approx \begin{bmatrix} 1 - x & x \end{bmatrix} \begin{bmatrix} f(0, 0) & f(0, 1) \\ f(1, 0) & f(1, 1) \end{bmatrix} \begin{bmatrix} 1 - y \\ y \end{bmatrix}$
The data in Tables 1 and 2 were gathered by testing against the trained model. The data shown in the tables are used to find the deviation for the same person seen from different angles and positions.
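The effect of bin width can be seen with a small sketch. It assumes integer pixel values in the range 0 to 255 and equal-width bins; the function name `histogram` and the sample values are illustrative only and are not taken from the experiment.

```python
def histogram(values, bin_width, lo=0, hi=256):
    """Count integer values in [lo, hi) using equal-width bins."""
    n_bins = -(-(hi - lo) // bin_width)        # ceiling division
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[(v - lo) // bin_width] += 1
    return counts


pixels = [0, 10, 20, 100, 110, 200, 250, 255]

coarse = histogram(pixels, bin_width=128)      # 2 bins: too little detail
medium = histogram(pixels, bin_width=64)       # 4 bins
fine = histogram(pixels, bin_width=1)          # 256 bins: mostly empty
```

With a width of 1, almost every bin holds zero or one pixel, so the shape of the distribution is hidden by individual values; with a width of 128, the three clusters in the sample collapse into two counts.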
Fig. 6 Bins comparison
Fig. 7 Image of sunflower
Fig. 8 Images histogram
Fig. 9 Example image showing how the LBPH algorithm works

Table 1 Confidence levels for different images
S. no.   Subject   Confidence
1        (image)   23.02
2        (image)   81.65
3        (image)   63.85
4        (image)   85.92
5        (image)   74.33
6        (image)   42.72

Table 2 Results of classifier on same subject
No.   Confidence level
1     40.91735
2     41.24723
3     42.87689
4     43.11156
5     40.45581
6     44.48900
7     42.35040
8     42.36076
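The deviation across the repeated measurements in Table 2 can be summarized with the standard library. This sketch only restates the tabulated values; the choice of mean, sample standard deviation, and range as summary statistics is an assumption, since the text does not say how the deviation was computed.

```python
import statistics

# Confidence levels for the same subject, copied from Table 2.
confidences = [
    40.91735, 41.24723, 42.87689, 43.11156,
    40.45581, 44.48900, 42.35040, 42.36076,
]

mean = statistics.mean(confidences)            # about 42.23
spread = statistics.stdev(confidences)         # sample standard deviation
value_range = max(confidences) - min(confidences)
```

The mean of about 42.23 is close to the combined confidence level of around 43 quoted in the conclusion.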
6 Conclusion
Although the software can adapt to low-resolution cameras, higher-resolution cameras will yield a better and more reliable dataset and, hence, a more secure biometric system. The software is easy to work with and can run in memory-limited and constrained environments. Through experimentation, the value set for the combined confidence level of face and iris is around 43 (tested with a 1 MP camera).
References
1. Cuimei, L., Zhiliang, Q., Nan, J., Jianhua, W.: Human face detection algorithm via Haar cascade classifier combined with three additional classifiers. In: 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), Yangzhou, pp. 483–487 (2017)
2. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981). https://doi.org/10.1145/358669.358692
3. Sharifara, A., Mohd Rahim, M.S., Anisi, Y.: A general review of human face detection including a study of neural networks and Haar feature-based cascade classifier in face detection. In: 2014 International Symposium on Biometrics and Security Technologies (ISBAST), Kuala Lumpur, pp. 73–78 (2014)
4. Zhao, Q., Zhang, S.: A face detection method based on corner verifying. In: 2011 International Conference on Computer Science and Service System (CSSS), Nanjing, pp. 2854–2857 (2011)
5. Dung, L., Huang, C., Wu, Y.: Implementation of RANSAC algorithm for feature-based image registration. J. Comput. Commun. 1, 46–50 (2013). https://doi.org/10.4236/jcc.2013.16009
6. Pandey, S., et al. (IJCSIT) Int. J. Comput. Sci. Inf. Technol. 5(3), 4111–4117 (2014)
7. Pawar, K.B., Mirajkar, F., Biradar, V., Fatima, R.: A novel practice for face classification. In: 2017 International Conference on Current Trends in Computer, Electrical, Electronics and Communication (CTCEEC), Mysore, pp. 822–825 (2017)
8. Yadav, P.C., Singh, H.V., Patel, A.K., Singh, A.: A comparative analysis of different facial action tracking models and techniques. In: 2016 International Conference on Emerging Trends in Electrical Electronics & Sustainable Energy Systems (ICETEESES), Sultanpur, pp. 347–349 (2016)
9. Weng, R., Lu, J., Tan, Y.: Robust point set matching for partial face recognition. IEEE Trans. Image Process. 25(3), 1163–1176 (2016)
10. Quanyou, Z., Shujun, Z.: A face detection method based on corner verifying. In: 2011 International Conference on Computer Science and Service System (CSSS), Nanjing, pp. 2854–2857 (2011)
11. Zhai, Y., Gan, J., Zeng, J., Xu, Y.: Disguised face recognition via local phase quantization plus geometry coverage. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, pp. 2332–2336 (2013)
12. Kesäniemi, M., Virtanen, K.: Direct least square fitting of hyperellipsoids. IEEE Trans. Pattern Anal. Mach. Intell. 40(1), 63–76 (2018)
13. Wang, J.: An improved iris recognition algorithm based on hybrid feature and ELM. In: IOP Conference Series: Materials Science and Engineering, vol. 322, p. 052030 (2018)
14. Yuan, J., Huang, D., Zhu, H., Gan, Y.: Completed hybrid local binary pattern for texture classification. In: 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, pp. 2050–2057 (2014)
15. Knyazev, B., Shvetsov, R., Efremova, N., Kuharenko, A.: Leveraging large face recognition data for emotion classification. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi'an, pp. 692–696 (2018)
16. Shanmugavadivu, P., Kumar, A.: Rapid face detection and annotation with loosely face geometry. In: 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), Noida, pp. 594–597 (2016)
A Fuzzy-Based Support Vector Regression Framework for Crop Yield Prediction
Uduak Umoh, Daniel Asuquo, Imoh Eyoh, Abdultaofeek Abayomi, Emmanuel Nyoho, and Helen Vincent
Abstract This paper proposes a fuzzy-based support vector regression framework for crop yield prediction. To achieve its objectives, interval type-2 fuzzy logic (IT2-FL), principal component analysis (PCA), and support vector regression (SVR) algorithms are employed. The IT2-FL algorithm is used to predict the missing predictor parameter values in the original crop yield prediction dataset, while PCA performs feature selection and dimensionality reduction, thereby eliminating redundant information (features) from the dataset transformed by IT2-FL. SVR, a machine learning algorithm, is used for model training and testing on the reduced dataset. The performance of the SVR model is evaluated using the mean square error (MSE) and root mean square error (RMSE) metrics. Results show that SVR is more accurate at predicting crop yield, with an error of 0.002071 for MSE and 0.045513 for RMSE, providing robustness to outliers and minimizing generalization errors with accuracy of 99% for MSE and 95% for RMSE. This indicates that farmers can rely on predicted outcomes from the proposed framework to adopt practices that maximize crop production, improve crop yield, and sustain food sufficiency for the teeming population.
Keywords Interval type-2 fuzzy logic · Support vector regression · Crop yield prediction · Food sufficiency
1 Introduction
Nigeria is a country rich in history and culture and blessed with fertile land and agricultural resources. These resources have been one of the major backbones of food crop production and revenue generation in the country, providing employment for about 30% of the population as of 2010 [1–5].

U. Umoh (✉) · D. Asuquo · I. Eyoh · E. Nyoho · H. Vincent: Department of Computer Science, University of Uyo, PMB 1017, Uyo, Akwa Ibom State, Nigeria
A. Abayomi: Department of Information and Communication Technology, Mangosuthu University of Technology, P.O. Box 12363, Jacobs, Durban 4026, South Africa
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_16
Data generated in the field of agriculture are voluminous and need a robust framework to gather and analyse them for crop yield prediction and improvement in production, drawing on farmers' experience and on information about a particular piece of land, soil type, and crop type [6]. Crop yield prediction is a serious agricultural problem that leaves farmers with the choice of predicting the yield for a particular season based on past experience. Assumptions are also made depending on soil type, level of rainfall, crop type, etc., which often leads to inaccurate forecasts. Globally, the human requirement for food is escalating, and agricultural scientists, farmers, governments, and researchers are in need of tools and techniques for satisfying this demand. Data mining techniques of clustering, classification, and prediction can be applied to the considered parameters to give farmers better orientation on the ways crop yield can be determined and improved. Soft computing techniques, including fuzzy logic models, can also be satisfactorily applied in agriculture for the improvement of crop yield [7]. The factors that affect the level of crop production are categorized into internal and external factors. This work focuses on the external factors, classified in five categories: climatic factors, comprising rainfall, temperature, atmospheric humidity, wind velocity, atmospheric gases, and solar radiation; edaphic/soil factors, comprising soil moisture, soil mineral matter, soil organic matter, soil organisms, soil reaction (pH), soil temperature, and soil air; biotic factors, comprising competition between weeds and crop plants, which act as parasites, and animals such as protozoa, honey bees and wasps, snails, insects, and nematodes; physiographic factors, comprising topography of the land, altitude, steepness of slope, and exposure to light and wind; and socioeconomic factors, comprising society's inclination to farming and the family members available for cultivation.
These numerous parameters cannot all be applied explicitly in crop yield prediction. That is, an optimal set of parameters that can be used to enhance the productivity and performance of a specific crop's yield is required. Hence the need for a dimensionality reduction technique [8], such as principal component analysis (PCA), which is capable of reducing the dimension of the parameters to a given optimal set. This work develops a hybrid framework for the prediction of crop yield based on IT2-FL and SVR techniques. To achieve its objectives, IT2-FL, PCA, and SVR algorithms are employed. The IT2-FL algorithm is used to predict the missing predictor parameter values in the original crop yield prediction dataset, while PCA performs feature selection and dimensionality reduction, thereby eliminating redundant information (features) from the dataset transformed by IT2-FL. SVR, a machine learning algorithm, is used for model training and testing on the reduced dataset. The rest of the paper is organized as follows: Sect. 2 reviews related works on crop yield prediction along with parameters to consider when maximizing crop production, while Sect. 3 discusses the methodology employed in this work with detailed analysis of the models and algorithms used in the proposed framework. Section 4 presents results obtained from experiments conducted with the reduced dataset, along with performance measures to determine the accuracy level attained by the prediction model. Section 5 concludes the paper with recommendations for future work.
2 Related Literature
The importance of crop production or yield prediction is mentioned in [9], with emphasis on factors to consider in order to improve crop production. The experimental results were clustered using the k-means clustering algorithm. In [10], polynomials were developed to represent the predicted climatic variation parameters of some parts of Nigeria for 50 years (2000–2050) using climate changes in rainfall and temperature. Kumar and Kumar [11] used matrix laboratory (MATLAB) software to implement a system for crop production prediction using the k-means algorithm and a fuzzy logic technique on a given dataset. The authors in [12] focused on precision agriculture. Uslan et al. [13] proposed a hybrid learning system that is capable of building a robust fuzzy predictive model through the use of SVM and an IT2-FL system. Uslan and Seker [14] showed how SVM-based regression can be used to identify initial parameter values of the consequent part of a type-1 and interval type-2 fuzzy system. In [15], crop yield prediction using random forest is modelled. In [16], a crop yield prediction model based on support vector machine and random forest is proposed. An approach to predict millet crop yield using a high-dimensional dataset is developed in [17]. A crop yield prediction model based on a satellite imagery (remote sensing) dataset is proposed in [18]. In [19], an interactive web-based crop yield prediction system using the random forest algorithm is implemented to help farmers in appropriate decision making and policymaking even before crop cultivation. To the best of our knowledge, none of the referenced works has explored fuzzy logic, PCA, and SVR in handling the crop yield problem in the Niger Delta region of Nigeria. This is the reason the authors have employed the fuzzy logic, PCA, and SVR tools in predicting crop yield in this area.
3 Research Methodology
This paper integrates soft computing and machine learning models (IT2-FL, PCA, and SVR) for the prediction of crop yield. A crop yield dataset comprising fifteen (15) features was collected from the Department of Geography and Urban/Regional Planning, University of Uyo, Akwa Ibom State, and the Niger Delta University weather house, Bayelsa State, both in Nigeria, for the period of ten and a half years from 2007 to half of 2018, covering the two planting (early and late) seasons. The features used as input variables are nitrogen, phosphorus, magnesium, sodium, potassium, soil pH, rainfall, temperature, solar radiation, evaporation, soil temperature, humidity, electric conductivity, soil organic matter, and calcium. The dataset comprises data points with missing predictor (crop yield) values. To estimate these missing values, the IT2-FL algorithm is employed along with its fuzzifier, triangular membership functions (TMFs), rule base, inference engine, Karnik–Mendel type reduction algorithm, and defuzzifier.
Fig. 1 Conceptual architecture of the fuzzy-SVR crop yield prediction framework
The transformed dataset is then subjected to the PCA algorithm in order to eliminate redundant features. PCA breaks the feature-to-feature correlation in the dataset while encouraging the feature-to-predictor relationship. The reduced dataset is partitioned into a training set (60%) and a test set (40%) for training and testing of the SVR model and prediction of crop yield. The performance of the SVR model is evaluated using the mean square error (MSE) and root mean square error (RMSE) metrics. The conceptual architecture of the proposed crop yield prediction framework is presented in Fig. 1. The components include the original dataset, IT2-FL model, transformed dataset, PCA module, reduced dataset, SVR module, and predicted result. The original dataset refers to the raw crop yield data obtained, with some missing predictor values. Hence, the IT2-FL model is used to transform this dataset. The transformed dataset is obtained as the output of the IT2-FL algorithm implementation. The PCA algorithm in the PCA module is applied to eliminate redundancy (irrelevant features) and achieve dimensionality reduction, so as to achieve higher efficiency during model training. The outcome of PCA is used by the SVR module for SVR model training and testing. The regression model generated learns from the reduced dataset and, owing to its good generalization capability, is used to predict crop yield for given unknown records.
3.1 Original Dataset Analysis Figure 2 gives the original dataset, made up of fifteen (15) features as input variables and output variable (crop yield). The columns with red colour depict data points with missing predictor values.
Fig. 2 Original crop yield dataset
3.2 Parameter Value Estimation with IT2-FL Algorithm
The IT2-FL model presented in Fig. 3 is used to estimate the missing predictor parameter values in the label column of the original dataset. The four major components and processes of the IT2-FL algorithm used in the fuzzy-SVR crop yield prediction framework are depicted in Fig. 4. The fuzzification process depends on the input dataset and a TMF. The input dataset is taken one data point at a time and formulated as the vector in Eq. (1). The TMF is defined in Eqs. (2)–(4). A Mamdani inference process is used to evaluate the rules, each formulated as a conditional statement of the form (5). The fuzzy rules are evaluated using the models expressed in Eqs. (6)–(8).
\[ V_i = \{P_1, P_2, P_3, P_4, P_5, \ldots, P_{15}\} \tag{1} \]
Fig. 3 Interval type-2 fuzzy logic model
• Fuzzification: accepts the input (original dataset) and converts it to an IT2 fuzzy set using the TMF.
• Inferencing: accepts the IT2 fuzzy set, carries out inferencing on it using the rule base, and produces a new IT2 fuzzy set.
• Type reduction: accepts the IT2 fuzzy set, carries out Karnik–Mendel type reduction, and converts the IT2 fuzzy set into a type-1 fuzzy set.
• Defuzzification: accepts the type-1 fuzzy set and converts it into a crisp output called "Crop Yield", which is the missing predictor value in the dataset.
Fig. 4 Major processes in the IT2-FL algorithm
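\[ f(x; a, b, c) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & c \le x \end{cases} \qquad a \in [a_1, a_2],\ b \in [b_1, b_2],\ c \in [c_1, c_2] \tag{2} \]
\[ \overline{\mu}_{\tilde{A}_{im}}(x_i) = \begin{cases} 0, & x \le a_1 \\ \dfrac{x-a_1}{b-a_1}, & a_1 \le x \le b \\ \dfrac{c_1-x}{c_1-b}, & b \le x \le c_1 \\ 0, & c_1 \le x \end{cases} \qquad \overline{\mu}_{\tilde{A}}(x) = N(a_1, b, c_1; x) \tag{3} \]
\[ \underline{\mu}_{\tilde{A}_{im}}(x_i) = \begin{cases} 0, & x \le a_2 \\ \dfrac{x-a_2}{b-a_2}, & a_2 \le x \le b \\ \dfrac{c_2-x}{c_2-b}, & b \le x \le c_2 \\ 0, & c_2 \le x \end{cases} \qquad \underline{\mu}_{\tilde{A}}(x) = N(a_2, b, c_2; x) \tag{4} \]
\[ R^l:\ \text{IF } x_1 \text{ is } \tilde{F}_1^l \text{ and } \ldots \text{ and } x_p \text{ is } \tilde{F}_p^l \text{ THEN } y \text{ is } \tilde{G}_1^l \tag{5} \]
\[ F^i(x') = \left[\underline{f}^i(x'),\ \overline{f}^i(x')\right] \equiv \left[\underline{f}^i,\ \overline{f}^i\right] \tag{6} \]
\[ \underline{f}^i(x') = \underline{\mu}_{\tilde{F}_1^i}(x'_1) * \cdots * \underline{\mu}_{\tilde{F}_p^i}(x'_p) \tag{7} \]
\[ \overline{f}^i(x') = \overline{\mu}_{\tilde{F}_1^i}(x'_1) * \cdots * \overline{\mu}_{\tilde{F}_p^i}(x'_p) \tag{8} \]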
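The fuzzification and rule-firing steps of Eqs. (2)–(8) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names, the example parameter values, and the choice of the product as the t-norm `*` are assumptions. Because the two bounding triangles of Eqs. (3) and (4) need not be nested, the sketch simply takes the smaller and larger of the two grades as the membership interval.

```python
def tri(x, a, b, c):
    """Triangular membership function of Eq. (2): feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def membership_interval(x, a1, a2, b, c1, c2):
    """Evaluate the two bounding triangles N(a1, b, c1; x) and N(a2, b, c2; x).

    Returns (lower, upper) as the smaller and larger grade (an assumption made
    here so that the interval is always well ordered).
    """
    g1 = tri(x, a1, b, c1)
    g2 = tri(x, a2, b, c2)
    return min(g1, g2), max(g1, g2)


def firing_interval(intervals):
    """Combine per-input membership intervals into a rule firing interval.

    Uses the product t-norm for the '*' in Eqs. (7) and (8) (assumed; the
    minimum t-norm is an equally common choice).
    """
    lower, upper = 1.0, 1.0
    for lo, hi in intervals:
        lower *= lo
        upper *= hi
    return lower, upper


# Hypothetical two-input rule; the breakpoints below are illustrative only.
grades = [
    membership_interval(4.0, a1=0.0, a2=1.0, b=5.0, c1=9.0, c2=10.0),
    membership_interval(6.0, a1=0.0, a2=1.0, b=5.0, c1=9.0, c2=10.0),
]
print(firing_interval(grades))
```

In the framework described here a rule would combine fifteen such intervals, one per input feature.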
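The firing interval is obtained by combining all degrees of membership from each of the fifteen (15) linguistic variables as shown in (9)–(10).
\[ \underline{\mu}_{B}(y) = \underline{\mu}_{F}(x) * \underline{\mu}_{G}(y), \quad \forall y \in Y_d \tag{9} \]
\[ \overline{\mu}_{B}(y) = \overline{\mu}_{F}(x) * \overline{\mu}_{G}(y), \quad \forall y \in Y_d \tag{10} \]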
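The Karnik–Mendel algorithm finds the leftmost and the rightmost points using Eqs. (11) and (12), respectively, to reduce the firing intervals and the centroids of the consequent sets to a type-1 fuzzy set. The defuzzified output is given by Eq. (13).
\[ y_l = \min_{L \in [1, N-1]} \frac{\sum_{n=1}^{L} \overline{f}^n y^n + \sum_{n=L+1}^{N} \underline{f}^n y^n}{\sum_{n=1}^{L} \overline{f}^n + \sum_{n=L+1}^{N} \underline{f}^n} \tag{11} \]
\[ y_r = \max_{R \in [1, N-1]} \frac{\sum_{n=1}^{R} \underline{f}^n y^n + \sum_{n=R+1}^{N} \overline{f}^n y^n}{\sum_{n=1}^{R} \underline{f}^n + \sum_{n=R+1}^{N} \overline{f}^n} \tag{12} \]
\[ y = \frac{y_l + y_r}{2} \tag{13} \]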
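The end points of Eqs. (11) and (12) can be illustrated with a short Python sketch. It evaluates every candidate switch point directly instead of running the iterative Karnik–Mendel procedure, which is a simplification chosen here for readability; the function name and the example firing intervals are hypothetical, and the sketch assumes at least two rules with positive firing strengths.

```python
def type_reduce(y, f_lower, f_upper):
    """Return (y_l, y_r, y) for rule centroids y and firing intervals.

    y        : consequent centroid y^n of each rule
    f_lower  : lower firing strength of each rule
    f_upper  : upper firing strength of each rule
    Exhaustive search over the switch points L and R (not the iterative
    Karnik-Mendel algorithm); needs len(y) >= 2.
    """
    # The switch-point formulas assume the centroids are in ascending order.
    order = sorted(range(len(y)), key=lambda n: y[n])
    ys = [y[n] for n in order]
    lo = [f_lower[n] for n in order]
    hi = [f_upper[n] for n in order]
    n_rules = len(ys)

    def centroid(weights):
        return sum(w * v for w, v in zip(weights, ys)) / sum(weights)

    # Left end point: upper strengths up to L, lower strengths afterwards.
    y_l = min(centroid(hi[:L] + lo[L:]) for L in range(1, n_rules))
    # Right end point: lower strengths up to R, upper strengths afterwards.
    y_r = max(centroid(lo[:R] + hi[R:]) for R in range(1, n_rules))
    # Defuzzified crisp output: midpoint of the type-reduced interval.
    return y_l, y_r, (y_l + y_r) / 2.0


# Hypothetical three-rule example.
print(type_reduce([0.2, 0.5, 0.8], [0.1, 0.3, 0.2], [0.4, 0.6, 0.5]))
```

The crisp value returned last is the quantity that would fill a missing crop yield entry.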
3.3 PCA Dimensionality Reduction (Feature Selection)
PCA is applied to reduce the amount of redundant information in the dataset, thereby producing a reduced dataset. The PCA algorithm is stated as follows (INPUT: $X$, an $N \times d$ matrix; OUTPUT: $Y$, an $N \times m$ matrix):
1. BEGIN
2. // Initialization
3. $Z$: the output dataset
4. $\bar{X}$: the mean
5. $X_i$: the data values
6. $n$: the number of data points
7. $C$: the covariance matrix
8. $C_{i,i}$ (the diagonal): the variance of variable $i$
9. $C_{i,j}$ (off-diagonal): the covariance between variables $i$ and $j$
10. // Standardization
11. for $i = 1$ to $n$
12. $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
13. $Z = X_i - \bar{X}$
14. // Compute the $d \times d$ covariance matrix
15. $C = \frac{1}{N-1} X^{T} X$
16. $C_{i,j} = \frac{1}{N-1}\sum_{q=1}^{N} X_{q,i}\, X_{q,j}$
17. // Compute the eigenvectors of the covariance matrix to identify the principal components using the relation
18. $A \cdot v_i = \lambda_i v_i$, where $A$ is an $n$-by-$n$ matrix, $v$ is a non-zero $n$-by-1 vector, and $\lambda$ is a scalar. Any value of $\lambda$ for which this equation has a solution is known as an eigenvalue of the matrix $A$. The vector $v$ which corresponds to this value is called an eigenvector
19. // Select the $m$ eigenvectors that correspond to the largest $m$ eigenvalues to be the new basis, such that
20. if $A$ is a square matrix, a non-zero vector $v$ is an eigenvector of $A$ if there is a scalar $\lambda$ (eigenvalue) such that
21. $Av = \lambda v$
22. // Project the data and represent it as $Y$
23. $Y = Xv$
24. where $Y$ is an $N \times m$ matrix;
25. $v = [v_1 \ldots v_m]$ is a $d \times m$ matrix whose columns $v_i$ are the eigenvectors corresponding to the largest $m$ eigenvalues
26. Output $Y$
27. END // end PCA algorithm
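The steps above can be sketched with NumPy as follows. This is an illustrative reading of the algorithm, not the code used in this work: the function name `pca` and the random example data are assumptions, and the sketch projects the centred data `Z` (so that the covariance in steps 15 and 16 is taken about the mean).

```python
import numpy as np


def pca(X, m):
    """Project an N x d data matrix onto its m leading principal components.

    Returns (Y, V, cumulative), where Y is N x m, V is d x m, and
    cumulative holds the cumulative explained variance in percent.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                  # steps 11-13: centre each feature
    C = Z.T @ Z / (X.shape[0] - 1)          # steps 14-16: d x d covariance
    eigvals, eigvecs = np.linalg.eigh(C)    # steps 17-18: C is symmetric
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V = eigvecs[:, :m]                      # step 19: new basis
    Y = Z @ V                               # steps 22-23: projection
    cumulative = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    return Y, V, cumulative


# Hypothetical data: 50 records with 4 features, reduced to 2 components.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
Y_demo, V_demo, cum_demo = pca(X_demo, 2)
print(Y_demo.shape, cum_demo)
```

The `cumulative` vector is the kind of quantity reported as cumulative explained variance when choosing how many components to keep.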
3.4 SVR Model Training and Prediction for Crop Yield
The SVR algorithm is used to predict the output, crop yield, through a nonlinear SVR model, presented as follows. Given a dataset with $n$-dimensional features and a real-valued target variable, $\{(X_1, y_1), (X_2, y_2), \ldots, (X_m, y_m)\}$, where $X \in \mathbb{R}^n$ and $y \in \mathbb{R}$, the objective is to find a function $f(x)$ with at most $\varepsilon$-deviation from the target $y$. The relationship between $X$ and $y$ is nonlinear.
\[ \max \left\{ -\frac{1}{2} \sum_{i,j=1}^{m} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle \Phi(X_i), \Phi(X_j) \rangle - \varepsilon \sum_{i=1}^{m} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{m} y_i (\alpha_i - \alpha_i^*) \right\} \tag{14} \]
such that
\[ \sum_{i=1}^{m} (\alpha_i - \alpha_i^*) = 0; \qquad 0 \le \alpha_i, \alpha_i^* \le C \]
where $\alpha_i$ and $\alpha_i^*$ are the model weights, $\varepsilon$ is the epsilon (the allowed deviation from the target), and $C$ is the cost parameter, which controls the model complexity and the number of support vectors. The dot product is computed in Eq. (15) as:
\[ \langle \Phi(x), \Phi(X_i) \rangle = K(x, X_i) \tag{15} \]
where $\Phi(X_i)$ and $\Phi(x)$ are the mapped vectors. The mapping functions $\Phi(X_i)$ and $\Phi(X_j)$ are not evaluated explicitly; their dot product is computed using the radial basis function (RBF) kernel, $K(x, X_i)$, defined in Eq. (16) as:
\[ K(x, y) = \exp\left( -\frac{1}{2\sigma^2} \lVert x - y \rVert^2 \right) \tag{16} \]
The output of the SVR algorithm, which is the prediction of crop yield, is obtained as expressed in Eq. (17).
\[ y = \sum_{i=1}^{m} \alpha_i K(x, X_i) + b \tag{17} \]
where $y$ is the predicted crop yield, $\alpha_i$ is the model's weight, $b$ is the bias, and $K(x, X_i)$ is the kernel function. The performance criteria in Eqs. (18) and (19) are applied to measure our experimental results, where $y^x$ is the desired output, $y$ is the computed output, and $N$ is the number of data items.
\[ \text{Mean Squared Error (MSE)} = \frac{1}{N} \sum_{i=1}^{N} \left( y^x - y \right)^2 \tag{18} \]
\[ \text{Root Mean Squared Error (RMSE)} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y^x - y \right)^2 } \tag{19} \]
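The RBF kernel, the prediction rule, and the two error metrics can be written out directly in plain Python. The sketch below is illustrative only: the support vectors, weights, and bias are hypothetical values rather than a trained model, and the kernel is parameterised by `gamma`, assumed here to play the role of $1/(2\sigma^2)$ in the kernel definition.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2), with gamma assumed = 1 / (2 sigma^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, weights, bias, gamma):
    """Kernel expansion: weighted sum of kernel values plus the bias."""
    return sum(
        w * rbf_kernel(x, sv, gamma) for w, sv in zip(weights, support_vectors)
    ) + bias


def mse(desired, computed):
    """Mean squared error between desired and computed outputs."""
    return sum((d - c) ** 2 for d, c in zip(desired, computed)) / len(desired)


def rmse(desired, computed):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(desired, computed))


# Hypothetical model with two support vectors in a 2-D feature space.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.4, -0.1]
bias = 0.2
gamma = 1.0 / 12.0
predictions = [svr_predict(x, support_vectors, weights, bias, gamma)
               for x in ([0.0, 0.0], [1.0, 1.0], [0.5, 0.5])]
print(predictions, rmse([0.55, 0.20, 0.40], predictions))
```

With `gamma = 1/12`, the kernel equals 1 when the two points coincide and decays smoothly as they move apart, so nearby support vectors dominate each prediction.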
4 Results and Discussion
The results obtained from our study include the transformed dataset from the IT2-FL algorithm, the reduced dataset from the PCA algorithm, and the SVR crop yield predictions. A sample of the transformed dataset is presented in Fig. 5. From Fig. 5, rows 14 and 24 now have predictor values of 0.7173707 and 0.6453129, respectively. Figure 6a–c gives fuzzy surface plots that depict the effect of pairs of linguistic variables on the predictor (crop yield): (a) when magnesium is non-existing (0) and nitrogen is existing (1), the crop yield is moderate (0.56); (b) when evaporation is 15 and humidity is 40, the crop yield is 0.53; and (c) when humidity is 60 and temperature is 28, the crop yield is 0.48.
Fig. 5 Transformed dataset
Fig. 6 a Effects of nitrogen and magnesium, b humidity and evaporation c temperature and humidity on crop yield
Table 1 Cumulative explained variance

Dimension number    Cumulative explained variance (%)
Dim 1               10.25
Dim 2               18.55
Dim 3               26.09
Dim 4               33.46
Dim 5               40.78
Dim 6               47.74
Dim 7               54.47
Dim 8               61.04
Dim 9               67.18
Dim 10              73.23
Dim 11              79.15
Dim 12              84.94
Dim 13              90.37
Dim 14              95.6
Dim 15              100
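The choice of dimension from Table 1 can be reproduced with a few lines of Python. The values are copied from Table 1 and the 80% threshold is the one used in this work; the function name is an assumption made for illustration.

```python
# Cumulative explained variance (%) for Dim 1 ... Dim 15, copied from Table 1.
CUMULATIVE_VARIANCE = [
    10.25, 18.55, 26.09, 33.46, 40.78, 47.74, 54.47, 61.04,
    67.18, 73.23, 79.15, 84.94, 90.37, 95.6, 100.0,
]


def smallest_dimension(cumulative_pct, threshold_pct):
    """Return the first (1-based) dimension whose cumulative variance
    reaches the threshold."""
    for dimension, value in enumerate(cumulative_pct, start=1):
        if value >= threshold_pct:
            return dimension
    raise ValueError("threshold is never reached")


print(smallest_dimension(CUMULATIVE_VARIANCE, 80.0))  # prints 12
```

Dimension 11 reaches only 79.15%, so twelve components are the fewest that satisfy the 80% criterion.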
Table 1 presents the cumulative explained variance for each number of dimensions, which helps in choosing an optimal dimension (i.e. the smallest dimension with at least 80% cumulative explained variance). From Table 1, "Dimension 12" is optimal, with a cumulative explained variance of 84.94% (≥ 80%). Hence, the transformed dataset with 15 features is reduced to a dataset with 12 features, showing that PCA was successful in normalizing and eliminating redundant features found in the original dataset and in the transformed dataset from the IT2-FL algorithm. The mapping of features to principal components (PCs) is presented in Table 2, and the new features selected, in order of increasing eigenvalue, are presented in Fig. 7a. From a total of 1250 data points, the SVR algorithm was trained with 60% of the reduced dataset while the remaining 40% was used for model testing. The SVR training parameters and values are Kernel Type = Radial Basis Kernel Function, Cost = 1, Gamma = 0.0833333, and Epsilon = 0.1. The test result is presented in Fig. 7b, which visualizes the SVR model prediction for crop yield and indicates a high accuracy from the SVR model. The blue circles in Fig. 7b represent the actual output (from the dataset) while the red stars represent the predicted output (from the SVR model).

Table 2 Mapping feature to PC

Original dataset feature    Reduced dataset PC
Nitrogen                    PC1
Phosphorus                  PC2
Magnesium                   PC3
Sodium                      PC4
Potassium                   PC5
Soil pH                     PC6
Rainfall                    PC7
Temperature                 PC8
S_Radiation                 PC9
Evaporation                 PC10
Soil_Temperature            PC11
Humidity                    PC12

Fig. 7 a Selected features and proportion of information explained b SVR model prediction visualized for crop yield

The performance result, evaluated using the mean square error (MSE) and root mean square error (RMSE) metrics, shows that the SVR model is accurate at predicting crop yield, with minimal error values of 0.002071 for MSE (with a reported accuracy of 99%) and 0.045513 for RMSE (with a reported accuracy of 95%). The visualized performance of the SVR model for crop yield prediction likewise shows that the SVR model is accurate at predicting crop yield. For instance, at row 10, the model prediction is 0.540705798 while the actual result is 0.552072571, a deviation of 0.011367, which translates to an accuracy of 98.86% when accuracy is taken as one minus the absolute deviation. This shows that the difference between the actual and predicted output is minimal; hence, the SVR model is effective at predicting crop yield.
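The row-10 figures and the relation between the two reported error values can be checked with simple arithmetic. The snippet below is only a consistency check on the reported numbers; reading the 98.86% figure as 100 × (1 − absolute deviation) is an assumption that happens to reproduce it.

```python
import math

# Row-10 values reported in the text.
actual = 0.552072571
predicted = 0.540705798

deviation = abs(actual - predicted)
accuracy_pct = 100.0 * (1.0 - deviation)   # assumed definition of "accuracy"

# RMSE is the square root of MSE, so the two reported errors should agree
# up to the rounding of the published MSE value.
reported_mse = 0.002071
reported_rmse = 0.045513
rmse_from_mse = math.sqrt(reported_mse)

print(deviation, accuracy_pct, rmse_from_mse)
```

The square root of the rounded MSE differs from the reported RMSE only in the sixth decimal place, which is consistent with the MSE having been rounded for publication.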
5 Conclusion
This paper develops a hybrid framework for the prediction of crop yield based on IT2-FL, PCA, and SVR techniques. The models are able to handle the uncertainty inherent in crop yield data and analyse the data through preprocessing, dimensionality reduction, and prediction algorithms. The fuzzy-SVR-based machine learning approach proves to be effective in predicting crop yield and offers uniform convergence and a global optimum solution, with an accuracy of 99% and a minimal generalization error of 0.002071. In the future, our developed SVR model can be evaluated based on kernel type, such as the radial basis function or polynomial kernel. The framework provides a means by which farmers can be informed about predicted yield based on environmental and soil conditions such as soil organic matter, rainfall, soil pH, temperature, and humidity. The significance of the paper is in helping farmers to manage resources and make better decisions resulting from prior knowledge of crop yield even before cultivation. In the future, other machine learning approaches can also be applied in the work to compare model performance.
References
1. Labour Force Statistics: Nigerian Bureau of Statistics (2010). Retrieved 22 June 2015
2. Williams, S.K.T.: Rural development in Nigeria, pp. 129. Ile-Ife University of Ife Press, Nigeria (1978)
3. Abiwon, B.: The prospects of agriculture in Nigeria: how our fathers lost their way - a review. Asian J. Econ. Bus. Acc. (2017)
4. Ayoola: Essays on the agricultural economy: a book of readings on agricultural development policy and administration in Nigeria. TMA Publishers Ibadan, pp. 81 (2001)
5. Adesina, A.: Agricultural transformation agenda: repositioning agriculture to drive Nigeria's economy (2012). Retrieved https://www.emrc.be/documents/document/20121205120841-agri2012-special_session-tony_bello-min_agric_nigeria.pdf
6. Ritchie, H., Roser, M.: Crop yields. OurWorldInData.org (2020). Retrieved: https://ourworldindata.org/crop-yields
7. Prabira, S., Gyana, P., Santi, B., Nalini, B., Amiya, R.: Application of soft computing in crop management, pp. 633–646. Springer (2018). https://doi.org/10.1007/978-981-10-7566-7_64
8. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
9. Manjula, E., Djoditachoumy, S.: A model for prediction crop yield, vol. 6(4), pp. 298–305. Pachaiyappas College India (2017)
10. Olusina, J.O., Odumade, O.M.: Modeling climatic variation parameters of Nigeria using the statistical downscaling approach, p. 1. Department of Survey and Geoinformatics (2012)
11. Kumar, A., Kumar, S.: Prediction of production of crops using K-mean and fuzzy logic, vol. 4(8), pp. 44–56 (2015)
12. Mehta, T.S., Dhaval, R.K.: Survey of data mining techniques in precision agriculture, vol. 4(7), pp. 363–364 (2015)
13. Uslan, V., Seker, H.: Support vector-based Takagi-Sugeno fuzzy system for the prediction of binding affinity of peptides. In: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC-2013) (2013)
14. Uslan, V., Seker, H.: The quantitative prediction of HLA-B*2705 peptide binding affinities using support vector regression to gain insights into its role for the spondyloarthropathies. In: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 7651–7654 (2015)
15. Priya, P., Murhaiah, U., Balamurugan, M.: Predicting yield of the crop using machine leaning algorithm. Int. J. Eng. Sci. Res. Technol. 7(4), 1–7 (2018)
16. Bondre, D.A., Mahagaonkar, S.: Prediction of crop yield and fertilizer recommendation using machine learning algorithms. Int. J. Eng. Appl. Sci. Technol. 4(5), 371–376 (2019)
17. Manjula Josephine, B., Ruth Ramya, K., Rama Rao, K.V.S.N, Kuchibhotla, S., Venkata Bala Kishore, P., Rahamathulla, S.: Crop yield prediction using machine learning. Int. J. Sci. Technol. Res. 9(2), 2102–2106 (2020)
18. Suganya, M., Dayana, R., Revathi, R.: Crop yield prediction using supervised learning techniques. Int. J. Comput. Eng. Technol. (IJCET) 11(2), 9–20 (2020)
19. Mayank, C., Chandvidkar, C., Darpan, C., Rathod, M.: Crop yield prediction using machine learning. Int. J. Sci. Res. (IJSR) 9(4), 1–4 (2020)
20. S Homepage. https://www.springer.com/lncs. Last accessed 21 Nov 2016
A Mathematical Study of Hepatitis C Virus Model During Drug Therapy Treatment Yogita and Praveen Kumar Gupta
Abstract In this article, the dynamical behaviour of a mathematical model for the hepatitis C virus is presented. The model consists of five ordinary differential equations, where the compartments are: healthy epithelial cells, infected epithelial cells, viruses, T lymphocytes, and the amount of drug. Existence and uniqueness conditions for the model are derived. Equilibrium points and the basic reproduction number are also calculated. The first Lyapunov method is used to establish local stability at the disease-free equilibrium point. Numerical simulations and comparative studies are presented to validate the analytical results.
Keywords Hepatitis C virus · Mathematical modelling · Local stability · Numerical solution
1 Introduction
Hepatitis C is one of the major infectious diseases of the liver and is caused by the hepatitis C virus (HCV). It can be self-limiting or progressive, resulting in fibrosis (scarring) or cirrhosis, and can cause liver cancer. The incubation period of hepatitis C is between 10 days and 6 months. Hepatitis C can be transmitted through contact with infected blood, use of contaminated medicines, unsafe health habits, and the use of used or infected injections. In 2016, WHO reported that approximately 400,000 people died from hepatitis C. Over the last two decades, many deterministic and stochastic mathematical models have been used to explain the behaviour of the hepatitis C virus. In 2001, Avendano et al. [1] presented a mathematical model of hepatitis C that describes the population of CD8+ cells, or T killer cells, which play a direct role in the removal of infected cells. They study four compartments in the model: healthy liver cells, infected liver cells, hepatitis C virus load, and CD8+ type killer cells. They describe the analysis of this model and discuss its two states, the uninfected state and the infected state.

Yogita · P. K. Gupta (✉): Department of Mathematics, National Institute of Technology Silchar, Assam 788010, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_17

After that, Giannini and Bréchot [2] discussed how HCV inflammation initiates a dynamic complication, with over 300 million people suffering from hepatitis C. At that time, no fully effective treatment was available for HCV, but there were some medications that could cure some patients infected with hepatitis C. A few years ago, Ble et al. [3] studied a mathematical model and proposed an immune system response to infection with hepatitis C virus, with the effect of new parameters. They calculated the basic reproduction number ($R_0$) and, on the basis of $R_0$, concluded that under some conditions the HCV disease can be controlled. To avoid organ complications and the manifestation of lympho-proliferative diseases, the medical practitioner needs to detect HCV infection at an early stage. In this continuation, Mazzaro et al. [4] studied the role of the new direct antiviral agents (DAAs) therapy in hepatitis C virus (HCV) infected patients. Prior to DAAs, several studies suggested that after antiviral therapy, mixed cryoglobulinemia (MC) may disappear from the patient's body along with HCV. A number of clinical studies on DAAs confirm outstanding removal rates of 90–100%. These researchers concluded that this treatment is also effective for viral removal in patients with MC; however, specific clinical improvement of vasculitis is detected in only half of the patients. During the COVID-19 pandemic, USA public health officials released a public health bulletin about screening recommendations for hepatitis C virus infection. With this bulletin in mind, Jhaveri [5] recently reviewed past HCV screening policies in a short review article on changes in HCV epidemiology under the new existing therapy, looking for loopholes in health policies for the eradication of HCV.
In recent times, numerous applications of soft computing can be found in biomathematics, biomechanics, biotechnology, and many other fields of biology [6–11]. With this motivation, we have extended the study of Avendano et al. [1] on the hepatitis C virus, incorporating new assumptions drawn from Ble et al. [3]. This article is organized as follows: in Sect. 2, we discuss the model with new parameters and elaborate its basic properties. Section 3 presents the analysis of the proposed model, and Sect. 4 gives the numerical solution of the model with the effect of various parameters.
2 Mathematical Model
In this model, we take five compartments: healthy epithelial cells, infected epithelial cells, viruses in the liver, killer cells or T lymphocytes, and the amount of drug injected in the body. To the best of the authors' knowledge, this type of drug compartment is novel and has not been considered by any other researcher. The proposed model, as a system of ODEs, is
\[ \dot{H}_s = r H_s \left(1 - \frac{H_s}{H_{\max}}\right) - K H_s V, \tag{1} \]
\[ \dot{H}_i = K H_s V - \delta H_i T - \mu H_i - K_1 H_i C, \tag{2} \]
\[ \dot{V} = p H_i - \beta V - K_2 C V, \tag{3} \]
\[ \dot{T} = \lambda V \left(1 - \frac{T}{T_{\max}}\right) - \alpha T, \tag{4} \]
\[ \dot{C} = \nu - \gamma C, \tag{5} \]
with initial conditions
\[ H_s(0) = H_{s0},\quad H_i(0) = H_{i0},\quad V(0) = V_0,\quad T(0) = T_0,\quad C(0) = C_0. \tag{6} \]
The variables in the above model are as follows. $H_s(t)$ is the population of healthy epithelial cells at time $t$, where $r$ is the proportionality constant and $H_{\max}$ is the maximum number of healthy cells in the body. $H_i(t)$ is the population of infected epithelial cells at time $t$. $V(t)$ is the amount of virus in the liver at time $t$. $T(t)$ is the population of T lymphocytes at time $t$. In the presence of hepatitis C virus, $T$ expands in proportion to $V$ at rate $\lambda$, where $\lambda$ is the replication constant of $T$ and $T_{\max}$ is the maximum number of T cells in the body. $C(t)$ is the amount of drug injected in the body at time $t$. $H_s$ is converted to $H_i$ through contact with $V$ at rate $K$ (a proportionality constant); viruses $V$ are produced by $H_i$ at rate $p$ and die at rate $\beta$; and $C$ decays at the constant rate $\gamma$.
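To make the dynamics of the five-compartment system concrete, the sketch below integrates it with a classical fourth-order Runge–Kutta scheme in plain Python. It is an illustration only: the parameter values, the normalisation $H_{\max} = T_{\max} = 1$, the initial state, and the step size are hypothetical choices and are not taken from this article.

```python
# Hypothetical, dimensionless parameter values chosen only for illustration.
PARAMS = {
    "r": 0.5, "Hmax": 1.0, "K": 0.1, "delta": 0.1, "mu": 0.2, "K1": 0.1,
    "p": 1.0, "beta": 0.5, "K2": 0.1, "lam": 0.1, "Tmax": 1.0,
    "alpha": 0.1, "nu": 0.2, "gamma": 0.4,
}


def hcv_rhs(state, q):
    """Right-hand side of the five-compartment HCV model with drug therapy."""
    Hs, Hi, V, T, C = state
    dHs = q["r"] * Hs * (1.0 - Hs / q["Hmax"]) - q["K"] * Hs * V
    dHi = q["K"] * Hs * V - q["delta"] * Hi * T - q["mu"] * Hi - q["K1"] * Hi * C
    dV = q["p"] * Hi - q["beta"] * V - q["K2"] * C * V
    dT = q["lam"] * V * (1.0 - T / q["Tmax"]) - q["alpha"] * T
    dC = q["nu"] - q["gamma"] * C
    return [dHs, dHi, dV, dT, dC]


def rk4_step(f, state, dt, q):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state, q)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)], q)
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)], q)
    k4 = f([s + dt * k for s, k in zip(state, k3)], q)
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(state0, q, t_end, dt):
    """Integrate from t = 0 to t_end and return the final state."""
    state = list(state0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(hcv_rhs, state, dt, q)
    return state


# Initial state (Hs, Hi, V, T, C), also hypothetical.
final_state = simulate([0.8, 0.1, 0.1, 0.0, 0.1], PARAMS, t_end=200.0, dt=0.01)
print(final_state)
```

For this particular parameter set the infected cells and the virus die out while the drug level settles at $\nu/\gamma$; other parameter choices can behave differently.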
3 Properties of the Model
3.1 Non-negativity of the Solution
Lemma 1 (see [12]) The solutions $H_s(t)$, $H_i(t)$, $V(t)$, $T(t)$ and $C(t)$ of the model (1)–(5) with initial values $H_s(0) = H_{s0} > 0$, $H_i(0) = H_{i0} \ge 0$, $V(0) = V_0 \ge 0$, $T(0) = T_0 \ge 0$, and $C(0) = C_0 > 0$ are non-negative for $t \ge 0$.
Proof Solving Eq. (1) of the model (1)–(5), we get
\[ H_s(t) = H_s(0) \exp\left\{ rt - \int_0^t \left[ \frac{r H_s(u)}{H_{\max}} + K V(u) \right] du \right\} > 0. \tag{7} \]
Similarly, solving Eqs. (2)–(5), we get
\[ H_i(t) = \left[ \int_0^t K V(x) H_s(x) \exp\left\{ \mu x + \int_0^x \left( \delta T(u) + K_1 C(u) \right) du \right\} dx + H_i(0) \right] \exp\left\{ -\mu t - \int_0^t \left( \delta T(u) + K_1 C(u) \right) du \right\} \ge 0, \tag{8} \]
\[ V(t) = \left[ \int_0^t p H_i(x) \exp\left\{ \beta x + \int_0^x K_2 C(u)\, du \right\} dx + V(0) \right] \exp\left\{ -\beta t - \int_0^t K_2 C(u)\, du \right\} \ge 0, \tag{9} \]
\[ T(t) = \left[ \int_0^t \lambda V(x) \exp\left\{ \alpha x + \frac{\lambda}{T_{\max}} \int_0^x V(u)\, du \right\} dx + T(0) \right] \exp\left\{ -\alpha t - \frac{\lambda}{T_{\max}} \int_0^t V(u)\, du \right\} \ge 0, \tag{10} \]
\[ C(t) = \frac{\nu}{\gamma} \left( 1 - e^{-\gamma t} \right) + C(0) e^{-\gamma t} > 0. \tag{11} \]
Hence, for the above model, the solutions $H_s(t)$, $H_i(t)$, $V(t)$, $T(t)$, $C(t)$ are either zero or positive for $t \ge 0$.
3.2 Boundedness of the Solution
Lemma 2 (see [13]) For the proposed model (1)–(5), the solutions are bounded in the region $\vartheta$ defined by
\[ \vartheta = \left\{ (H_s, H_i, V, T, C) \in \mathbb{R}^5_+ :\ H_s \le H_{\max},\ H_i \le H_{\max},\ V \le p H_{\max},\ T \le T_{\max},\ C \le \frac{\nu}{\gamma} \right\}. \tag{12} \]
Proof The first equation of the model (1)–(5), H˙ s ≤ After simplifying this inequality,
r Hmax
Hs (Hmax − Hs ) .
A Mathematical Study of Hepatitis C Virus Model . . .
H˙ s Hs (Hmax − Hs )
≤
191
r Hs ⇒ ≤ ert . Hmax Hmax − Hs
Hence, $H_s \le H_{\max}$. The second and third equations of the model (1)–(5) show that the interaction of viruses and healthy cells causes the influx of infected cells, and that viruses are produced from infected cells. If we assume that there is no virus in the model, then $H_i$ and $V$ are zero. Therefore, if the healthy cells are bounded, the infected cells and viruses are also bounded, because they depend on the healthy cells. Hence,
$$H_i \le H_{\max}, \quad \text{and} \quad V \le pH_{\max}. \tag{13}$$
Now, we take the fourth equation of the model (1)–(5) and follow the same procedure as for Eq. (1); then
$$T \le T_{\max}. \tag{14}$$
Solving Eq. (5), a linear ordinary differential equation, and letting $t \to \infty$, we get $C \to \nu/\gamma$. Hence, the amount of drug never exceeds $\nu/\gamma$, i.e.
$$C \le \frac{\nu}{\gamma}. \tag{15}$$
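The integration step behind the bound on $H_s$ can be spelled out as follows. This is a sketch rather than the authors' derivation: the constant $c$ is not in the original text and is introduced here under the assumption $0 < H_s(0) < H_{\max}$.

```latex
\begin{aligned}
\frac{d}{dt}\ln\frac{H_s}{H_{\max} - H_s}
  &= \frac{H_{\max}\,\dot H_s}{H_s\left(H_{\max} - H_s\right)} \le r
  &&\Rightarrow\quad
  \frac{H_s(t)}{H_{\max} - H_s(t)} \le c\,e^{rt},
  \qquad c = \frac{H_s(0)}{H_{\max} - H_s(0)}, \\
H_s(t) &\le H_{\max}\,\frac{c\,e^{rt}}{1 + c\,e^{rt}} < H_{\max}. \\
C(t) &= \frac{\nu}{\gamma} + \left(C(0) - \frac{\nu}{\gamma}\right)e^{-\gamma t}
  \le \frac{\nu}{\gamma}
  \quad \text{whenever } C(0) \le \frac{\nu}{\gamma}.
\end{aligned}
```

The last line is Eq. (11) regrouped. It shows that the bound (15) holds for all $t \ge 0$, not only in the limit, provided the initial drug level does not exceed $\nu/\gamma$.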
4 Analysis of the Model
4.1 Equilibrium Points
After simple calculations for the proposed model, we get one disease-free equilibrium point ($E^*$) and one endemic point ($\bar E$).
• Disease-free equilibrium point
$$E^* = \left(H_s^*, H_i^*, V^*, T^*, C^*\right) = \left(H_{\max}, 0, 0, 0, \frac{\nu}{\gamma}\right). \tag{16}$$
• Endemic equilibrium point
$$\bar E = \left(\bar H_s, \bar H_i, \bar V, \bar T, \bar C\right). \tag{17}$$
We regard the endemic point as having no biological significance here, because if the disease persists the model needs further modification. Therefore, the following sections concentrate on the disease-free equilibrium point.
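4.2 Reproduction Number
Let $X = (H_i, V, T, C, H_s)$; then the model can be written as
$$\frac{dX}{dt} = A(X) - B(X), \tag{18}$$
where
$$A(X) = \begin{pmatrix} KH_sV \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\quad \text{and} \quad
B(X) = \begin{pmatrix}
\delta H_i T + \mu H_i + K_1 H_i C \\
-pH_i + \beta V + K_2 C V \\
-\lambda V\left(1 - \dfrac{T}{T_{\max}}\right) + \alpha T \\
-\nu + C\gamma \\
-rH_s\left(1 - \dfrac{H_s}{H_{\max}}\right) + KH_sV
\end{pmatrix}. \tag{19}$$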
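Now, we construct the Jacobian matrices (see [8]) of $A(X)$ and $B(X)$ at $E^*$,
$$P = J[A(X)]_{\text{at } E^*} = \begin{pmatrix}
0 & KH_{\max} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}, \tag{20}$$
and
$$Q = J[B(X)]_{\text{at } E^*} = \begin{pmatrix}
\dfrac{K_1\nu}{\gamma} + \mu & 0 & 0 & 0 & 0 \\
-p & \dfrac{K_2\nu}{\gamma} + \beta & 0 & 0 & 0 \\
0 & -\lambda & \alpha & 0 & 0 \\
0 & 0 & 0 & \gamma & 0 \\
0 & KH_{\max} & 0 & 0 & r
\end{pmatrix}. \tag{21}$$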
Now, we calculate the value of $R_0$ as the spectral radius of $PQ^{-1}$, which is
$$R_0 = \frac{H_{\max} K p \gamma^2}{(K_1\nu + \gamma\mu)(K_2\nu + \gamma\beta)}. \tag{22}$$
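The closed form in Eq. (22) can be checked numerically against the matrices $P$ and $Q$. The sketch below is illustrative only: the function names are hypothetical, the parameter values are the ones quoted in Sect. 5, and the drug rate `nu` is assumed to take the values 0, 0.1 and 0.2 used for the figure panels. Because $P$ has a single non-zero entry, $PQ^{-1}$ has only its first row non-zero, so its spectral radius is its top-left entry $KH_{\max}(Q^{-1})_{21}$.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Parameter values quoted in Sect. 5.
PARAMS = {"r": 1.0, "K": 0.004, "mu": 0.001, "K1": 0.1, "p": 10.0, "beta": 0.08,
          "K2": 1.0, "lam": 1.0, "alpha": 0.001, "gamma": 0.3, "Hmax": 1.0}


def r0_closed_form(nu, prm=PARAMS):
    """R0 from Eq. (22)."""
    g = prm["gamma"]
    num = prm["Hmax"] * prm["K"] * prm["p"] * g ** 2
    den = (prm["K1"] * nu + g * prm["mu"]) * (prm["K2"] * nu + g * prm["beta"])
    return num / den


def r0_from_matrices(nu, prm=PARAMS):
    """Spectral radius of P Q^{-1}, with Q from Eq. (21)."""
    g = prm["gamma"]
    q = [
        [prm["K1"] * nu / g + prm["mu"], 0.0, 0.0, 0.0, 0.0],
        [-prm["p"], prm["K2"] * nu / g + prm["beta"], 0.0, 0.0, 0.0],
        [0.0, -prm["lam"], prm["alpha"], 0.0, 0.0],
        [0.0, 0.0, 0.0, g, 0.0],
        [0.0, prm["K"] * prm["Hmax"], 0.0, 0.0, prm["r"]],
    ]
    # The first column of Q^{-1} solves Q x = e_1; x[1] is (Q^{-1})_{21}.
    first_col = solve_linear(q, [1.0, 0.0, 0.0, 0.0, 0.0])
    return prm["K"] * prm["Hmax"] * first_col[1]


if __name__ == "__main__":
    for nu in (0.0, 0.1, 0.2):
        print(nu, r0_closed_form(nu), r0_from_matrices(nu))
```

With these values the threshold $R_0 = 1$ lies between the two non-zero doses, which is why the dose is a natural parameter to vary in the numerical experiments.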
4.3 Local Stability at Disease-Free Equilibrium Point Theorem 1 (see [13]) The disease-free equilibrium point E ∗ is locally asymptotically stable in ϑ if R0 < 1 and unstable in ϑ if R0 ≥ 1.
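Proof. For $E^*$, the Jacobian matrix of the proposed model is
$$J = \begin{pmatrix}
-r & 0 & -H_{\max}K & 0 & 0 \\
0 & -\mu - \dfrac{K_1\nu}{\gamma} & H_{\max}K & 0 & 0 \\
0 & p & -\beta - \dfrac{K_2\nu}{\gamma} & 0 & 0 \\
0 & 0 & \lambda & -\alpha & 0 \\
0 & 0 & 0 & 0 & -\gamma
\end{pmatrix}. \tag{23}$$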
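The characteristic equation of the matrix $J$ is
$$(\eta + \gamma)(\eta + \alpha)(\eta + r)\left(\eta^2 + a_1\eta + a_2\right) = 0, \tag{24}$$
where
$$a_1 = \beta + \mu + (K_1 + K_2)\frac{\nu}{\gamma} > 0
\quad \text{and} \quad
a_2 = \frac{(\gamma\mu + K_1\nu)(\beta\gamma + K_2\nu)}{\gamma^2}\,[1 - R_0]. \tag{25}$$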
From Eqs. (24) and (25), we can state the following results:
• If $R_0 < 1$, then Eq. (25) gives $a_1 > 0$ and $a_2 > 0$. Therefore, by the Routh–Hurwitz criterion (see [14]), $E^*$ is locally asymptotically stable.
• In the other case, if $R_0 \ge 1$, then $a_2 \le 0$; therefore, $E^*$ will be unstable.
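The two cases above can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the parameter values are taken from Sect. 5, and the eigenvalues $-r$, $-\alpha$, $-\gamma$ are read off from the rows of $J$ in Eq. (23) that decouple from the $(H_i, V)$ block. The remaining two eigenvalues are the roots of $\eta^2 + a_1\eta + a_2 = 0$.

```python
import cmath

# Parameter values quoted in Sect. 5 (delta does not enter the linearisation at E*).
PARAMS = {"r": 1.0, "K": 0.004, "mu": 0.001, "K1": 0.1, "p": 10.0, "beta": 0.08,
          "K2": 1.0, "alpha": 0.001, "gamma": 0.3, "Hmax": 1.0}


def routh_hurwitz_coefficients(nu, prm=PARAMS):
    """Return (a1, a2) of the quadratic factor in the characteristic equation."""
    a = prm["mu"] + prm["K1"] * nu / prm["gamma"]    # minus the (Hi, Hi) entry of J
    b = prm["beta"] + prm["K2"] * nu / prm["gamma"]  # minus the (V, V) entry of J
    a1 = a + b                                       # minus the trace of the 2x2 block
    a2 = a * b - prm["p"] * prm["K"] * prm["Hmax"]   # determinant of the 2x2 block
    return a1, a2


def dfe_eigenvalues(nu, prm=PARAMS):
    """All five eigenvalues of J at the disease-free equilibrium."""
    a1, a2 = routh_hurwitz_coefficients(nu, prm)
    root = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return [-prm["r"], -prm["alpha"], -prm["gamma"],
            (-a1 + root) / 2.0, (-a1 - root) / 2.0]


def is_locally_stable(nu, prm=PARAMS):
    """True when every eigenvalue has a negative real part."""
    return all(complex(ev).real < 0.0 for ev in dfe_eigenvalues(nu, prm))


if __name__ == "__main__":
    for nu in (0.0, 0.1, 0.2):
        print(nu, routh_hurwitz_coefficients(nu), is_locally_stable(nu))
```

For a quadratic factor, the Routh–Hurwitz conditions reduce to $a_1 > 0$ and $a_2 > 0$, so checking the sign of `a2` is equivalent to comparing $R_0$ with 1.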
5 Numerical Results and Discussion
In this section, we discuss the numerical solution of the hepatitis model (1)–(5) for various values of the parameters δ, K and p, computed with the MATHEMATICA software to support the analytic results obtained. Figures 1, 4, and 7 show the behaviour of the susceptible (Hs), infected (Hi) and virus (V) populations for δ = 0.28, 0.29, 0.30, 0.31, 0.32 with respect to time, for the initial populations Hs0 = 0.8, Hi0 = 0.09, V0 = 0.01, T0 = 0.1, C0 = 0 and the parameter values r = 1, K = 0.004, μ = 0.001, K1 = 0.1, p = 10, β = 0.08, K2 = 1, λ = 1, α = 0.001, γ = 0.3, Hmax = 1, Tmax = 1. Subfigures (a), (b), and (c) each show the importance of drug dosage in the population.
[Figure 1: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Susceptible Population (0 to 1); one curve per δ = 0.28, 0.29, 0.30, 0.31, 0.32]
Fig. 1 Behaviour of susceptible population with respect to time for various values of δ during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
Similarly, Figs. 2, 5, and 8 show the behaviour of the Hs, Hi, and V populations for K = 0.002, 0.003, 0.004, 0.005, 0.006 with respect to time, for the initial populations Hs0 = 0.8, Hi0 = 0.09, V0 = 0.01, T0 = 0.1, C0 = 0 and r = 1, δ = 0.3, μ = 0.001, K1 = 0.1, p = 10, β = 0.08, K2 = 1, λ = 1, α = 0.001, γ = 0.3, Hmax = 1, Tmax = 1. Subfigures (a), (b), and (c) each show the importance of drug dosage in the population. Figures 3, 6 and 9 describe the behaviour of the Hs, Hi and V populations for p = 8, 9, 10, 11, 12 with respect to time, for the initial populations Hs0 = 0.8, Hi0 = 0.09, V0 = 0.01, T0 = 0.1, C0 = 0 and r = 1, δ = 0.3, K = 0.004, μ = 0.001, K1 = 0.1, β = 0.08, K2 = 1, λ = 1, α = 0.001, γ = 0.3, Hmax = 1, Tmax = 1. Subfigures (a), (b), and (c) each show the importance of drug dosage in the population.
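The experiments described above can be approximated with a small fixed-step integrator. This is a hedged sketch, not the authors' MATHEMATICA code: the right-hand side is reconstructed from $A(X) - B(X)$ in Eqs. (18) and (19), the dose symbol υ of the figure captions is assumed to be the drug rate ν of the model, and the classical fourth-order Runge–Kutta scheme, the step size and all function names are choices made here for illustration.

```python
# Parameter values quoted in Sect. 5; "nu" (the drug rate) is supplied per run.
BASE = {"r": 1.0, "K": 0.004, "delta": 0.3, "mu": 0.001, "K1": 0.1, "p": 10.0,
        "beta": 0.08, "K2": 1.0, "lam": 1.0, "alpha": 0.001, "gamma": 0.3,
        "Hmax": 1.0, "Tmax": 1.0}
INITIAL = (0.8, 0.09, 0.01, 0.1, 0.0)  # (Hs0, Hi0, V0, T0, C0)


def rhs(state, prm):
    """Right-hand side of the model, reconstructed from A(X) - B(X)."""
    hs, hi, v, t, c = state
    d_hs = prm["r"] * hs * (1.0 - hs / prm["Hmax"]) - prm["K"] * hs * v
    d_hi = (prm["K"] * hs * v - prm["delta"] * hi * t
            - prm["mu"] * hi - prm["K1"] * hi * c)
    d_v = prm["p"] * hi - prm["beta"] * v - prm["K2"] * c * v
    d_t = prm["lam"] * v * (1.0 - t / prm["Tmax"]) - prm["alpha"] * t
    d_c = prm["nu"] - prm["gamma"] * c
    return (d_hs, d_hi, d_v, d_t, d_c)


def rk4_step(state, dt, prm):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = rhs(state, prm)
    k2 = rhs(shifted(k1, 0.5 * dt), prm)
    k3 = rhs(shifted(k2, 0.5 * dt), prm)
    k4 = rhs(shifted(k3, dt), prm)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(nu, t_end=200.0, dt=0.01, **overrides):
    """Return the list of states at t = 0, dt, 2*dt, ..., t_end."""
    prm = dict(BASE, nu=nu, **overrides)
    state = INITIAL
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, prm)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    for nu in (0.0, 0.1, 0.2):
        hs, hi, v, t, c = simulate(nu)[-1]
        print(f"nu={nu}: Hs={hs:.4f} Hi={hi:.4g} V={v:.4g} T={t:.4f} C={c:.4f}")
```

A parameter sweep such as the one over δ corresponds to calls like `simulate(0.1, delta=0.28)`. The small step size is chosen because the immune-response equation becomes fast when the virus load is large. An adaptive solver would be the more usual choice in practice.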
[Figure 2: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Susceptible Population; one curve per K = 0.002, 0.003, 0.004, 0.005, 0.006]
Fig. 2 Behaviour of susceptible population with respect to time for various values of K during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 3: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Susceptible Population; one curve per p = 8, 9, 10, 11, 12]
Fig. 3 Behaviour of susceptible population with respect to time for various values of p during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 4: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Infected Population; one curve per δ = 0.28, 0.29, 0.30, 0.31, 0.32]
Fig. 4 Behaviour of infected population with respect to time for various values of δ during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 5: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Infected Population; one curve per K = 0.002, 0.003, 0.004, 0.005, 0.006]
Fig. 5 Behaviour of infected population with respect to time for various values of K during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 6: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Infected Population; one curve per p = 8, 9, 10, 11, 12]
Fig. 6 Behaviour of infected population with respect to time for various values of p during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 7: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Virions; one curve per δ = 0.28, 0.29, 0.30, 0.31, 0.32]
Fig. 7 Behaviour of virus population with respect to time for various values of δ during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 8: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Virions; one curve per K = 0.002, 0.003, 0.004, 0.005, 0.006]
Fig. 8 Behaviour of virus population with respect to time for various values of K during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
[Figure 9: three panels (a)–(c); x-axis: Time (0 to 200); y-axis: Virions; one curve per p = 8, 9, 10, 11, 12]
Fig. 9 Behaviour of virus population with respect to time for various values of p during different doses of vaccine: (a) υ = 0, (b) υ = 0.1, (c) υ = 0.2
6 Conclusion
In this paper, we developed a mathematical model for the hepatitis C virus with a new drug compartment; this drug is responsible for reducing the infectious power of the hepatitis C virus. We checked the non-negativity and boundedness of the solution of the proposed model in Lemmas 1 and 2, which is very important for any infectious disease model. Thereafter, we showed that the model has two equilibrium points, a disease-free and an endemic equilibrium point, and obtained the conditions under which they exist. To study the dynamical behaviour of the infection, we calculated the basic reproduction number and showed the effect of various parameter values. It is concluded that the proposed dynamical model is more effective and more general for understanding the behaviour of the hepatitis C virus when the drug compartment is incorporated. Moreover, numerical calculations support these results, as demonstrated in Figs. 1, 2, 3, 4, 5, 6, 7, 8 and 9.
References
1. Avendano, R., Esteva, L., Flores, J.A., FuentesAllen, J.L., Gomes, J., LopezEstrada, J.E.: A mathematical model for the dynamics of hepatitis C. J. Theor. Med. 4, 109–118 (2002)
2. Giannini, C., Bréchot, C.: Hepatitis C virus biology. Cell Death Differ. 10, S27–S38 (2003)
3. Ble, G., Esteva, L., Peregrino, A.: Global analysis of a mathematical model for hepatitis C considering the host immune system. J. Math. Anal. Appl. 461, 1378–1390 (2018)
4. Mazzaro, C., Maso, L.D., Mauro, E., Visentini, M., Tonizzo, M., Gattei, V., Andreone, P., Pozzato, G.: Hepatitis C virus-related cryoglobulinemic vasculitis: a review of the role of the new direct antiviral agents (DAAs) therapy. Autoimmun. Rev. 19(8), 102589 (2020)
5. Jhaveri, R.: Screening for hepatitis C virus: how universal is universal? Clini. Therapeut. 42(8), 1434–1441 (2020)
6. Purwar, A., Singh, S.K., Kesarwani, P.: A decision model to predict clinical stage of bladder cancer. In: Pant, M., Ray, K., Sharma, T., Rawat, S., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 583. Springer, Singapore (2018)
7. Chauhan, R., Jangade, R., Rekapally, R.: Classification model for prediction of heart disease. In: Pant, M., Ray, K., Sharma, T., Rawat, S., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 584. Springer, Singapore (2018)
8. Dutta, A., Gupta, P.K.: Approximate analytical solution of a HIV/AIDS dynamic model during primary infection. In: Rushi, K.B., Sivaraj, R., Prasad, B., Nalliah, M., Reddy, A. (eds.) Applied Mathematics and Scientific Computing. Trends in Mathematics. Birkhäuser, Cham (2019)
9. Kandubothula, V., Uppada, R., Nandan, D.: A review on detection of breast cancer cells by using various techniques. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020)
10. Gupta, K.K., Vijay, R., Pahadiya, P.: A review paper on feature selection techniques and artificial neural networks architectures used in thermography for early stage detection of breast cancer. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020)
11. Singh, A., Vikram, A., Singh, M.P., Tripathi, S.: Classification of neuromuscular disorders using machine learning techniques. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020)
12. Dutta, A., Gupta, P.K.: A mathematical model for transmission dynamics of HIV/AIDS with effect of weak CD4+ T cells. Chin. J. Phys. 56, 1045–1056 (2018)
13. Gupta, P.K., Dutta, A.: A mathematical model on HIV/AIDS with fusion effect: analysis and Homotopy solution. Eur. J. Phys. Plus 265, 134 (2019)
14. Kapur, J.N.: Math. Model. New Age International Publishers, New Delhi (2008)
Transformation of Medical Imaging Using Artificial Intelligence: Its Impact and Challenges with Future Opportunities
Richa Gupta, Vikas Tripathi, Amit Gupta, and Shruti Bhatla
Abstract In the healthcare sector, people expect the best treatments and services regardless of cost. Although a large share of the national budget is spent on this sector, it has not met society's expectations. All medical data are examined by specialists. The complexities and fine details of the images and data can only be interpreted by specialists, which increases their workload and the demand on them. These circumstances create the need for automated models in healthcare systems. Artificial intelligence (AI) is a well-established domain of computer science that offers attainable solutions to real-world problems. Thus, AI can provide accurate, high-precision solutions for medical imaging. Medical imaging covers the identification, treatment and monitoring of diseases using images from medical fields, such as CT scans, X-rays and ultrasound images. AI methods can be applied to radiology, pathology, and dermatology for image processing. AI methods such as deep neural networks, machine learning algorithms and fuzzy logic are among the best solutions for image processing. In this paper, different AI techniques are described with their strengths, limitations, and applications, and the paper also gives an overview of contemporary approaches that attain the best results in their respective domains. The paper concludes with a discussion of the barriers that have slowed the growth of AI and of the future opportunities of AI in the healthcare sector.
Keywords Artificial Intelligence (AI) · Machine learning · Neural network · Fuzzy logic · Medical imaging
R. Gupta · V. Tripathi · S. Bhatla Graphic Era Deemed to be University, Dehradun, India A. Gupta (B) Graphic Era Hill University, Dehradun, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_18
1 Introduction
In past decades the medical sector had only a limited quantity of data and images, but with advances in technology the volume of data has increased steadily. The collected data are high-dimensional, such as CT scans, MRI, and ultrasound images, and have several special features associated with them. Dealing with such data, particularly images, is extremely demanding, so it is of great importance to analysts. Analysing such massive data is a tedious task that requires immense effort. The limited number of experts affects examinations through human errors in diagnosis as well as contradictory observations by different experts. This motivates automated solutions with less human intervention. Artificial intelligence is an emerging field of computer science that raises the speed and precision of computing to the next level. AI is applied in almost every area of society, such as weather forecasting, e-commerce, image processing, natural language processing and fraud detection [1]. Artificial intelligence has an enormous impact on healthcare systems, which include medicine, medical research, biomedical applications, and biotechnology [2]. The healthcare system is extremely sensitive and has a higher priority than other sectors [3]. In these fields, medical images are analysed and processed by medical experts. The growing population and the increasing number of diseases make this task very complex and tedious for the experts, as their number is inadequate. Consequently, computer-aided models (CAM) have been developed to perform various tedious and complex tasks in medical imaging. They assist doctors, pathologists and clinicians in the diagnosis procedure so that the subsequent treatment can be decided. To process such images, the images must first be stored in an efficient database system [4]. Quantum information system (QIP) is a system which can be used for picture archiving in the medical domain [5]. This paper examines a variety of AI methods that are essential for medical image processing and provides a comprehensive evaluation of diverse technologies. The rest of the paper is organized as follows: Sect. 2 describes AI and medical imaging in detail and shows how AI methods are applied to medical imaging; Sect. 3 introduces the related work; Sect. 4 presents the analysis and discussion, covering the limitations and challenges of existing systems and future opportunities.
2 AI Methods in Medical Image Processing
The evolution of artificial intelligence has made a range of complex tasks easier, to the benefit of society. AI optimizes complex computations that were not achievable before. Consequently, AI has succeeded in domains such as medicine, banking, NLP, speech recognition and image processing. In medical image processing, AI is applied on a broad scale. Image processing using AI methods achieves higher precision and accuracy in the diagnosis procedure than traditional methods. There is no restriction tying a particular algorithm to a particular image category. Thus, any kind of algorithm can be deployed on any category of image, e.g. X-ray, CT scan, MRI, and ultrasound. A number of AI methods, with their strengths, limitations and applications in various disciplines, are described in Table 1. Python libraries such as OpenCV [6] support medical image processing, and software such as MATLAB is also applied for image reconstruction [7].
2.1 Artificial Intelligence
AI is the umbrella under which all machine learning algorithms fall. AI covers supervised machine learning techniques and the simulation of human actions and behaviour. Multiple algorithms are catalogued in the flow diagram (Fig. 1); with the assistance of these methods, autonomous models can be developed to perform numerous functions. With AI, hidden insights can be brought into clinical decision making. AI connects patients with resources for self-management and extracts meaning from previously inaccessible unstructured data, especially images. Such unstructured data, e.g. images, are a rich source of a patient's history and information along with its complexities. AI further accelerates the diagnosis process and provides targeted, effective treatments (Fig. 1).
2.2 Medical Imaging
Medical image processing needs an expert team that includes radiologists or radiographers for X-ray, sonographers for ultrasound, and physicians with biomedical staff. Imaging such as MRI, X-ray, CT scans and bone scans are good examples of practical diagnostic imaging that facilitates accurate diagnosis, prognosis, intervention, and assessment of the injuries and dysfunction that physical therapists address regularly. A variety of medical images is shown in Fig. 2. Appropriate application of medical imaging requires a multidisciplinary approach. Given the improvements in accuracy that AI has demonstrated, it is well suited to medical imaging. AI assists radiologists and pathologists, who use medical imaging to diagnose a wide variety of conditions.
Table 1 Different AI techniques and their strengths, limitations and applications in image processing

| Techniques | Strength | Limitations | Applications |
| Fuzzy logic | High precision, rapid operation, can be used for complex systems, ability of human reasoning | Complex for high accuracy, low speed, lack of real-time response, no feedback, number of inputs is limited | Image processing in healthcare systems, detecting disease, facial pattern recognition |
| Artificial neural network | Learns by itself, no need of database, performs multiple tasks in parallel, easy to maintain, produces output even if input is noisy | Takes more time to develop, needs a large amount of data, computationally expensive, interpretability is critical | Image processing and character recognition, forecasting, speech recognition, language processing, works in real time |
| Support vector machine (SVM) | More effective in high-dimensional space, memory efficient, number of dimensions can be higher than number of samples, works well for a clear margin of separation between classes | Not suitable for large data or for noisy data, no probabilistic explanation of classification | Image processing, intrusion detection, handwriting recognition, fraud detection |
| Naïve Bayes | Easy to implement, needs only a small amount of training data, fast, highly scalable, can be used for binary or multiclass classification | Chance of accuracy loss, cannot modify dependencies, assumption of independent predictor features | Face recognition, classification problems like medical image classification |
| K-nearest neighbour | Very easy to implement, new data can be added seamlessly, fast | Not suitable for large datasets, does not work well for high dimensions, sensitive to noisy data and missing values | Healthcare systems, image processing, forecasting, fraud detection, text mining, agriculture, finance |
| Decision tree | Less preprocessing needed, very intuitive, easy to explain and implement, no need to normalize or scale data | High probability of overfitting, low prediction accuracy, calculation is complex, takes more time | Healthcare management, business management, fraud detection |
| Random forest | Simple to implement, can work with high-dimensional data, not easy to overfit, fast speed, higher accuracy | Not credible for more attribute values and different data values | Healthcare systems, banking systems, e-commerce |
| Gaussian mixture model | Fast, robust, flexible, excellent clustering performance | Large number of components are used | Image processing, risk control |
| AdaBoost | High precision, weak classifiers are used for cascading | Sensitive to noisy data and outliers, has overfitting problem, time consuming | Image processing, face detection, fraud detection |
| Genetic algorithm | Easy to understand, parallelized, good for noisy data, supports multi-objective optimization | Difficult to implement, computationally expensive, time consuming | Image processing, DNA analysis, robotics, vehicle routing problem |
Fig. 1 Various artificial intelligence methods
3 Literature Review
Medical image processing advanced greatly after the development of computer technologies such as image processing and image visualization [8]. Expert-independent feature evaluation can be done using digital image processing; it also increases the accuracy of image processing [1]. There are various types of medical images, such as CT images, X-ray images, ultrasound images and images of different diseases, so the processing steps also differ. Computer-aided systems (CAD) are designed to pre-process images, to identify the ROI, and to extract, select and classify important features from images [9]. AI and its techniques are applied to biomedical image processing, so our main goal is to identify the different AI approaches to medical image processing. State-of-the-art deep learning models have been developed for image classification and segmentation [3]. The breakthrough results of AI algorithms for image processing give expert-level performance in medical imaging [10, 11]. Many AI methods can be used to analyse medical images. These methods can help radiologists in the diagnosis of several diseases such as heart disease, tumours, cancers and tuberculosis [2]. AI can also aid in the characterization and monitoring of diseases [12]. Convolutional neural networks (CNN) are especially used for medical image processing [13], so CNN has been adopted by many researchers for medical imaging. These AI methods are adopted for image processing because they offer robustness, repeatability and least dependency, and give better accuracy.

Fig. 2 Various medical images: (a) X-ray, (b) MRI, (c) Ultrasound image, (d) CT scan, (e) Fundus image

Early diagnosis is very important for various diseases so that the patient gets proper and timely treatment. Image processing basically starts with image pre-processing, which can be done in many ways. Image pre-processing is the removal of noise, errors and unwanted areas of the image. Edge detection is the most important step in image pre-processing [14]; it filters out the unwanted details of the image. Multi-feature edge detection with local fuzzy fractional detection (LFFD) is another way of detecting edges [15]. LFFD is a fuzzy-set, pixel-covering method for echocardiogram image sequences. Image fusion is another method of image pre-processing; it is done by image reconstruction, image decomposition, fusion rules and image quality assessment. To improve the quality of medical images, multi-model image fusion is applied to the images [16]. Various factors can affect image quality when analysing histopathology images, such as colour inconsistency and variations in tissues [17]. In [18], the authors proposed a computer-aided diagnosis system that uses a genetic algorithm for feature selection and then an ensemble of neural networks for classification. Artificial intelligence plays a great role in an efficient diagnosis process, and medical image processing is combined with AI for better results. With the help of the physician's comments and experience, this method can be more effective and take less time for diagnosis. In another study, the author developed a model that uses a self-organizing map (SOM) to find the region of interest (ROI). In this model, a two-level SOM approach is applied, which uses colour segmentation to identify the ROI. This approach can be used for different types of medical images. Some researchers focus only on particular diseases; for example, in [19] the researchers detect Parkinson's disease [20] by a classification method. This system distinguishes the ROI containing the substantia nigra in the midbrain from the normal brain. Image processing with AI is implemented using a neural network in MATLAB. ANN works better for medical image processing with the sigmoid function. A cellular neural network with multiple threshold logic is used for medical imaging [21]. Feed-forward NN and cellular NN are also of interest for medical image processing as they increase accuracy. Deep learning is another AI method that has been incorporated into medical imaging. The inner working of these methods is very complex, as hundreds of neurons and their associated connections in the NN are involved. As the depth of the network increases, the model is able to make more complex decisions [12].

Feature extraction, which also includes boundary extraction, is another important aspect of image processing. A hierarchical approach to the extraction and rectification of feature boundaries uses fuzzy multilevel thresholding with a clean-up procedure, and a boundary detector is applied for boundary extraction [22]. The best part of this method is the outer contour of the CT image, which makes it easy to convert into solid models. To extract the best performance from a network model, hyper-parameter optimization is performed [13], which optimizes the learning rate and dropout rate of the model. Oncology has been a main research area for decades, so different types of cancer, such as breast, brain and lung cancer, can also be detected by applying AI algorithms such as fuzzy logic, ANN [23], SVM and the adaptive neuro-fuzzy inference system to different types of medical images [11]. SVM with a Gaussian radial basis kernel is applied to detect blood cancer and performs very well on microscopic images [24]. SVM also gives the best performance for the detection of breast cancer [25]. AI plays an important role in making better therapy decisions through accurate biomarker assessment and histopathologic diagnosis [17]. Brain tumour identification mainly relies on biopsy and the spinal tap method, but magnetic resonance images can be analysed digitally using AI methods such as SVM, Naive Bayes and learning vector quantization [26]. With the help of magnetic resonance spectroscopy, the MR images are analysed for accurate identification of the tumour, and the fuzzy c-means method is used for segmentation. Table 2 lists some AI techniques and their applications in identifying different diseases from different images.
Table 2 Different AI methods for identification of different diseases with different images

| S. no. | Citations | Year | Image type | Disease | Technique used |
| 1 | [18] | 2013 | CT scan | Liver tissue | CAD |
| 2 | [19] | 2012 | Ultrasound | Parkinson's disease | ANN |
| 3 | [21] | 2001 | X-ray | Multiple disease | Cellular NN |
| 4 | [12] | 2018 | X-ray | Cancer | CNN |
| 5 | [22] | 2000 | CT scan | Bony tissue | Fuzzy multilevel thresholding |
| 6 | [24] | 2014 | Microscopic images | Acute lymphoblastic leukaemia (ALL) | Gaussian Radial Basis Kernel |
| 7 | [3] | 2018 | Digital images | Breast cancer | Deep learning model |
| 8 | [25] | 2018 | Ultrasound | Breast cancer | SVM |
| 9 | [9] | 2006 | Computer tomography | Focal liver lesions | Fuzzy C-means |
| 10 | [26] | 2015 | Magnetic resonant images | Astrocytoma (tumour grade) | Fuzzy C-means |
| 11 | [27] | 2019 | Fundus image | Age-related macular degeneration | SVM |
| 12 | [28] | 2019 | MRI image | Brain abnormalities | CNN |
| 13 | [29] | 2019 | Neuro image | Alzheimer's disease | Ensemble SVM |
| 14 | [30] | 2019 | Chest X-ray | Pneumonia | CNN |
| 15 | [31] | 2019 | SPECT image | Thyroid | CNN |
| 16 | [32] | 2020 | Chest X-ray | COVID-19 | CNN |
| 17 | [33] | 2019 | MRI | Dementia | |
| 18 | [34] | 2019 | Neuroscience image | Alzheimer's disease | Fuzzy logic |
| 19 | [35] | 2020 | MRI | Alzheimer's disease | Universal SVM, SVM |
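Fuzzy c-means appears above as a segmentation technique (rows 9 and 10 of Table 2). The following is a minimal, illustrative sketch of the standard algorithm on one-dimensional pixel intensities. It is not taken from the cited works: the function names, the evenly spaced initial centres and the toy intensity values are assumptions made for this example.

```python
def _memberships(values, centres, exponent):
    """Membership of every value in every cluster; each row sums to 1."""
    rows = []
    for x in values:
        # A tiny floor avoids division by zero when a value sits on a centre.
        dists = [max(abs(x - c), 1e-12) for c in centres]
        rows.append([1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                     for d_i in dists])
    return rows


def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Cluster scalar intensities; requires n_clusters >= 2 and fuzzifier m > 1."""
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly over the intensity range.
    centres = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        u = _memberships(values, centres, exponent)
        new_centres = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in u]
            new_centres.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, _memberships(values, centres, exponent)


if __name__ == "__main__":
    # Toy intensities: a dark region and a bright region.
    pixels = [9, 10, 11, 12, 198, 200, 201, 202]
    centres, u = fuzzy_c_means_1d(pixels)
    labels = [row.index(max(row)) for row in u]
    print(centres, labels)
```

Unlike hard clustering, each pixel keeps a graded membership in every cluster, which is useful at tissue boundaries where intensities are ambiguous. A hard segmentation is obtained by taking the cluster with the largest membership.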
4 Analysis and Discussion
In AI-based medical imaging, the most frequently used algorithms are CNN and SVM, which can handle complex, large-scale natural image processing. As illustrated in Table 2, artificial neural networks are widely adopted for medical image analysis, and ANN improves the performance of the automated model with respect to accuracy and efficiency. In Table 2, most diseases are identified by applying AI algorithms to different kinds of images. According to Table 1, which gives a comprehensive comparison of many AI methods, ANN is the best-suited method for medical imaging as it has fewer limitations and several strengths over the other methods.
4.1 Limitation and Challenges
AI-based applications have brought great advances to the healthcare domain. Nonetheless, because of the sensitivity of healthcare data and a variety of challenges, more refined AI techniques are needed that can manage complex healthcare data effectively. We believe that there are vast opportunities for improving the healthcare system using AI-based systems. Apart from the barriers discussed above, some gaps identified in previous research are:
1. How can noise such as colour distortion, missing features and artificial effects be tackled, together with the lack of sufficient features, expensive instruments and computationally expensive networks, using isolated pixel elimination, and how can properly sized segments be formed from small segments [8, 16]?
2. Is it possible to make a model which can perform more than one task [12]?
3. How can the lack of a sufficient amount of data be dealt with so that the model can be trained properly? With a large amount of data, however, the chances of overfitting the AI model are higher [12].
4.2 Future Work
While the potential benefit of AI is very significant, so are the initial effort and cost. The medical industries are working together to find optimal solutions for medical imaging. A recent study at Stanford University, for instance, suggests that AI could help radiologists improve their interpretation of mammograms by reducing the number of false positives that occur because of human error. Diagnostic imaging and radiation oncology see enormous potential in AI to improve accuracy and speed up previously lengthy processes. A variety of advancements is presented by the authors of [36], who describe devices such as consumer health sensors, portable medical devices, 3D printing and genomic analysis for AI-based healthcare systems. In future, new image compression techniques should be developed for efficient imaging. An adequate medical image database management system is recommended for high-quality image processing [21]. AI systems that detect multiple abnormalities across the whole human body could be implemented [12]. The number of feature extractors can be increased [1]. For precision medicine, advanced AI models can be developed with multimers data [17]. SVM combined with a genetic algorithm can be used to implement advanced AI systems. Contemporary methods such as structural, spectral and statistical texture extraction can be used for texture extraction [25]. For tumour detection, a new model can be developed that identifies objects such as grey or white matter, the tumour and the tumour boundary using shape and grey-level information [26].
R. Gupta et al.
5 Conclusion
In recent years, AI has attracted considerable attention from researchers for solving a wide range of real-world problems, and it has delivered significant improvements over traditional methods. This paper summarized various AI approaches that are used for image processing with different kinds of images in the medical field. Medical imaging covers the identification, treatment and monitoring of diseases using medical images such as CT scans, X-rays and ultrasound images. This paper also explored the challenges associated with these technologies, which slow the pace of advancement in AI. Numerous researchers are working on AI solutions with the aim of encouraging the application of AI methods to medical imaging. This is a step towards replacing humans with automated models for better services in the medical field. The research surveyed above indicates that ANN and deep learning models are well suited to medical imaging. One of the main challenges for AI methods, however, is the amount of data: if an adequate amount of annotated data is not available, the model cannot be trained properly, which reduces its performance. In conclusion, the surveyed research shows that AI methods deliver results quickly and with high precision for the tasks described above, such as identifying diseases, treating specific diseases and monitoring patients.
References
1. Douglas Miller, D., et al.: Artificial intelligence in medical practice: the question to the answer? Am. J. Med. 131(2) (2018)
2. Shukla, S., et al.: Approaches of artificial intelligence in biomedical image processing: a leading tool between computer vision & biological vision. 978-1-5090-0673-1/16. IEEE (2016)
3. Imran Razzak, M., et al.: Deep learning for medical image processing: overview, challenges and the future. Springer International Publishing (2018)
4. Perner, P., et al.: Image mining: issues, framework, a generic tool and its application to medical-image diagnosis. Eng. Appl. Artif. Intell. 15 (2002)
5. Venegas-Andraca, S.E., et al.: Quantum computation and image processing: new trends in artificial intelligence (2014)
6. Sheth, S., Ajmera, A., Sharma, A., Patel, S., Kathrecha, C.: Design and development of intelligent AGV using computer vision and artificial intelligence (2018). https://doi.org/10.1007/978-981-10-5687-1_31
7. Yadav, B., et al.: A robust digital image watermarking algorithm using DWT and SVD (2018). https://doi.org/10.1007/978-981-10-5687-1_3
8. Chang, P.-L., et al.: Exploiting the self-organizing map for medical image segmentation. In: Twentieth IEEE International Symposium on Computer-Based Medical Systems (CBMS'07), 0-7695-2905-4/07 (2007)
9. Stoitsis, J., et al.: Computer aided diagnosis based on medical image processing and artificial intelligence methods. Nucl. Instrum. Meth. Phys. Res. A 569 (2006)
10. Joseph, R., et al.: Artificial intelligence for medical image analysis: a guide for authors and reviewers. AJR 212 (2019)
11. Shamasneh, A., et al.: Artificial intelligence techniques for cancer detection and classification: review study. Eur. Sci. J. (2017)
12. Hosny, A., et al.: Artificial intelligence in radiology: 2018. Macmillan Publishers Limited, Part of Springer Nature (2018)
13. Litjens, G., et al.: A survey on deep learning in medical image analysis (2017). https://doi.org/10.1016/J.Media.2017.07.005. 1361-8415, Elsevier B.V.
14. Vardhana, M., et al.: Convolutional neural network for bio-medical image segmentation with hardware acceleration. Cogn. Syst. Res. 50 (2018)
15. Zhuang, X., et al.: Local fuzzy fractal dimension and its application in medical image processing. Artif. Intell. Med. (2004)
16. Du, J., et al.: An overview of multi-model medical image fusion (2017). https://doi.org/10.1016/J.Neucom.2015.07.160
17. Robertson, S., et al.: Digital image analysis in breast pathology-from image processing techniques to artificial intelligence (2017)
18. Sharma, P., et al.: Computer aided diagnosis based on medical image processing and artificial intelligence methods. Int. J. Inf. Comput. Technol. 3(9), 887–892. ISSN 0974-2239 (2013)
19. Blahuta, J., et al.: Ultrasound medical image recognition with artificial intelligence for Parkinson's disease classification. MIPRO 2012, May 21–25, 2012, Opatija, Croatia (2012)
20. Singh, J., et al.: Effect of intrinsic parameters on dynamics of STN model in Parkinson disease: a sensitivity-based study (2018). https://doi.org/10.1007/978-981-10-5687-1_37
21. Aizenberg, I., et al.: Cellular neural networks and computational intelligence in medical image processing. Image Vis. Comput. 19, 177–183. 0262-8856/00, Elsevier Science, PII: S0262-8856(00)00066-4 (2001)
22. Kwan, M.F.Y., et al.: Automatic boundary extraction and rectification of bony tissue in CT images using artificial intelligence techniques. In: Medical Imaging 2000: Image Processing, 896 Proceedings of SPIE, vol. 3979 (2000)
23. Giri, J.P., et al.: Neural network-based prediction of productivity parameters. https://doi.org/10.1007/978-981-10-5687-1_8
24. Putzu, L., et al.: Classification for Leukaemia detection using image processing techniques. Artif. Intell. Med. 62 (2014)
25. Sadoughi, F., et al.: Artificial intelligence methods for the diagnosis of breast cancer by image processing: a review. Breast Cancer: Targets and Therapy (2018)
26. Monica, M., et al.: A non-invasive methodology for the grade identification of astrocytoma using image processing and artificial intelligence techniques. Expert Syst. Appl. (2015)
27. Garcia, A., et al.: A machine learning approach to medical image classification: detecting age related macular degeneration in Fundus images. Comput. Electr. Eng. 75, 218–229 (2019)
28. Talo, M., et al.: Convolutional neural networks for multi-class brain disease detection using MRI images. S0895-6111(19)30088-6, Elsevier (2019)
29. Zhou, T., et al.: Multi-modal latent space including ensemble SVM classifier for early dementia diagnosis with neuroimaging data. S1361-8415(19)30166-5, Elsevier B.V. (2019)
30. Yadav, S.S., et al.: Deep convolutional neural network based medical image classification for disease diagnosis. J. Big Data (2019)
31. Ma, L., et al.: Thyroid diagnosis from SPECT images using convolutional neural network with optimization. Comput. Intell. Neurosci. (2019)
32. Apostolopoulos, I.D., et al.: Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 43, 635–640 (2020)
33. Battineni, G., et al.: Machine learning in medicine: performance calculation of dementia prediction by Support Vector Machines (SVM). Inf. Med. Unlocked 16, 100200 (2019)
34. Munir, K., et al.: Neuroscience patient identification using big data and Fuzzy logic - an Alzheimer's disease case study. Expert Syst. Appl. 136, 410–425 (2019)
35. Richhariya, B., et al.: Diagnosis of Alzheimer's disease using universum support vector machine based recursive feature elimination (USVM-RFE). Biomed. Signal Process. Control 59, 101903 (2020)
36. Garg, N., Gupta, A., Bordoloi, D.: Impact of artificial intelligence in healthcare. Int. J. Innovative Technol. Exploring Eng. (IJITEE) 8(4S3). ISSN: 2278-3075 (2019)
A Keyword-Based Multi-label Text Categorization in the Indian Legal Domain Using Bi-LSTM
V. Vaissnave and P. Deepalakshmi
Abstract In this era of information abundance, text segmentation can be used effectively to locate and extract information specific to the user's needs within a massive load of documents. Text categorization refers to the process of segregating a document into smaller labeled text chunks based on the semantic commonality of the contents. Because legal texts carry a great deal of semantic information, text segmentation is a crucial factor in information retrieval from legal documents. Such supervised classification demands a large volume of training data to build an efficient system, and collecting and manually classifying that much data is very expensive. Recent advancements in information technology, and in artificial intelligence in particular, have opened the gates of automation. This article proposes a model using deep learning techniques to split the judgment text into issue, facts, arguments, reasoning, decision, majority concurring, and minority dissenting. To evaluate the proposed model, we conducted experiments, which showed that the bidirectional long short-term memory (Bi-LSTM) categorization technique achieved 97% accuracy and superior categorization performance. The purpose of our automation technique is to provide high-quality legal reference information from past judgment texts and to categorize judgments in a cost-effective and time-saving way.
Keywords Text categorization · Supervised learning · Indian Supreme Court data · Bi-LSTM
V. Vaissnave (B) · P. Deepalakshmi Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, India P. Deepalakshmi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_19
1 Introduction
Text categorization in the legal domain, ever since its emergence in the 1960s, has posed challenges to the information science community. The categorization of legal judgment text is very complex, as stated by De Araujo [1]. Legal language is used for detailed analysis and explanation of the law and for multipart data, and interpreting it requires law professionals. A judgment text is a document that includes the case description, the facts, the arguments of the petitioner and the responder, and the final decision of the case. In the common law domain, judgment texts are the main source of understanding for legal professionals. Text classification is an important task and an area of research that has attracted growing attention in the past few years, and recent deep learning methods have been successful at automatic document classification. For the large number of documents or texts accessible on computers, manual categorization by field experts becomes ineffective and unworkable; automated techniques address this issue and minimize human effort. Word embedding is used to convert unstructured text into a machine-understandable format for learning algorithms. Earlier work classified short texts using learning algorithms, whereas the main objective of this paper is the categorization of lengthy legal documents [2, 3]. Generally, pre-defined models are of little use for such large-text categorization. Text categorization is a key prerequisite to several advanced applications in various fields, such as linguistics identification [4], sentiment analysis [5], and Arabic text classification using deep learning algorithms [6]. Bansal et al. [7] stated that the legal analytics domain includes summarization, classification, information retrieval, extraction, and prediction. Judgment text summarization condenses lengthy documents into a short description. Some judgment texts contain more than 600 pages, so lawyers cannot easily find guidance for their arguments, and neither can judges while drafting a final judgment.
Our main contribution in this paper is the analysis of Indian Supreme Court legal data, a dataset of legal judgment text manually labeled by a team of legal experts. Categorizing judgment text is a very complex task. Law professionals manually categorized the text into eight categories, namely fact, issue, reasoning, argument of petitioner, argument of responder, decision, majority concurring, and minority dissenting, to prepare the labeled dataset. Table 1 describes the eight categories of judgment text. In our proposed work, we address the shortage of labeled training data, and our automation method assigns the labels automatically. We propose a neural network-based model to process the Indian legal data and compare its performance with various baseline models. To the best of our knowledge, no work has been published on the automated categorization of judgment text for the Indian legal domain. Existing works are generally based on deep learning models for legal document classification and usually use a fixed input length. Therefore, we propose to analyze the Bi-LSTM model for categorizing judgment text. Experiments on real datasets highlight the relevance of our proposal and open up many perspectives.
Table 1 Segmented categories
S. no. | Categories | Description
1 | Fact | Facts describe the history of the dispute, including the events that led to the lawsuit, the legal claims, and the defenses of each party
2 | Issue | An “issue” means a point disputed by two or more parties to a lawsuit. Legal issue or issue of law is a legal question which is the foundation of a case
3 | Arguments of petitioner | An argument is a statement or set of statements, set forth by lawyers to convince the court to obtain the ruling in the petitioner's favor
4 | Arguments of responder | An argument is a statement or set of statements, set forth by lawyers to convince the court to obtain the ruling in the responder's favor
5 | Reasoning | Legal reasoning means why and how the court, lawyer, or judge came to their decision or argument on the case. It is here that the court gives reason for its legal ruling
6 | Decision | A conclusion reached after an evaluation of facts and law. It includes final judgments, rulings, and provisional orders made by the court pending the outcome of the case
7 | Majority concurring | A majority opinion is a judicial opinion agreed to by more than half of the judges of a court. A majority opinion sets forth the decision of the court and an explanation of the rationale behind the court's decision
8 | Minority dissenting | A dissenting opinion is an opinion in a legal case written by one or more judges expressing disagreement with the majority opinion of the court which gives rise to its judgment. When not necessarily referring to a legal decision, this can also be referred to as a minority report
Our contributions can be summarized as follows. Our objective is to categorize the judgment text. First, we built a new dataset for text categorization gathered from the Indian Supreme Court Web site. The corpus comprises 50,000 judgments. The dataset was labeled by expert lawyers and split in a 9:1 ratio, that is, 45,000 judgments for training and 5000 for testing. Our next contribution is to employ the Bi-LSTM algorithm to categorize the text into eight labels, together with the word2vec word embedding technique to improve categorization performance.
1. We scraped 50,000 legal judgment texts from the Indian Supreme Court database. As far as we know, deep learning models work better on large datasets [1].
2. We conduct a large experiment to show the success of the proposed approach and learning algorithm. The model attains a high level of accuracy.
3. We have validated our proposed algorithm on a real dataset from the Madras High Court and present the results.
2 Literature Survey
De Araujo [1] used VICTOR, a novel dataset scraped from Brazil's Supreme Court comprising 692 thousand documents. The authors also applied linear-chain conditional random fields to leverage the sequential nature of the lawsuits, which led to improvements in document type classification. Keeling [8] presented the convolutional neural network (CNN) as an effective classification method for legal documents and compared it with several learning algorithms such as logistic regression (LR), support vector machine (SVM), and random forest for text classification. Wan [9] concentrated on a dataset from the US Securities and Exchange Commission (SEC) and compared three frameworks: the hierarchical attention network (HAN), bidirectional encoder representations from transformers (BERT), and Doc2vec combined with Bi-LSTM to process a single document. They identified the Doc2vec + Bi-LSTM model as a reasonable choice since it achieved good precision. Zhou [10] addressed two main classification tasks, namely sentiment classification and question classification. They compared learning algorithms such as SVM, LSTM, CNN, and C-LSTM on both tasks, and C-LSTM achieved high accuracy. Zhang [11] used the Chinese dataset ACE 2005. The authors applied the LSTM technique to categorize Chinese legal documents and established the exact connection between elements using shortest-path classification. Yinglong (2018) evaluated the document similarity of Chinese judgment documents taken from the Chinese Supreme Court database. The authors first used a top-level ontology and a domain-specific ontology for Chinese judgment classification. Elnaggar [12] proposed multi-task techniques and used three German legal datasets, Europarl, DCEP, and JRC-Acquis, for multi-tasking. The results for single- and multi-task summarization, measured by ROUGE-1, ROUGE-2, and ROUGE, were 0.82, 0.75, and 0.82, respectively. German multi-label classification, measured by F-score, recall, and accuracy, attained scores of 0.65, 0.63, and 0.67. For German-to-English translation, the bilingual evaluation understudy (BLEU) scores for the single- and multi-task translation results were 55.11, 36.79, and 66.6. Liu [13] found that a Convo-LSTM with an attention mechanism was well suited to text classification. Walt [14] illustrated the automated segmentation of German legal norms using 601 manually segmented legal norm documents, and applied local interpretable model-agnostic explanations (LIME) to categorize the legal texts. Hammami [15] used a French legal dataset and explored the LSTM algorithm for text classification. Howe [16] categorized 6227 Singapore Supreme Court judgments using machine learning methods such as LSA, GloVe, and ULMFiT. The BERT classifier was evaluated in terms of recall, accuracy, and F1 score on 10, 50, and 100% of the dataset. Das [17] developed a custom neural network for their optimization task by adding or removing layers. Mann [18] proposed a genetic algorithm for automatic case generation. Gupta [19] used an artificial neural network with feed-forward and backpropagation techniques to classify images. Thomas [20] used a neural named entity recognition technique to extract names from judicial documents. Giri [21] used neural networks for prediction and classification.
3 Proposed Approach
In this section, we explain how the legal judgment text is segmented into the categories listed in Table 1 and describe the proposed architecture shown in Fig. 1.
Fig. 1 Overall architecture
3.1 Dataset
We collected 50,000 judgments from the Indian Supreme Court Web site https://main.sci.gov.in/. The judgments were manually categorized with the help of expert law professionals. Once the dataset was prepared, it was partitioned into 45,000 judgments for training and 5000 for testing. A detailed description of the categories is given in Table 1. When a judgment text examines several issues, every issue is debated individually and then judgment is given. In such a case, for instance, the sequence followed may be case issue, facts, argument of petitioner, argument of responder, reasoning, majority concurring, minority dissenting, and decision. Since unstructured judgment texts cannot be used directly for training our model, we created the labeled dataset by categorizing the 50,000 judgments with assistance from law professionals.
3.2 Preprocessing
Our model first removes special characters such as punctuation, along with stop words, numbers, and whitespace. Second, we use the TreetaggerWrapper module of Python for lemmatization (transforming a word into its root word). We prefer lemmatization to stemming because it produces a meaningful base form, whereas stemming merely strips the last few characters and can yield forms with incorrect meanings. Finally, each word in the corpus is mapped to a vector of a pre-trained word2vec model before being fed into the recurrent neural network for categorization. The following steps are involved: (i) change digits into text, (ii) remove special characters and HTML links, (iii) replace RETURN_BY_SPACE_RE symbols by a space in the text, (iv) remove recurring words and stop words, and (v) convert everything to lowercase. The cleaned and labeled dataset is then given to the machine, which performs tokenization and passes the input to the Bi-LSTM deep learning algorithm. Once the learned model achieves good accuracy, we feed in the remaining 5000 judgments to verify whether the trained model categorizes judgment text properly into the eight classes. During the test, our trained model segments the judgment text into paragraphs (p1, p2, …, pn) based on the labels. We use the word embedding technique to convert the text into numerical, that is digitized, form. Every label has specified keywords; for example, the decision label includes keywords such as disposed or appealed. Our trained model identifies the keywords and categorizes the paragraphs into the corresponding labels. In some cases, several paragraphs are categorized into one label because its keywords appear many times. Sometimes, the paragraphs do not fall properly into the corresponding labels. Text categorization is not an easy process.
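The cleaning steps (i) to (v) can be illustrated with a small standard-library sketch. It is not the authors' code: the regular expressions, the stop-word list, the digit-by-digit spelling of numbers, and the reading of "recurring words" as consecutive repeats are all assumptions made for illustration, and lemmatization is left out.

```python
import re

# Hypothetical patterns and word lists, chosen only for this illustration.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
REPLACE_BY_SPACE = re.compile(r"[/(){}\[\]|@,;]")
NON_LETTER = re.compile(r"[^a-z\s]")
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "by", "was", "on"}


def clean_text(text):
    """Apply cleaning steps (i)-(v) to one raw judgment paragraph."""
    text = text.lower()                                   # (v) lowercase
    text = URL_RE.sub(" ", text)                          # (ii) drop links
    text = TAG_RE.sub(" ", text)                          # (ii) drop HTML tags
    # (i) spell out each digit as a word (assumed reading of "digit into text")
    text = re.sub(r"\d", lambda m: f" {DIGIT_WORDS[int(m.group())]} ", text)
    text = REPLACE_BY_SPACE.sub(" ", text)                # (iii) symbols -> space
    text = NON_LETTER.sub(" ", text)                      # (ii) other special characters
    words = []
    for word in text.split():
        if word in STOP_WORDS:                            # (iv) stop words
            continue
        if words and words[-1] == word:                   # (iv) consecutive repeats
            continue
        words.append(word)
    return " ".join(words)


print(clean_text("The appeal was DISMISSED, dismissed by the Court <br> on 5 May"))
```

A real pipeline would substitute a full stop-word list and add the lemmatization step before mapping words to word2vec vectors.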
Legal professionals usually gather and study judgment texts to develop the categorization, and this is where we obtained our training dataset. Our model then finds the start location S_j and end location E_j of chunks in a judgment text. We consider T_i as the judgment text and L_i as the number of paragraphs in the judgment text. Our algorithm starts by tokenizing the text based on Mx_W and Ov_T. If the text contains a list, each entry of the list is converted into tokens using the fit_to_text method. This function should be called before the text is converted into a sequence of integers. After the text has been converted to tokens, the algorithm converts each text into a sequence of integers, which is assigned to S_q. All S_q sequences should have the same size before being passed to the model, so we apply pad_seq. The same process is repeated for the labels; the method goes through all labels covering all the judgment texts to categorize the test dataset. Sometimes the S_j value fixes the starting position of fragment j, but the paragraphs after the starting point may belong to neighboring labels. For instance, if [0.1 (10%), 0.35 (35%)] are the learned limits for a chunk, then from each judgment text, the paragraphs from after the first 10% of the document up to 35% of the document are added to the testing set labeled with that segment as the category.
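The interplay between label keywords and learned position limits can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the keyword sets and the (start, end) limits are hypothetical (the text names only "disposed" and "appealed" for the decision label and a 10% to 35% window as an example), and only three of the eight labels are shown.

```python
# Hypothetical keyword sets per label, for illustration only.
KEYWORDS = {
    "fact": {"facts", "incident"},
    "issue": {"issue", "question"},
    "decision": {"disposed", "appealed", "dismissed"},
}

# Hypothetical learned limits (start, end) as fractions of the document length.
LIMITS = {
    "fact": (0.00, 0.10),
    "issue": (0.10, 0.35),
    "decision": (0.90, 1.00),
}


def categorize(paragraphs):
    """Label each paragraph by keyword hits, using position to break ties."""
    labels = []
    total = len(paragraphs)
    for i, paragraph in enumerate(paragraphs):
        words = set(paragraph.lower().split())
        scores = {label: len(words & kws) for label, kws in KEYWORDS.items()}
        best = max(scores.values())
        candidates = [lab for lab, s in scores.items() if s == best and s > 0]
        if not candidates:
            labels.append(None)            # no keyword found: no label assigned
        elif len(candidates) == 1:
            labels.append(candidates[0])   # a single label has the most hits
        else:
            position = i / total           # relative position in the document
            inside = [lab for lab in candidates
                      if LIMITS[lab][0] <= position < LIMITS[lab][1]]
            labels.append(inside[0] if inside else candidates[0])
    return labels


judgment = [
    "the facts of the incident",
    "the question is whether the appeal was disposed",
    "nothing relevant here",
    "appeal disposed",
]
print(categorize(judgment))
```

In this toy judgment the second paragraph matches both an issue keyword and a decision keyword; its relative position (0.25) falls inside the issue window, so it is labeled as an issue.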
3.3 Bi-LSTM Algorithm
Our architecture is composed of three main layers with 188,793 tokens of input. Following Braz et al. [22], we use word embedding as the input layer of the Bi-LSTM. This layer transforms each token into a distributed array. The recurrent layer has two hidden LSTMs, a forward and a backward layer, each with 200 memory blocks and one cell. The output of this layer uses ReLU activation. The two hidden LSTMs are combined by adding their outputs. The last layer is dense, with 8 output neurons and a Softmax activation function. Table 2 lists our model parameters.
Table 2 Our model hyperparameters
Parameter | Value
Model | Sequential
vocab_size | 5000
embedding_dim | 64
max_length | 200
trunc_type | 'post'
padding_type | 'post'
oov_token | '<OOV>'
train_fraction_dev | 0.8
num_epochs | 10
Loss | sparse_categorical_crossentropy
Optimizer | Adam
Activation function | 'relu', 'softmax'
batch_size | 128
Certain keywords are specified for each label, and based on these keywords a paragraph is assigned to the corresponding label. In some cases, several keywords appear in a paragraph, which makes prediction difficult. Our proposed method uses the hints provided by the keywords that occur in the manually prepared training datasets. While testing the raw judgment texts, <br> tags are used to split the text into paragraphs. Each paragraph is inspected in turn for the presence of the keyword indicators of each fragment. Then, every paragraph is categorized according to its keyword matches. If a paragraph contains keywords of multiple categories, the method takes the position of the paragraph in the judgment text and checks it against the categories assigned to the preceding paragraphs. Generally, the end part of every judgment text contains the court decision. Likewise, the preceding paragraphs present the reasoning and issues of the case, and the initial paragraph should give a factual description of the case. If a paragraph does not contain any keywords, the training datasets do not contain that kind of paragraph and no label is assigned (Fig. 2).
Fig. 2 Bidirectional LSTM layers diagram
Pseudocode for Algorithm
Step 1: Initialize the tokenization.
    tx_token ← tx_token(Max_Words)
Step 2: Create the index based on word frequency.
    tx_token.fitext(input value)
Step 3: Convert the text to a sequence.
    seq ← tx_token.convert_layer(input value)
Step 4: Apply the padding.
    output ← pad_seq(seq, max_len = max_seq_len)
Step 5: Convert the integer labels to a binary class matrix.
    label_data ← to_cat(np.array(lblList))
    indice_data ← np.arange(output.shape[0])
Step 6: Shuffle the input data.
    np.random.shuffle(indice_data)
    output ← output[indice_data]
    label_data ← label_data[indice_data]
    n_valid ← int(v_split_data * output.shape[0])
Step 7: Split the training and validation data.
    training_data ← output[:-n_valid]
    training_label_data ← label_data[:-n_valid]
    valid_data ← output[-n_valid:]
    valid_label_data ← label_data[-n_valid:]
End.
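The seven pseudocode steps can be made concrete with a short standard-library sketch. It is a stand-in under stated assumptions, not the authors' code: the helper names are hypothetical, index 0 is reserved for padding and index 1 for out-of-vocabulary words (mirroring the oov_token and 'post' padding settings of Table 2), and plain Python lists replace the NumPy arrays of the pseudocode.

```python
import random
from collections import Counter


def fit_tokenizer(texts, max_words, oov_token="<OOV>"):
    """Steps 1-2: build a word index ordered by word frequency."""
    counts = Counter(word for text in texts for word in text.lower().split())
    index = {oov_token: 1}                       # 0 is reserved for padding
    for rank, (word, _) in enumerate(counts.most_common(max_words - 1), start=2):
        index[word] = rank
    return index


def texts_to_sequences(texts, index):
    """Step 3: map every word to its integer id (unknown words map to 1)."""
    return [[index.get(word, 1) for word in text.lower().split()] for text in texts]


def pad_post(sequences, max_len):
    """Step 4: truncate or zero-pad at the end so all rows have max_len items."""
    return [(seq[:max_len] + [0] * max_len)[:max_len] for seq in sequences]


def one_hot(labels, num_classes):
    """Step 5: convert integer labels to a binary class matrix."""
    return [[1 if c == label else 0 for c in range(num_classes)] for label in labels]


def shuffle_and_split(x, y, valid_fraction, seed=0):
    """Steps 6-7: shuffle rows jointly, then hold out the last rows for validation."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    x = [x[i] for i in order]
    y = [y[i] for i in order]
    cut = len(x) - int(valid_fraction * len(x))
    return x[:cut], y[:cut], x[cut:], y[cut:]


texts = ["appeal dismissed", "appeal allowed by the court", "facts of the appeal"]
index = fit_tokenizer(texts, max_words=5000)
padded = pad_post(texts_to_sequences(texts, index), max_len=6)
targets = one_hot([0, 1, 2], num_classes=3)
x_train, y_train, x_valid, y_valid = shuffle_and_split(padded, targets, 0.34)
print(len(x_train), len(x_valid))
```

Splitting at an explicit cut index instead of slicing with a negative count keeps the split well defined even when the validation fraction rounds down to zero rows.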
4 Experiments and Results
We train and test our model on the Indian legal dataset collected from the Indian Supreme Court Web site, a documentary collection covering all types of cases. The dataset includes 50,000 documents (txt files) organized into 8 legal categories (see Table 1). The documents were annotated manually and entirely by legal experts. After tokenization, the total parameter size is 394,889. We shuffle the entire data and randomly split it into training and test sets of 90 and 10%. The dataset contains all case types in an unstructured format, which we converted into a structured format with the help of expert lawyers. After preprocessing, the input data is transformed into sequences of integers arranged in matrix form. The labels are converted into integer values 0, 1, up to 7. Next, we initialize the sequential model and add the embedding layer. Our proposed approach is based on the Bi-LSTM algorithm, so we add a bidirectional layer followed by two dense layers. The first dense layer uses the ReLU activation function. The second dense layer predicts the output values; its size is the dimensionality of the output space, which corresponds to the labels 0, 1, 2, 3, 4, 5, 6, 7. The sparse_categorical_crossentropy loss function is used to calculate the loss because the labels are integers, and the 'adam' optimizer is used to adjust the learning rate. We measure the performance of our proposed method by accuracy, precision, recall, and F1 score. The dataset is divided into 128 batches of 1000 samples each, and the model weights are updated after each batch of 1000 samples. The number of epochs can be changed while monitoring the accuracy. We fixed the number of epochs at 10; with 20 epochs, training took more time and the model suffered from overfitting. Figure 3 shows the loss and accuracy of our trained model: as the loss decreases, the accuracy increases, with the number of epochs fixed at 10.
Fig. 3 Accuracy versus loss value
Figure 4 shows the accuracy and loss during model training and validation. The training accuracy was higher than the validation accuracy; the line plots of cross-entropy and accuracy together show the overall performance of our model.
Fig. 4 Line plot of sparse_categorical_crossentropy loss and categorization accuracy over training epochs
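The reported total of 394,889 parameters can be checked with a short calculation. The sketch below shows only one layer configuration that happens to reproduce that figure: vocab_size and embedding_dim come from Table 2, while the 64 LSTM units per direction, the concatenation of forward and backward outputs, the 64-unit ReLU layer, and the 9 output units (for instance, label indices 1 to 8 plus an unused index 0) are assumptions that are not stated in the text and that differ from the layer sizes described in Sect. 3.3.

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs, outputs):
    """Weight matrix plus one bias per output unit."""
    return inputs * outputs + outputs


vocab_size, embedding_dim = 5000, 64   # from Table 2
lstm_units = 64                        # assumption, not stated in the source
hidden_units = 64                      # assumption, not stated in the source
output_units = 9                       # assumption: 8 labels plus an unused index 0

total = (
    embedding_params(vocab_size, embedding_dim)
    + 2 * lstm_params(embedding_dim, lstm_units)    # forward and backward LSTM
    + dense_params(2 * lstm_units, hidden_units)    # concatenated directions
    + dense_params(hidden_units, output_units)
)
print(total)  # 394889
```

Other configurations may produce the same total, so the match should be read as a plausibility check rather than as a description of the trained model.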
4.1 Comparative Analysis Result 1: We also compared the various traditional baseline machine learning methods like Naive Bayes classifier with TF-IDF and Word2Vec embedding with logistic regression. The results are shown in Table 3. Our Bi-LSTM algorithm also performs good accuracy results. Bi-LSTM usually proved to perform much better
A Keyword-Based Multi-label Text Categorization in the Indian …
223
Fig. 5 a Line plot of training loss and testing loss. b Training accuracy and testing accuracy
Table 3 Comparison between our method and traditional models on the Supreme Court dataset

| Methods | Accuracy % (train, test) | Model characteristics |
|---|---|---|
| Naive Bayes class, TF-IDF | 51, 71 | Predicted the features independently, but the categorization attribute has a category in the testing data, which could not notice in the train data |
| Word2vec and Logistic Regression | 59, 54 | Feature extraction is the major issue. Word2Vec embedding occupies the more memory |
| RNN | 88, 83 | When using ReLU functions, it could not process long series |
| CNN | 86, 78 | Computation is very expensive also used large dataset for training |
| LSTM + CNN | 95.6, 86 | Accuracy level good. Model suffers over fitting problem |
| Our Bi-LSTM | 97, 88 | Semantics attributes are found independently. Increase more hidden layers, change the activation function and adam optimizer is best choice for optimizing the model |
accuracy [1]. Indeed, in our task, the Bi-LSTM model achieves the highest accuracy: 97% on the training set and 88% on the test set. From the above results, we make the following observations: It is not unexpected that Naive Bayes with TF-IDF and logistic regression with Word2Vec achieve the worst accuracy, because these algorithms convert multi-label classification into single-label classification and disregard the relations between the labels. Owing to error propagation, their improvement is limited.
Deep learning models such as CNN, RNN, and Bi-LSTM perform better than NB and LR, and modeling the relations between the multiple labels can increase model performance. The deep neural network methods, namely RNN, CNN, the combination of LSTM and CNN, and our Bi-LSTM, achieved (86%, 83%), (86%, 78%), (96%, 86%), and (97%, 88%), respectively, which is considerably better than the machine learning models. Our Bi-LSTM model achieves a significant improvement over all the others.
Result 2: Dahiya et al. [23] used the Madras High Court dataset in their work. We used the same dataset to validate our algorithm. This dataset consists of 1200 judgment texts. We first applied the preprocessing steps and identified the necessary features, and then trained and tested our existing algorithm. We used 971 samples for training and 108 samples for testing. Figure 5a shows that the training loss decreased from 0.68 to 0.45 and the testing loss from 0.66 to 0.45, and Fig. 5b shows that the training accuracy increased from 0.72 to 0.83, while the testing accuracy remained constant at 0.83 in all epochs. The number of epochs is 5. With the Supreme Court dataset, the training accuracy is higher than the validation accuracy, as shown in Fig. 4. Figure 5 shows the results on the Madras High Court dataset, where both training and validation accuracy reach 0.83. The accuracy may improve with a larger dataset. These results indicate that our algorithm also works well on other datasets.
5 Discussion
5.1 Finding
Our task requires deciding the category of legal judgment texts. As Table 3 shows, the proposed technique achieved the most accurate outcome among the compared methods. The proposed method can be used with different variations of case judgments. Our approach, based on the Bi-LSTM algorithm, categorizes the judgments into eight labels based on keywords, and the results were very accurate. This kind of categorization system helps legal experts take legal decisions more effectively. The automated method provides suggestions based on the needs of the lawyer, who can then provide solutions to the case at hand. The proposed method can be applied to various datasets. We hope that this work will help more researchers investigate deep learning applied to the legal field, text analysis, document classification, and multi-label classification [24]. In future, a custom CNN or Bi-LSTM can be created by adding more layers and choosing among various pre-defined models to provide a better solution.
5.2 Limitations
Every language has unique characteristics, so this technique cannot be applied directly to other languages; some issues arise in the preprocessing stage. Also, a relatively small dataset was used for this research: our proposed system performed well on only 50,000 legal cases retrieved from the Indian Supreme Court Web site. The accuracy may increase when a larger dataset is used.
6 Conclusion
The Indian legal system is one of the largest judiciary systems in the world and handles a huge number of legal cases, which grows rapidly every day. The computerized documentation of Indian law is highly voluminous and complex in form. Text categorization is very important in the legal system. For our purposes, we used 50,000 original legal documents from the Indian Supreme Court, covering many kinds of cases, which were manually labeled with eight categories: issue, facts, reasoning, argument of the petitioner, argument of the responder, decision, majority concurring, and minority dissenting. Manual categorization is costly and time-consuming. To overcome these issues, our paper presented a bidirectional long short-term memory (Bi-LSTM) model that categorizes the text automatically. Our automatic text categorization system will help law professionals refer to previous case details for their future arguments. To that end, we built a Bi-LSTM model that handles the 50,000 tokens of the training-set judgment texts. The experiments show that the model captures the semantics accurately, improves the accuracy level, and provides quality results. The trained model categorizes these texts with a training accuracy of 97% and a validation accuracy of 88%. We compared traditional models and recent models with our proposed model. We also tested on a different dataset, the Madras High Court judgments, which supports the effectiveness of the approach. Future work will focus on applying tuning mechanisms to the model, and we would like to try advanced word-embedding deep learning models. Our approach is not language-sensitive, so the implementation can be slightly adapted for other languages; for example, Chinese and French legal datasets can be tried with our approach. However, every language has different characteristics, so the preprocessing steps have to be handled properly. An attention mechanism may also improve our proposed methodology.
References 1. de Araujo, P.H.L., de Campos, T.E., Braz, F.A., da Silva, N.C.: VICTOR: a dataset for Brazilian legal documents classification. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 1449–1458 (2020) 2. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 1480–1489 (2016) 3. Zhang, X., Chen, F., Lu, C.T., Ramakrishnan, N.: Mitigating uncertainty in document classification. arXiv preprint arXiv:1907.07590 (2019) 4. Elnagar, A., Lulu, L., Einea, O.: An annotated huge dataset for standard and colloquial arabic reviews for subjective sentiment analysis. Procedia Comput. Sci. 142, 182–189 (2018) 5. Liu, Y.H., Chen, Y.L.: A two-phase sentiment analysis approach for judgement prediction. J. Inf. Sci. 44(5), 594–607 (2018) 6. Elnagar, A., Al-Debsi, R., Einea, O.: Arabic text classification using deep learning models. Inf. Process. Manage. 57(1), 102121 (2020) 7. Bansal, N., Sharma, A., Singh, R.K.: A review on the application of deep learning in legal domain. In: IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 374–381. Springer, Cham (2019) 8. Keeling, R., Chhatwal, R., Huber-Fliflet, N., Zhang, J., Wei, F., Zhao, H.: Empirical comparisons of CNN with other learning algorithms for text classification in legal document review. In: 2019 IEEE International Conference on Big Data (Big Data) Los Angeles, CA, USA (2019) 9. Wan, L., Papageorgiou, G., Seddon, M., Bernardoni, M.: Long-length legal document classification. arXiv preprint arXiv:1912.06905 (2019) 10. Zhou, C., Sun, C., Liu, Z., Lau, F.: A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630 (2015) 11. Zhang, L., Moldovan, D.: Chinese relation classification using long short term memory networks. 
In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (2018) 12. Elnaggar, A., Gebendorfer, C., Glaser, I., Matthes, F.: Multi-task deep learning for legal document translation, summarization, and multi-label classification. In: Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference, pp. 9–15 (2018) 13. Liu, G., Guo, J.: Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337, 325–338 (2019) 14. Waltl, B., Bonczek, G., Scepankova, E., Matthes, F.: Semantic types of legal norms in German laws: classification and analysis using local linear explanations. Artif. Intell. Law 27, 143–71 (2019) 15. Hammami, E.: Deep learning for French legal data categorization. In: International Conference on Model and Data Engineering. Springer, Cham (2019) 16. Howe, J.S.T., Khang, L.H., Chai, I.E.: Legal area classification: a comparative study of text classifiers on Singapore Supreme Court Judgments. In: Proceedings of the Natural Legal Language Processing Workshop, pp. 67–77 Minneapolis, Minesotta, Association for Computational Linguistics (2019) 17. Das, G., Panda, S., Padhy, S.K.: Quantum particle swarm optimization tuned artificial neural network equalizer. In: Soft Computing: Theories and Applications, pp. 579–585. Springer, Singapore (2018) 18. Mann, M., Tomar, P., Sangwan, O.P.: Test data generation using optimization algorithm: an empirical evaluation. In: Soft Computing: Theories and Applications, pp. 679–686. Springer, Singapore (2018) 19. Gupta, V., Singh, J.P.: Study and analysis of back-propagation approach in artificial neural network using HOG descriptor for real-time object classification. In: Soft Computing: Theories and Applications, pp. 45–52. Springer, Singapore (2019)
20. Thomas, A., Sangeetha, S.: Performance analysis of the state-of-the-art neural named entity recognition model on judicial domain. In: Soft Computing: Theories and Applications, pp. 147– 154. Springer, Singapore (2020) 21. Giri, J.P., Giri, P.J., Chadge, R.: Neural network-based prediction of productivity parameters. In: Soft Computing: Theories and Applications, pp. 83–95. Springer, Singapore (2018) 22. Braz, F.A., da Silva, N.C., de Campos, T.E., Chaves, F.B.S., Ferreira, M.H., Inazawa, P.H., et al.: Document classification using a Bi-LSTM to unclog Brazil’s Supreme Court. arXiv preprint arXiv:1811.11569 (2018) 23. Wagh, R.S., Anand, D.: A novel approach of augmenting training data for legal text segmentation by leveraging domain knowledge. In: Intelligent Systems, Technologies and Applications, pp. 53–63. Springer, Singapore (2020) 24. Dahiya, S., Tyagi, R., Gaba, N.: Streamlining choice of CNNs and structure framing of convolution layer. In: Soft Computing: Theories and Applications, pp. 705–718. Springer, Singapore (2020)
Application of Deep Learning Techniques in Cyber-Attack Detection Priyanka Dixit and Sanjay Silakari
Abstract Cyber security is an emerging field that provides the requisite processes and security frameworks to protect information and resources across the network. Cyber-attacks are one of the critical problems of cyber security. To ensure security, a large number of defensive systems and software tools, such as firewalls, IDS, and antivirus, are available for protecting information over the network. However, due to changes in network algorithms, new attacks are introduced that disrupt the normal functioning of the network. Hence, there is a need for an efficient system to detect and mitigate cyber-attacks. Conventional attack detection techniques, such as heuristic, fuzzy, and rule-based techniques, failed to perform efficiently on large datasets; therefore, deep learning techniques were introduced. Deep learning techniques provide better performance on large datasets and are also helpful in detecting unknown attacks. This study focuses on recent advances in attack detection systems using deep learning techniques and presents a comparative study of the techniques with performance evaluation on different attack datasets.
Keywords Cyber security · Cyber-attack · Deep learning · Attack detection · Attack datasets
P. Dixit (B) · S. Silakari
Department of Computer Science and Engineering, University Institute of Technology R.G.P.V, Bhopal, M.P, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_20

1 Introduction
According to survey reports, the damage caused by cyber-attacks is likely to reach three trillion by 2021. Furthermore, the information stored by data-driven companies will increase up to a hundred times in the coming years [1]. Hence, the demand for robust security systems that ensure secure data sharing and storage also increases. Most of the security mechanisms presently available are less secure, time-consuming, and fail to detect new attacks. Due to these loopholes, many cyber-attacks such as DoS, ransomware, and data theft continuously harm the
resources on the network. Therefore, attack detection systems are widely used to examine network data flows, trace these attacks, and generate alarms when intrusion activity is found. The methods used for attack detection are classified into three broad categories: anomaly, misuse, and hybrid detection. Anomaly-based detection uses behavior parameters to separate attacks from normal network traffic; it learns patterns from data collected during normal usage [2]. Misuse-based detection, on the other hand, identifies actions based on known system weaknesses and attack signatures, but it fails to detect new or unknown attacks. In the past, many techniques were used for the detection and mitigation of cyber-attacks. Earlier techniques such as rule-based methods, statistical methods, fuzzy-based techniques, and data mining were presented in depth by Bhuyan et al. [3]. Turner [4] presented a review of rule-based methods for attack detection and prevention. Nooribakhsh and Mollamotalebi [5] provided an in-depth review of statistical techniques for anomaly-based attack detection with a comparative study on different network attack datasets. Jabez and Anadha Mala [6] presented a genetic-based fuzzy rule mining technique and focused on the problem of uncertainty in network attack datasets. Unfortunately, these techniques do not apply efficiently to high-dimensional network attack datasets and are inefficient at detecting unknown or new attacks. Some methods are also complex and time-consuming. Hence, for dealing with bulky datasets, intelligent techniques such as machine learning and artificial intelligence became very popular. Buczak [3] presented an in-depth review of machine learning and data mining techniques for network attack detection and also provided a comparative study, recommendations, and observations on different DM/ML techniques.
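The difference between misuse-based and anomaly-based detection can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a method from the surveyed papers: the signature strings, the request-rate values, and the threshold of three standard deviations are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical known attack signatures (misuse detection).
KNOWN_SIGNATURES = {"' OR 1=1 --", "../../etc/passwd"}

def misuse_detect(payload):
    """Flag a payload only if it contains a known attack signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def fit_anomaly_detector(normal_values, k=3.0):
    """Learn a profile (mean, std) from normal traffic and return a detector."""
    mu, sigma = mean(normal_values), stdev(normal_values)

    def is_anomaly(value):
        # Anything further than k standard deviations from normal is flagged.
        return abs(value - mu) > k * sigma

    return is_anomaly

# Requests per second observed during normal usage (made-up numbers).
normal_rate = [98, 102, 100, 97, 103, 101, 99, 100]
is_anomaly = fit_anomaly_detector(normal_rate)

print(misuse_detect("id=1' OR 1=1 --"))  # known signature present
print(is_anomaly(500))                   # far outside the normal profile
```

The misuse detector can only recognize what is already in its signature set, whereas the anomaly detector flags any sufficiently unusual value, including previously unseen behavior, at the cost of possible false alarms.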
This paper presents a review of different deep learning techniques and their contribution to network attack detection on popular cyber security datasets. The remainder of the paper is organized as follows: Part 2 covers the basic steps of attack detection; Part 3 covers different deep learning techniques with a comparative study on the basis of learning method, data type, and applications; Part 4 covers performance evaluation; Part 5 covers attack detection based on deep learning techniques; and Part 6 presents the conclusion.
2 Basic Steps of Attack Detection Model
The attack detection model based on deep learning consists of the steps shown in Fig. 1: preprocessing, which consists of normalization and transformation of the data; feature selection, which depends on knowledge of the particular domain and affects the overall performance of the system or model; learning (pattern match/mismatch); and finally, evaluation by parameters. Selecting the best feature vector is a significant task of the detection system. Deep learning-based detection models can learn feature vectors automatically. These deep networks can process
Fig. 1 Basic steps of the attack detection model (input data/datasets → preprocessing of data → feature selection → learning (training/testing) → evaluation/output)
data in an end-to-end fashion and most of the time prove to be efficient for attack detection models. Deep learning techniques are capable of processing raw data directly, and can learn features and perform classification simultaneously [4].
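The preprocessing step described above (normalization and transformation) can be sketched in a few lines of Python. The choice of min-max scaling and one-hot encoding is an assumption made for illustration, since the text does not name specific methods, and the feature names and values are hypothetical.

```python
def min_max_normalize(column):
    """Scale numeric values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(column):
    """Transform a categorical column into one-hot vectors."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in column]

# Hypothetical records: connection duration and protocol type.
durations = [0, 5, 10, 20]
protocols = ["tcp", "udp", "tcp", "icmp"]

scaled = min_max_normalize(durations)
encoded = one_hot(protocols)
print(scaled)
print(encoded)
```

Normalization keeps features with large numeric ranges from dominating the learning step, and the one-hot transformation turns symbolic fields into numeric vectors that a network can consume.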
3 Techniques of Deep Learning
Deep learning models are a subset of machine learning consisting of varied deep networks. They allow computational models with multiple processing layers to learn representations of data at many levels of abstraction. These techniques have greatly improved many applications such as voice recognition, visual recognition, object detection, and other areas such as drug discovery and genomics [7]. The technique-wise classification of deep learning is shown in Fig. 2: deep belief networks (DBNs), deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs) come under supervised learning, whereas autoencoders, restricted Boltzmann machines (RBMs), and generative adversarial networks (GANs) come under unsupervised learning. Deep learning models are capable of learning features directly from any data representation, such as multimedia or text. Therefore, deep learning models process data in an end-to-end way. In this paper, the main focus is on a brief review of different deep network architectures and their performance on large datasets [4].
Fig. 2 Classification of deep learning techniques (diagram labels: Deep Learning; Supervised; Unsupervised; CNN; RNN; Bi-RNN; LSTM; GRU; DBN; DNN; GAN; Autoencoder; RBM; Stack Encoder; Sparse Encoder; Denoising Encoder)
1. Recurrent neural networks (RNNs) extend standard feed-forward neural networks to handle time and sequence dependencies. The basic difference between an RNN and other feed-forward networks is the recurrent neuron connection, which feeds back the previous outputs of neurons. Hence, a network unit can remember its previous state [8]. RNNs are popularly used in areas such as language, text, and video processing. A major problem in training RNNs is the well-known exploding or vanishing gradient problem [8–10]. To overcome this problem, different RNN variants were introduced, such as LSTM, Bi-RNN, and GRU.
2. Convolutional neural networks (CNNs) are a kind of deep neural network consisting of multiple layers. Popular applications of CNNs are the recognition of images or of audio spectrograms, with different architectural add-ons and input variants [8]. The CNN architecture consists of three basic layer types: convolution, pooling, and classification layers. Convolution layers are the main part of a CNN. In the CNN architecture, convolution and pooling are the specific functions of the filtering layers. Unlike other neural networks, a CNN is not a fully connected network [11].
3. Deep belief networks (DBNs) are a class of deep neural network consisting of multiple layers of hidden units, with connections between the layers but not between units within each layer. The DBN architecture includes several RBM layers that communicate with each other or with the classification layer [4, 8]. DBN training has two phases: unsupervised pre-training followed by supervised fine-tuning. The RBM layers are pre-trained layer by layer; after that, the weights are fine-tuned with labeled data [4].
4. Deep neural networks (DNNs) are deep networks trained by layer-wise pre-training and fine-tuning. In the training process, the DNN parameters are learned using unlabeled data and then tuned with labeled data. Hence, a DNN can work as either a supervised or an unsupervised model [4].
5. Generative adversarial networks (GANs) include two main networks: a generator and a discriminator. The generator produces synthetic data similar to the real data, and the discriminator aims to distinguish the synthetic data from the real data. Hence, the generator and the discriminator improve each other [4, 11].
6. Autoencoders consist of two components: an encoder and a decoder. The encoder extracts features from raw data, and the decoder reconstructs the data from the extracted features [4]. In the training phase, the deviation between the encoder input and the decoder output is progressively reduced. The entire process is unsupervised learning. Variants of the autoencoder include denoising autoencoders and sparse autoencoders [8].
7. Restricted Boltzmann machines (RBMs) are bipartite, undirected graphical models with two main layers; stacked together, they form the DBN architecture [11]. RBMs are trained one layer at a time in an unsupervised manner. RBM networks are generally fully connected between the two layers, and the input and hidden units are restricted to binary values [11].
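The recurrent connection described in item 1 can be illustrated with a minimal Python sketch of a single hidden unit. The scalar weights `w_x`, `w_h`, and `b`, the tanh activation, and the input sequence are simplifying assumptions for illustration; practical RNNs use weight matrices and learned parameters.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent update for a single hidden unit:
    h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the unit, carrying the hidden state forward."""
    h = 0.0  # initial state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero; later states still reflect it.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Because each state depends on the previous one, the unit retains a fading trace of the first input even when later inputs are zero. The same repeated multiplication through `w_h` is what makes gradients shrink or grow over long sequences.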
Table 1 presents a brief comparison among deep learning techniques concerning their application, data types, and learning method.
Table 1 Comparison of deep learning techniques [4]

| Techniques | Learning method | Data types | Applications |
|---|---|---|---|
| CNN | Supervised | Raw data, feature vectors | Image processing |
| RNN | Supervised/unsupervised | Raw data, feature vectors | Text, video recognition, and language processing |
| DNN | Supervised | Feature vectors | Feature selection, classification |
| DBN | Supervised | Feature vectors | Feature selection, classification |
| GAN | Unsupervised | Raw data, feature vectors | Feature selection, classification |
| Autoencoder | Unsupervised | Raw data, feature vectors | Feature selection, classification |
| RBM | Unsupervised | Feature vectors | Feature selection, classification |
4 Performance Evaluation Learning
1. The confusion matrix summarizes the actual and predicted classifications produced by the classifier [2].
   1. True positive (A): indicates that the attack records are correctly classified.
   2. False negative (B): represents an attack record that is incorrectly classified as a normal record.
   3. False positive (C): represents a normal record that is incorrectly predicted as an attack.
   4. True negative (D): indicates that the normal records are correctly classified.
   Table 2 shows the confusion matrix, which consists of two classes, predicted attack and predicted normal, in terms of actual positive and negative records. Table 3 shows the different parameters that can be evaluated to determine the overall performance of an attack detection model.
2. Datasets: Table 4 gives brief details of the network attack datasets used in cyber security [1, 7, 8].

Table 2 Confusion matrix

| | Predicted attack | Predicted normal |
|---|---|---|
| Actual positive | True positive (A) | False negative (B) |
| Actual negative | False positive (C) | True negative (D) |
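The four cells of Table 2 can be counted from paired actual and predicted labels, as in the following minimal Python sketch. The function name and the example labels are hypothetical; "attack" is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Return (A, B, C, D) = (TP, FN, FP, TN) with 'attack' as the positive class."""
    a = b = c = d = 0
    for y, y_hat in zip(actual, predicted):
        if y == positive and y_hat == positive:
            a += 1  # true positive: attack detected
        elif y == positive:
            b += 1  # false negative: attack missed
        elif y_hat == positive:
            c += 1  # false positive: normal record flagged
        else:
            d += 1  # true negative: normal record passed
    return a, b, c, d

actual    = ["attack", "attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "normal", "attack", "normal", "attack", "normal"]
A, B, C, D = confusion_counts(actual, predicted)
print(A, B, C, D)
```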
Table 3 Performance evaluation parameters for an attack detection system

| S. no. | Parameters | Definition | Formula |
|---|---|---|---|
| 1 | Accuracy (ACC) | The ratio of correctly classified data to all classified data, correct or incorrect. Accuracy is an important parameter for balanced datasets | $\frac{A + D}{A + B + C + D}$ |
| 2 | Precision (P) | The ratio of correctly classified attack data to the sum of correctly classified attack data and normal data incorrectly predicted as attacks. It represents confidence in attack detection | $\frac{A}{A + C}$ |
| 3 | Recall (R) | The ratio of correctly classified attack data to the sum of correctly classified attack data and attack data incorrectly classified as normal. It is also called the detection rate, that is, the ability to recognize attacks | $\frac{A}{A + B}$ |
| 4 | F-measure (F) | The harmonic mean of precision and recall | $\frac{2 \cdot P \cdot R}{P + R}$ |
| 5 | False discovery rate (FDR) | The ratio of normal records incorrectly predicted as attacks to the sum of correctly classified attack data and normal records incorrectly predicted as attacks | $\frac{C}{A + C}$ |
| 6 | Specificity | The ratio of correctly classified normal data to the sum of correctly classified normal data and normal data incorrectly predicted as attacks | $\frac{D}{C + D}$ |
| 7 | False-negative rate | The ratio of attack data incorrectly classified as normal to the sum of attack data incorrectly classified as normal and correctly classified attack data | $\frac{B}{B + A}$ |
| 8 | False-positive rate (FPR) | The ratio of normal data incorrectly predicted as attacks to the sum of normal data incorrectly predicted as attacks and correctly classified normal data. It is also called the false alarm rate | $\frac{C}{C + D}$ |
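As a worked illustration of the formulas in Table 3, the following Python sketch computes several of the parameters from the confusion matrix counts A (true positives), B (false negatives), C (false positives), and D (true negatives). The function name and the example counts are hypothetical, and the sketch assumes that no denominator is zero.

```python
def detection_metrics(a, b, c, d):
    """Metrics from A=TP, B=FN, C=FP, D=TN (assumes no denominator is zero)."""
    precision = a / (a + c)
    recall = a / (a + b)
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "specificity": d / (c + d),
        "false_negative_rate": b / (a + b),
        "false_positive_rate": c / (c + d),
    }

# Hypothetical counts: 80 attacks caught, 20 missed,
# 10 false alarms, 90 normal records passed.
m = detection_metrics(80, 20, 10, 90)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Two identities serve as a quick sanity check: recall plus the false-negative rate equals 1, and specificity plus the false-positive rate equals 1.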
5 Attack Detection Based on Deep Learning Techniques
Deep learning is now widely applied in many areas of cyber security, among which attack detection is one of the most popular. Here, we discuss some recent research on attack detection using deep learning techniques. Yang et al. [7] presented a novel intrusion detection model that fuses an improved conditional variational autoencoder (ICVAE) with a deep neural network (DNN), that is, ICVAE-DNN. The model was used to learn and investigate
Table 4 Comparative study on popular cyber security datasets

| Dataset | Description | Features | Attacks | Developed by |
|---|---|---|---|---|
| DARPA 98, 99 | The DARPA 1998 dataset covers seven weeks and the DARPA 1999 dataset five weeks of network traffic in packet-based format | 41 | DoS, R2L, U2R, Probe | MIT Lincoln Laboratory |
| KDD Cup 1999 | It uses TCP/IP data from the DARPA 1998 dataset. It contains redundant and repeated records | 41 | DoS, R2L, U2R, Probe | University of California |
| NSL-KDD | It is a derivative of the KDD Cup 99 dataset | 41 | DoS, R2L, U2R, Probe | University of California |
| CAIDA | CAIDA collects a variety of datasets, with different degrees of availability (public access or on request) | 20 | DDoS | Center of Applied Internet Data Analysis |
| ISCX2012 | The ISCX dataset was introduced in 2012 and captures one week of traffic in a network environment | IP data | DoS, DDoS, Bruteforce, Infiltration | University of New Brunswick |
| Kyoto | It is a publicly available honeypot dataset that consists of real network traffic and includes a small range of realistic normal user behavior | 24, 22 | Session attack | Kyoto University |
| CIC DDoS 2019 | It contains the most up-to-date common DDoS attacks, resembling true real-world data (PCAPs) | 85 | 12 DDoS attacks, MSSQL, SSDP, while UDP-based attacks include CharGen, NTP, and TFTP | Canadian Institute of Cyber Security |
| UNSW-NB15 | It consists of normal and malicious network traffic in a packet-based format, created using the IXIA perfect storm tool in a small network environment over 31 h | 45 distinct IP addresses | Backdoors, DoS, exploits, fuzzers, or worms | Cyber Range Lab of the Australian Center for Cyber Security (ACCS) |
| CIC-IDS2017 | It is created by using network profiles in a specific manner | 80 | Brute force, Portscan, Botnet, DoS, DDoS, Web, Infiltration | Canadian Institute of Cyber Security |
| CSE-CICIDS-2018 | It is created by using network profiles in a specific manner | 80 | Brute force, Portscan, Botnet, DoS, DDoS, Web, Infiltration | Canadian Institute of Cyber Security |
sparse representations between feature vectors and classes. The trained encoder is used to reduce the high data dimensionality and to initialize the weights of the DNN hidden layers so that the network can be optimized and fine-tuned more easily. The NSL-KDD and UNSW-NB15 datasets are used to evaluate the performance of the ICVAE-DNN. Meira et al. [12] presented unsupervised learning algorithms for identifying new attacks. The methods examined were a single-class autoencoder neural network, K-means, nearest neighbor, and isolation forest. The performance of these algorithms was analyzed on two datasets, NSL-KDD and ISCX, and the results of all the algorithms were compared. The results showed that the applied models successfully identified unknown attacks present in both datasets. Shone et al. [13] proposed a deep learning technique built on a nonsymmetric deep autoencoder (NDAE) for unsupervised feature learning, and designed a deep learning classification model using stacked NDAEs. The implementation was carried out in GPU-based TensorFlow and evaluated on two network attack datasets, KDD Cup’99 and NSL-KDD. The test results showed improved performance over previous approaches, making the model well suited for use in modern NIDS. Le et al. [14] presented a framework that consists of three main parts. The first part is a feature selection model that fuses the sequence forward selection (SFS) algorithm with a decision tree (DT) model. The second part is an attack detection model trained on the optimal selected features. This model is built using recurrent neural networks (RNNs), namely the traditional RNN, long short-term memory (LSTM), and gated recurrent unit (GRU) techniques of deep learning. The experiments were run on the NSL-KDD in 2010 and
ISCX in 2012 benchmarks. The final part evaluates the model and compares it with previous models. Jiang et al. [15] presented a multi-channel attack detection model based on the fusion of LSTM-RNNs. The multi-channel architecture enhances training capability and achieves a high detection rate. The model creates classifiers by training neural networks on a variety of features extracted from attack input vectors, so that attack data can be distinguished from normal input. A voting algorithm is used to decide whether the input is an attack or normal data, and the experiments are performed on the NSL-KDD dataset. The model outperforms several attack detection methods such as Bayesian and SVM classifiers. Vijayakumar et al. [16] presented an attack detection model that can work effectively in detecting future cyber-attacks. The model was built using the deep neural network technique and evaluated on the publicly available benchmark KDDCUP99 datasets. The DNN experiments were run for up to 1000 epochs with learning rates in the range [0.01–0.5]. The DNN model can efficiently learn the high-dimensional feature representation of the attack data. The experimental testing confirmed that DNNs perform considerably better than the classical techniques. Ding et al. [17] built an attack detection model using deep learning techniques. A convolutional neural network (CNN) is used to detect network attacks, and the performance of the model is tested on the NSL-KDD dataset. The model performs multiclass classification and is compared with conventional machine learning and deep learning techniques including random forest (RF), support vector machine (SVM), deep belief network (DBN), and long short-term memory (LSTM). The final test showed that the model performed better than the others.
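The voting step mentioned for the multi-channel model can be illustrated with a simple majority vote over per-channel predictions. This Python sketch is an assumption made for illustration: the source does not specify the voting rule used by Jiang et al. [15], and breaking ties in favor of "attack" is a hypothetical design choice.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    Ties are resolved as 'attack' (an assumed, conservative choice).
    """
    counts = Counter(predictions)
    if counts["attack"] >= counts["normal"]:
        return "attack"
    return "normal"

# Hypothetical outputs of three channel classifiers for one input record.
channel_outputs = ["attack", "normal", "attack"]
print(majority_vote(channel_outputs))
```

Resolving ties as "attack" favors recall over precision; a deployment that is more sensitive to false alarms could resolve ties the other way.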
Here, Table 5 shows the performance analysis of different deep learning techniques on cyber security datasets.
6 Conclusion
This paper summarizes a literature review and description of different deep learning techniques used to design architectures for attack detection systems. Deep learning techniques are very popular in many applications of cyber security. We briefly reviewed deep learning-based models for network attack detection systems, and discussed the benchmark datasets and the parameters used to evaluate attack detection systems in the cyber security environment. It is found that deep learning techniques have been applied successfully to model attack detection systems and perform well on high-dimensional datasets compared with conventional techniques. Furthermore, very few papers were found that use the latest datasets or a wider range of parameters to demonstrate the effectiveness of attack detection systems. This study will help researchers find new directions toward robust attack detection systems using advanced techniques.
P. Dixit and S. Silakari
Table 5 Performance of different deep learning techniques applied on network attack datasets

| References | Techniques | Performance | Datasets used | Comparison |
|---|---|---|---|---|
| Yang et al. [7] | ICVAE-DNN | Accuracy: 75.43; Precision: 96.20; Recall: 72.86; F-Score: 82.92; FPR: 12.96 | NSL-KDD | KNN, Multinomial NB, RF, SVM, DNN, DBN, STL, SCDNN, IDCVAE, RNNIDS |
| Yang et al. [7] | DNN | Accuracy: 89.08%; Precision: 86.05%; Recall: 95.68%; F-Score: 90.61%; FPR: 19.01 | UNSW-NB15 | KNN, Multinomial NB, RF, SVM, DNN, DBN, STL, SCDNN, IDCVAE, RNNIDS |
| Jorge Meira et al. [12] | EF + Zscore + Autoencoder | Accuracy: 90%; Precision: above 60%; Recall: below 70%; F-Score: above 60% | NSL-KDD | Zscore-kmean, Isolation forest, Zscore-Nearest neighbor |
| Nathan Shone et al. [13] | Deep Autoencoder | Accuracy: 99.79% (DoS); Precision: 100%; Recall: 99.79%; F-Score: 99.89%; False alarm: 0.04% | KDD Cup'99 | DBN |
| Nathan Shone et al. [13] | Deep Autoencoder | Accuracy: 97.73% (DoS); Precision: 100%; Recall: 97.73%; F-Score: 98.85%; False alarm: 1.07% | NSL-KDD | DBN |
| Le et al. [14] | RNN | Accuracy: 89.6% | NSL-KDD 2010 | SCDNN, STL, DNN, Gaussian-Bernoulli, RBM, Naive Bayes, ANN, CART, MDPCA-DBN, Zscore + Kmeans, RNN |
| Le et al. [14] | RNN | Accuracy: 94.75% | ISCX 2012 | SCDNN, STL, DNN, Gaussian-Bernoulli, RBM, Naive Bayes, ANN, CART, MDPCA-DBN, Zscore + Kmeans, RNN |

(continued)
Application of Deep Learning Techniques in Cyber-Attack Detection
Table 5 (continued)

| References | Techniques | Performance | Datasets used | Comparison |
|---|---|---|---|---|
| Le et al. [14] | LSTM | Accuracy: 92% | NSL-KDD 2010 | SCDNN, STL, DNN, Gaussian-Bernoulli, RBM, Naive Bayes, ANN, CART, MDPCA-DBN, Zscore + Kmeans, RNN |
| Le et al. [14] | LSTM | Accuracy: 97.5% | ISCX 2012 | SCDNN, STL, DNN, Gaussian-Bernoulli, RBM, Naive Bayes, ANN, CART, MDPCA-DBN, Zscore + Kmeans, RNN |
| Le et al. [14] | GRU | Accuracy: 91.8% | NSL-KDD 2010 | SCDNN, STL, DNN, Gaussian-Bernoulli, RBM, Naive Bayes, ANN, CART, MDPCA-DBN, Zscore + Kmeans, RNN |
| Le et al. [14] | GRU | Accuracy: 97.08% | ISCX 2012 | SCDNN, STL, DNN, Gaussian-Bernoulli, RBM, Naive Bayes, ANN, CART, MDPCA-DBN, Zscore + Kmeans, RNN |
| Jiang et al. [15] | LSTM-RNN | Detection rate: 99.23; FAR: 9.86; Accuracy: 98.94 | NSL-KDD | GRNN, PNN, RBNN, KNN, Bayesian, or SVM classifiers |
| Vinayakumar et al. [16] | DNN | Accuracy: multiclass 93.2%; binary class 99.9% | KDDCUP99 | LR, NB, KNN, DT, AB, RF, and SVM-rbf |
| Vinayakumar et al. [16] | DNN | Accuracy: multiclass 78.1%; binary class 96.9% | NSL-KDD | LR, NB, KNN, DT, AB, RF, and SVM-rbf |
| Vinayakumar et al. [16] | DNN | Accuracy: multiclass 66.0%; binary class 97.9% | UNSW-NB15 | LR, NB, KNN, DT, AB, RF, and SVM-rbf |

(continued)
Table 5 (continued)

| References | Techniques | Performance | Datasets used | Comparison |
|---|---|---|---|---|
| Vinayakumar et al. [16] | DNN | Accuracy: multiclass 96.6%; binary class 99.2% | WSN-DS | LR, NB, KNN, DT, AB, RF, and SVM-rbf |
| Vinayakumar et al. [16] | DNN | Accuracy: multiclass 97.2%; binary class 97.9% | CICIDS-2017 | LR, NB, KNN, DT, AB, RF, and SVM-rbf |
| Vinayakumar et al. [16] | DNN | Accuracy: binary class 96.4% | Kyoto | LR, NB, KNN, DT, AB, RF, and SVM-rbf |
| Ding et al. [17] | CNN | Accuracy: 80.1321%, 62.320%; Detection rate: 96.73 highest (DoS) | NSL-KDD | Random forest (RF), support vector machine (SVM), deep belief network (DBN), and long short-term memory (LSTM) |
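The F-scores listed in Table 5 follow from the reported precision and recall through the usual harmonic-mean definition. The short Python sketch below is illustrative only and is not taken from any of the surveyed papers; `f_score` is a hypothetical helper name, and the two input pairs are the precision and recall values listed in Table 5 for Yang et al. [7].

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given on the same scale, here per cent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in Table 5 for Yang et al. [7].
icvae_dnn_nsl_kdd = f_score(96.20, 72.86)  # ICVAE-DNN on NSL-KDD
dnn_unsw_nb15 = f_score(86.05, 95.68)      # DNN on UNSW-NB15

print(round(icvae_dnn_nsl_kdd, 2), round(dnn_unsw_nb15, 2))
```

Both values agree, to two decimals, with the F-scores of 82.92 and 90.61 given in the table, which is a quick consistency check when transcribing results from different papers.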
References
1. Thakkar, A., Lohiya, R.: A review of the advancement in intrusion detection datasets. In: International Conference on Computational Intelligence and Data Science (ICCIDS 2019)
2. Aljawarneh, S., Aldwairi, M., Yassein, M.B.: Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 25, 152–160 (2018)
3. Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tutorials 18(2) (2016)
4. Liu, H., Lang, B.: Machine learning and deep learning methods for intrusion detection systems: a survey. Appl. Sci. 9, 4396 (2019). https://doi.org/10.3390/app9204396
5. Nooribakhsh, M., Mollamotalebi, M.: A review on statistical approaches for anomaly detection in DDoS attacks. Int. J. Secur. Appl. 12(6), 13–26 (2018)
6. Turner, C., Jeremiah, R., Richards, D., Joseph, A.: A rule status monitoring algorithm for rule-based intrusion detection and prevention systems. Procedia Comput. Sci. 95, 361–368 (2016)
7. Yang, Y., Zheng, K., Wu, C., Yang, Y.: Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors 19, 2528 (2019). https://doi.org/10.3390/s19112528
8. Ossowicka, A.D., Pietrołaj, M., Rumiński, J.: A survey of neural networks usage for intrusion detection systems. J. Ambient Intell. Human. Comput. https://doi.org/10.1007/s12652-020-02014-x
9. Kim, J., Kim, J., Thu, H.L.T., Kim, H.: Long short term memory recurrent neural network classifier for intrusion detection. IEEE. 978-1-4673-8685-2/16 (2016)
10. Radford, B.J., Apolonio, L.M., Trias, A.J., Simpson, J.A.: Network traffic anomaly detection using recurrent neural networks. arXiv:1803.10769v1 (2018)
11. Berman, D.S., Buczak, A.L., Chavis, J.S., Corbett, C.L.: A survey of deep learning methods for cyber security. Information 10, 122 (2019). https://doi.org/10.3390/info10040122
12. Meira, J., Andrade, R., Praça, I., Carneiro, J., Marreiros, G.: Comparative results with unsupervised techniques in cyber attack novelty detection. In: ISAmI 2018, AISC, vol. 806, pp. 103–112 (2019). https://doi.org/10.1007/978-3-030-01746-0_12
13. Shone, N., Ngoc, T.N., Phai, V.D., Shi, Q.: A deep learning approach to network intrusion detection. IEEE Trans. Emerg. Top. Comput. Intell. 2(1) (2018)
14. Le, T.-T.-H., Kim, Y., Kim, H.: Network intrusion detection based on novel feature selection model and various recurrent neural networks. Appl. Sci. 9, 1392 (2019). https://doi.org/10.3390/app9071392
15. Jiang, F., Fu, Y., Gupta, B.B., Liang, Y., Rho, S., Lou, F., Meng, F., Tian, Z.: Deep learning based multi-channel intelligent attack detection for data security. IEEE (2018)
16. Vinayakumar, R., Alazab, M., Soman, K.P., Poornachandran, P., Al-Nemrat, A., Venkatraman, S.: Deep learning approach for intelligent intrusion detection system. IEEE Access 7 (2019)
17. Ding, Y., Zhai, Y.: Intrusion detection system for NSL-KDD dataset using convolutional neural networks. In: CSAI'18, Association for Computing Machinery, Shenzhen, China (2018). https://doi.org/10.1145/3297156.3297230
18. Tariq, M.I., Memon, N.A., Ahmed, S., Tayyaba, S., Mushtaq, M.T., Mian, N.A., Imran, M., Ashraf, M.W.: A review of deep learning security and privacy defensive techniques. Hindawi Mob. Inf. Syst. 2020, Article ID 6535834. https://doi.org/10.1155/2020/6535834
19. Qureshi, A.-U.-H., Larijani, H., Mtetwa, N., Javed, A., Ahmad, J.: RNN-ABC: a new swarm optimization based technique for anomaly detection. Computers 8, 59 (2019). https://doi.org/10.3390/computers8030059
20. Li, Z., Batta, P., Trajkovic, L.: Comparison of machine learning algorithms for detection of network intrusions. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics. https://doi.org/10.1109/SMC.2018.00719
21. Khan, M.A., Rezaul Karim, M., Kim, Y.: A scalable and hybrid intrusion detection system based on the convolutional-LSTM network. Symmetry 11, 583 (2019). https://doi.org/10.3390/sym11040583
Rederiving the Upper Bound for Halving Edges Using Cardano’s Formula Napendra Solanki, Pintu Chauhan, and Manjish Pal
Abstract In this paper, we rederive an old upper bound on the number of halving edges in the halving graph of an arbitrary set of $n$ points in general position in two dimensions. We provide a different analysis of an identity discovered by Andrzejak et al. to rederive this upper bound of $O(n^{4/3})$. In the original paper of Andrzejak et al., the proof is based on a naive analysis, whereas in this paper we obtain the same upper bound by tightening the analysis, thereby opening a new door to deriving such upper bounds from the identity. Our analysis is based on Cardano's formula for finding the roots of a cubic equation. We believe that our technique has the potential to yield improved bounds for the number of halving edges.
Keywords k-sets · Halving edges · Graphs
N. Solanki, National Institute of Technology, Shillong, Meghalaya 793003, India
P. Chauhan, Parul Institute of Engineering and Technology, Parul University, Vadodara, Gujarat 391760, India
M. Pal (B), Department of Computer Science & Engineering, IIT Kharagpur, Kharagpur, West Bengal 721302, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_21

1 Introduction
Combinatorial geometry is a field of discrete mathematics which deals with counting certain discrete structures in geometric spaces. Several questions in this field are very simple to state, yet finding the true answer is notoriously hard. One of the most important of these is the celebrated k-set problem. Given a set of $n$ points in $\mathbb{R}^d$, a k-set is a subset of $k$ of the given points which can be separated from the rest by a $(d-1)$-dimensional hyperplane. The question is to find $f_d^k(n)$, the maximum number of such sets possible for an arbitrary set of $n$ points in general position. This question remains elusive even in two dimensions, where it has been open for over forty years. Despite significant efforts, there is still a big gap between the upper and lower bounds on $f_2^k(n)$. In this paper, we focus our study on $f_2^k(n)$. The special case of $f_2^{n/2}(n)$ is called the halving number problem.

From the perspective of finding upper bounds, it is very easy to see that $f_2^k(n)$ is at most $O(n^k)$. One can also observe that corresponding to every valid partition of the point set into $k$ and $n-k$ points there exists a unique line segment joining two points of the set whose line divides the remaining points into $k-1$ and $n-k-1$ points. This line segment is called a k-edge, and the geometric graph constructed in this way is called the k-edge graph. This observation leads us to conclude that $f_2^k(n)$ is at most the number of edges in this graph, which can be at most $O(n^k)$. For $k = n/2$, this graph is called the halving-edge graph. It is also well known that the techniques to prove bounds for $f_2^{n/2}(n)$ can be translated to the general case of $f_2^k(n)$.

The first non-trivial bounds for the halving case were obtained by Erdős et al. [1], who proved an upper bound of $O(n^{3/2})$ and a lower bound of $\Omega(n \log n)$ in the 1970s. The upper bound was later improved to $O(n^{3/2}/\log^* n)$ by Pach et al. [2] in 1992, and then to $O(n^{4/3})$ by Dey [3]. This bound was rederived by Andrzejak et al. [4] in a paper that also provided an identity connecting the number of halving edges and the number of crossings in the halving-edge graph of an arbitrary set of $n$ points in the plane. The lower bound was improved by Tóth to $n e^{\Omega(\sqrt{\log n})}$ [5]. This bound has been further refined recently by Nivasch to $\Omega\left(n e^{\sqrt{\ln 4}\,\sqrt{\ln n}} / \sqrt{\ln n}\right)$ [6]. From the lower-bound perspective, another line of approach considers $\gamma$-dense sets. These are sets in which the ratio of the maximum interpoint distance to the minimum interpoint distance is at most $\gamma\sqrt{n}$. For dense sets, the upper bound can be improved to $O(n^{7/6})$ [7].

For the d-dimensional case, the known upper and lower bounds use complex techniques from combinatorics and algebraic geometry. It is known that $f_d^{n/2}(n) = O(n^{d - c_d})$ [8], where $c_d$ is a constant that goes to zero as $d$ increases. For lower bounds, it is known that given a construction of $\Omega(n\,g(n))$ halving edges in 2 dimensions, one can extend this construction to $d$ dimensions, achieving a lower bound of $\Omega(n^{d-1} g(n))$ [9]. There are also results obtained by looking into special properties of the graph in three and four dimensions [10–13].
2 Preliminaries
In this paper, we focus on the properties of the halving-edge graph and the identity proven by Andrzejak et al. The halving graph is defined by joining every pair of points whose connecting line partitions the remaining points into two equal parts, each consisting of $(n-2)/2$ points, where $n$ is even. This graph has several interesting properties, most notably the one given by the Lovász local lemma.
2.1 Lovász Local Lemma and Antipodality
This result was proven by Lovász in one of the early papers on this problem, and most upper-bound results on the problem use it. The lemma says the following: take any halving edge and rotate the line through it about one of its endpoints, in either direction (clockwise or anticlockwise); the first point that the rotating line meets also forms a halving edge with the point about which the rotation is performed. Although the lemma reveals only local properties of the point set, it helps in establishing global properties of the halving-edge graph. Dey used this technique in his proof of the $O(n^{4/3})$ bound.
2.2 Convex Chains and Dey's Original Proof
In his original paper, Dey uses the notion of convex chains to derive the bound of $O(n^{4/3})$. The Lovász lemma is used to define an equivalence relation on the halving edges whose classes form convex chains. Dey proves that the number of convex chains constructed in this way is at most $O(n)$ and that two convex chains cross each other at most twice, which gives at most $O(n^2)$ crossings. Applying the crossing number inequality then implies that the number of edges in the halving-edge graph is at most $O(n^{4/3})$.
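The final step of this argument can be spelled out as follows. This is a sketch using the standard form of the crossing number inequality, $\mathrm{cr}(G) \ge m^3/(64n^2)$ for a graph with $n$ vertices and $m \ge 4n$ edges; the constant $C$ bounding the number of crossings is an assumed placeholder, not a value taken from Dey's paper.

```latex
\frac{m^{3}}{64\,n^{2}} \;\le\; \operatorname{cr}(G) \;\le\; C\,n^{2}
\quad\Longrightarrow\quad
m^{3} \;\le\; 64\,C\,n^{4}
\quad\Longrightarrow\quad
m \;\le\; (64\,C)^{1/3}\,n^{4/3} \;=\; O\!\left(n^{4/3}\right).
```

When $m < 4n$ the bound $m = O(n^{4/3})$ holds trivially, so the restriction on $m$ in the inequality does not affect the conclusion.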
2.3 Main Identity
In this section, we describe the main identity proven by Andrzejak et al., which gives an exact relation between the number of crossings in the halving-edge graph $G = (V, E)$ and the degrees of its nodes. The identity can be proven by induction, performing a continuous motion on the set of points:

$$\mathrm{cr}(G) + \sum_{v \in V} \binom{(d_v + 1)/2}{2} = \binom{n/2}{2}$$

Here $\mathrm{cr}(G)$ is the number of crossings in $G$ and $d_v$ is the degree of node $v$.
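The identity can be checked directly on small point sets. The sketch below is an illustrative brute-force check and is not part of the original paper: it enumerates halving edges by counting the points on each side of every candidate line, counts pairs of properly crossing halving edges, and evaluates both sides of $\mathrm{cr}(G) + \sum_v \binom{(d_v+1)/2}{2} = \binom{n/2}{2}$. The function names and the two test configurations are assumptions chosen for illustration; the points are assumed to be in general position with $n$ even.

```python
from itertools import combinations
from math import comb


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if c lies to the left of line a->b)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def halving_edges(pts):
    """All index pairs (i, j) whose line leaves (n - 2) / 2 points on each side."""
    n = len(pts)
    edges = []
    for i, j in combinations(range(n), 2):
        left = sum(1 for k in range(n)
                   if k not in (i, j) and orient(pts[i], pts[j], pts[k]) > 0)
        if left == (n - 2) // 2:  # general position: the rest lie on the right
            edges.append((i, j))
    return edges


def crosses(pts, e, f):
    """True if segments e and f share no endpoint and intersect in their interiors."""
    if set(e) & set(f):
        return False
    a, b, c, d = pts[e[0]], pts[e[1]], pts[f[0]], pts[f[1]]
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def check_identity(pts):
    """Return (left-hand side, right-hand side) of the identity for the point set."""
    n = len(pts)
    edges = halving_edges(pts)
    cr = sum(crosses(pts, e, f) for e, f in combinations(edges, 2))
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    lhs = cr + sum(comb((d + 1) // 2, 2) for d in deg)
    return lhs, comb(n // 2, 2)


hexagon = [(0, 0), (2, -1), (4, 0), (4, 2), (2, 3), (0, 2)]  # convex position
triangle_plus_inner = [(0, 0), (6, 0), (0, 6), (1, 2)]       # one interior point

print(check_identity(hexagon))              # (3, 3)
print(check_identity(triangle_plus_inner))  # (1, 1)
```

In the convex hexagon every node has degree 1, so the whole right-hand side comes from the three pairwise crossings of the diagonals; in the second configuration the halving edges form a star with no crossings, and the single unit comes from the degree-3 centre.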
3 Obtaining Dey's Bound Using Cardano's Formula
In this section, we state some results regarding the halving-edge graph, which we use to obtain a different analysis of the previous identity. Their proofs are omitted as they are fairly easy to see.
Lemma 1 Let $n_i$ be the number of nodes with degree $i$ in a halving-edge graph $G$ and $d_v$ be the degree of a node $v$; then

$$\sum_{v} d_v^2 = \sum_{i} i^2 n_i .$$

Lemma 2 Let $n_i$ be the number of nodes with degree $i$ in a halving-edge graph $G$; then $i$ is odd whenever $n_i > 0$, and $n_i \le n_j$ for all $i \ge j$.

Lemma 3 Given an undirected graph $G$ with $n$ nodes and $m$ edges, if $\hat{d}$ is the maximum degree of $G$, then $\hat{d} \ge 2m/n$.
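For completeness, Lemma 1 and Lemma 3 can each be verified in one line. The following is a sketch of the standard arguments (grouping nodes by degree, and comparing the maximum degree with the average degree through the handshake identity); it is not reproduced from the original paper.

```latex
\sum_{v} d_v^{2} \;=\; \sum_{i}\;\sum_{v\,:\,d_v = i} i^{2} \;=\; \sum_{i} i^{2}\, n_i ,
\qquad\qquad
n \cdot \max_{v} d_v \;\ge\; \sum_{v} d_v \;=\; 2m
\;\;\Longrightarrow\;\;
\max_{v} d_v \;\ge\; \frac{2m}{n} .
```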
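3.1 Cardano's Formula
Cardano's formula is a way to find the roots of a cubic polynomial. It was first published by Cardano in 1545. The formula says that, given a cubic equation of the form $ax^3 + bx^2 + cx + d = 0$, a solution is given by

$$
\begin{aligned}
x ={}& \sqrt[3]{\left(\frac{-b^3}{27a^3} + \frac{bc}{6a^2} - \frac{d}{2a}\right) + \sqrt{\left(\frac{-b^3}{27a^3} + \frac{bc}{6a^2} - \frac{d}{2a}\right)^{2} + \left(\frac{c}{3a} - \frac{b^2}{9a^2}\right)^{3}}} \\
&+ \sqrt[3]{\left(\frac{-b^3}{27a^3} + \frac{bc}{6a^2} - \frac{d}{2a}\right) - \sqrt{\left(\frac{-b^3}{27a^3} + \frac{bc}{6a^2} - \frac{d}{2a}\right)^{2} + \left(\frac{c}{3a} - \frac{b^2}{9a^2}\right)^{3}}} \;-\; \frac{b}{3a} .
\end{aligned}
$$

This formula can be simplified as

$$x = \left\{ q + \left[ q^2 + (r - p^2)^3 \right]^{1/2} \right\}^{1/3} + \left\{ q - \left[ q^2 + (r - p^2)^3 \right]^{1/2} \right\}^{1/3} + p$$

where

$$p = \frac{-b}{3a}, \qquad q = p^3 + \frac{bc - 3ad}{6a^2}, \qquad r = \frac{c}{3a} .$$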
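The simplified form translates almost directly into code. The sketch below is illustrative and not part of the original paper; the function names are hypothetical. It uses complex arithmetic from Python's `cmath` so that a negative quantity under the square root is handled, and it obtains the second cube root from the first through the relation $uv = p^2 - r$ between the two cube roots, rather than taking two independent principal cube roots, so that the pair of branches is consistent.

```python
import cmath


def cardano_root(a, b, c, d):
    """One (possibly complex) root of a*x^3 + b*x^2 + c*x + d = 0, with a != 0."""
    p = -b / (3 * a)
    q = p**3 + (b * c - 3 * a * d) / (6 * a**2)
    r = c / (3 * a)
    s = cmath.sqrt(q * q + (r - p * p) ** 3)
    # Use the larger of q + s and q - s for the first cube root (numerically safer).
    u_cubed = q + s
    if abs(u_cubed) < abs(q - s):
        u_cubed = q - s
    u = u_cubed ** (1 / 3)  # principal complex cube root
    # The two cube roots u and v must satisfy u * v = p^2 - r.
    v = (p * p - r) / u if u != 0 else 0
    return u + v + p


def cubic(a, b, c, d, x):
    """Evaluate the cubic polynomial at x."""
    return ((a * x + b) * x + c) * x + d


root = cardano_root(1, -6, 11, -6)  # x^3 - 6x^2 + 11x - 6 has roots 1, 2, 3
print(root, abs(cubic(1, -6, 11, -6, root)))
```

The residual printed at the end should be close to zero; for this example the formula lands on the root near $x = 3$, with a negligible imaginary part caused by floating-point rounding.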
Theorem 1 The number of halving edges for an arbitrary set of $n$ points is at most $O(n^{4/3})$.

Proof We prove this using a different analysis of the identity proven by Andrzejak et al.:

$$\mathrm{cr}(G) + \sum_{v \in V} \binom{(d_v + 1)/2}{2} = \binom{n/2}{2} .$$

In the paper by Andrzejak et al., the second term on the left, which is a non-negative quantity, is bounded below by 0. This implies $\mathrm{cr}(G) \le n^2/8 - n/4$, from which, together with the crossing number inequality, one concludes that $m = O(n^{4/3})$, where $m$ is the number of edges, matching the bound obtained by Dey. In this proof, we do not ignore the second term, and we write the identity as an inequality in the variable $m$. Since $\binom{(d_v+1)/2}{2} = (d_v^2 - 1)/8$ and $\binom{n/2}{2} = n^2/8 - n/4$, the identity reads

$$\mathrm{cr}(G) + \sum_{v \in V} \frac{d_v^2 - 1}{8} = \frac{n^2}{8} - \frac{n}{4} ,$$

so that, using $|V| = n$ and Lemma 1,

$$\mathrm{cr}(G) = \frac{n^2}{8} - \frac{n}{8} - \frac{1}{8} \sum_{v \in V} d_v^2 = \frac{n^2 - n}{8} - \frac{1}{8} \sum_{i} i^2 n_i .$$

Let $M = \sum_i i^2 n_i$, and let $\alpha$ be the maximum degree of the halving-edge graph. Since $n_\alpha \ge 1$, Lemma 3 gives $M \ge \alpha^2 n_\alpha \ge \alpha^2 \ge 4m^2/n^2$. Applying this to the identity, we get

$$\mathrm{cr}(G) = \frac{n^2 - n}{8} - \frac{M}{8} \;\le\; \frac{n^2 - n}{8} - \frac{\alpha^2}{8} \;\le\; \frac{n^2 - n}{8} - \frac{m^2}{2n^2} .$$

Thus, using the fact that $\mathrm{cr}(G) \ge \dfrac{m^3}{64n^2}$, we conclude that

$$\frac{m^3}{64n^2} \le \frac{n^2 - n}{8} - \frac{m^2}{2n^2} \quad\Longrightarrow\quad \frac{m^3}{64n^2} + \frac{m^2}{2n^2} - \frac{n^2 - n}{8} \le 0 .$$

Thus, we get a cubic inequality $P(m) \le 0$ with $m$ as the variable and the cubic polynomial

$$P(m) = \frac{m^3}{64n^2} + \frac{m^2}{2n^2} - \frac{n^2 - n}{8} .$$

Since $P(0) < 0$ and $P$ is increasing for $m > 0$, the inequality forces $m$ to be at most the unique positive root of $P$. If we find this root using Cardano's formula, we get an upper bound on $m$, the number of halving edges in the halving graph. Applying Cardano's formula with $a = 1/(64n^2)$, $b = 1/(2n^2)$, $c = 0$ and $d = -(n^2 - n)/8$, we get

$$p = \frac{-b}{3a} = -\frac{32}{3}, \qquad q = p^3 - \frac{d}{2a} = -\frac{32^3}{27} + 4n^4 - 4n^3, \qquad r = 0 .$$

Thus, we get the root of the polynomial $P(m)$ as

$$m = \left\{ q + \left[ q^2 - \frac{32^6}{3^6} \right]^{1/2} \right\}^{1/3} + \left\{ q - \left[ q^2 - \frac{32^6}{3^6} \right]^{1/2} \right\}^{1/3} - \frac{32}{3} .$$

For sufficiently large $n$, the quantity under the square root is positive, so this expression is the unique real root of $P$. Taking the asymptotically dominating terms, $q = 4n^4(1 + o(1))$, so the first cube root is $(8n^4)^{1/3}(1 + o(1)) = 2n^{4/3}(1 + o(1))$, while the second cube root tends to 0. Therefore $m = O(n^{4/3})$, reobtaining the upper bound of Dey.
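As a numerical sanity check on the asymptotics, the sketch below finds the positive root of a cubic of the form $m^3/(64n^2) + \beta m^2/n^2 - (n^2 - n)/8$ by bisection and compares it with $n^{4/3}$. It is illustrative only: the coefficient $\beta$ of the quadratic term is left as a parameter (an assumption made here, since only its non-negativity matters for the growth rate), and the function name is hypothetical.

```python
def positive_root(n: int, beta: float = 0.5, iters: int = 200) -> float:
    """Bisection for the unique positive root of
    P(m) = m^3 / (64 n^2) + beta * m^2 / n^2 - (n^2 - n) / 8,
    assuming beta >= 0 and n >= 2."""

    def P(m: float) -> float:
        return m**3 / (64 * n**2) + beta * m**2 / n**2 - (n**2 - n) / 8

    # P(0) < 0 and P(2 n^(4/3)) > 0, so the root is bracketed by [lo, hi].
    lo, hi = 0.0, 2.0 * n ** (4 / 3)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if P(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return lo


for n in (10**2, 10**4, 10**6):
    ratio = positive_root(n) / n ** (4 / 3)
    print(n, round(ratio, 4))
```

The ratio stays below 2 and approaches it as $n$ grows, which is consistent with a bound of the form $m \le 2n^{4/3}$ obtained from the leading terms of the cubic.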
4 Discussion on the Bound of M
In this section, we discuss the implications of the bound on $M$ in the analysis of the above identity. In our analysis, we have used a very crude lower bound on the value of $M$; in the following, we discuss how one can achieve better upper bounds for the number of halving edges using better lower bounds for $M$.

(a) From Lemma 2, we observe that $i$ can only be an odd number. Thus, a better lower bound for $M$ could be obtained from a lower bound on $\sum_{i\ \mathrm{odd}} i^2$, although we cannot use the direct formula for $\sum_{i\ \mathrm{odd}} i^2$ because some of the terms in this series might be missing. A better bound on the number of missing terms, i.e., the number of odd $i$ with $n_i = 0$, can be used to get a better lower bound on $M$.

(b) A better bound for $M$ can be obtained if we can get a reasonable estimate of $n_i$. In the current bound, we have used the very naive lower bound of 1 for $n_i$, whereas an estimate of $n_i$ can give us a better bound.
5 Bootstrapping the Identity Analogous to Crossing Number Lemma
In this section, we discuss the possibility of bootstrapping the crossing number lemma in the context of this problem. Recall that in the derivation of the lower bound in the crossing number lemma, one first removes a certain number of edges from the drawing to make it planar, and then applies a probabilistic argument, choosing the sampling probability that maximizes the resulting bound on the number of crossings. Analogously, we can use the identity of Andrzejak et al. to bootstrap the crossing number lemma for this specific graph.
6 Conclusion and Future Work
In this paper, we have provided a new analysis for bounding the number of halving edges in two dimensions. Although this problem is very old, the exact bound on the number of halving edges is still not well understood. We present a proof that rederives an old upper bound of $O(n^{4/3})$. Our proof is based on a different analysis of the identity proven by Andrzejak et al. Our analysis is interesting because it does not neglect the degree term in the identity, and we use Cardano's formula to obtain the aforementioned upper bound. This technique also allows us to apply more rigorous bounds on the neglected term, which may lead to more complex polynomial equations whose roots might be used. As future work, we would like to derive more stringent bounds on that term, and we are also trying to tighten the general crossing number lemma for the halving graph.
References
1. Andrzejak, A., Aronov, B., Har-Peled, S., Seidel, R., Welzl, E.: Results on k-sets and j-facets via continuous motion. In: ACM Symposium on Computational Geometry, pp. 192–199 (1998)
2. Bárány, I., Füredi, Z., Lovász, L.: On the number of halving planes. Combinatorica 2, 175–183 (1990)
3. Dey, T.K.: Improved bounds for planar k-sets and related problems. Discrete Comput. Geom. 19, 373–382 (1998)
4. Edelsbrunner, H., Valtr, P., Welzl, E.: Cutting dense point sets in half. Discrete Comput. Geom. 27, 243–255 (1997)
5. Erdős, P., Lovász, L., Simmons, A., Straus, E.G.: Dissection graphs of planar point sets. Surv. Comb. Theory, 139–149 (1973)
6. Kovács, I., Tóth, G.: Dense point sets with many halving lines. arxiv.org/abs/1704.00229v1 (2017)
7. Lovász, L.: On the number of halving lines. Ann. Univ. Sci. Budapest Eötvös Sect. Math. 14, 107–108 (1971)
8. Matoušek, J.: Lectures on discrete geometry. Springer-Verlag, New York (2002)
9. Matoušek, J., Sharir, M., Smorodinsky, S., Wagner, U.: k-sets in four dimensions. Discrete Comput. Geom. 2(35), 177–191 (2006)
10. Nivasch, G.: An improved, simple construction of many halving edges. Contemp. Math. 453, 299–305 (2008)
11. Pach, J., Steiger, W., Szemerédi, E.: An upper bound on the number of planar k-sets. Discrete Comput. Geom. 7, 109–123 (1992)
12. Sharir, M.: An improved bound for k-sets in four dimensions. Comb. Probab. Comput. 20, 119–129 (2011)
13. Sharir, M., Smorodinsky, S., Tardos, G.: An improved bound for k-sets in three dimensions. Discrete Comput. Geom. 26, 195–204 (2001)
14. Tóth, G.: Point sets with many k-sets. Discrete Comput. Geom. 26, 187–194 (2001)
Online Teaching During COVID-19: Empirical Evidence During Indian Lockdown V. M. Tripathi and Ambica Prakash Mani
Abstract The availability of digital platforms enabled the much needed continuity in teaching–learning during challenging pandemic conditions. This research paper explores the factors that determine the outcome of the online teaching–learning process and the shortcomings faced by the key stakeholders. The impact of the identified factors affecting online education on students enrolled in different courses is studied. The study provides valuable inputs to the teaching fraternity in understanding students' opinions on various routine issues faced in the online methodology. Student-centric factors such as overall study engagement, attentiveness and attitude of the students towards online classes are also examined, which would contribute to enhancing the quality of teaching–learning. Most surveyed students expressed satisfaction with the present mode of studies, given the persisting lockdown conditions imposed due to COVID-19. The students showed a preference for blended learning once normal classroom teaching resumes. The findings of the study are based on a sample of 646 students from science, management, commerce, engineering, agriculture, law, humanities and fine arts. The interface factors, including hardware, software and Internet, are found to be extremely important, along with the students and the teacher, in this mode of teaching.
Keywords Online teaching · Digital infrastructure · COVID-19 · Learning platforms
V. M. Tripathi (B), Graphic Era Hill University, Dehradun, Uttarakhand, India
A. P. Mani, Graphic Era (Deemed to be) University, Dehradun, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_22

1 Introduction
The world is facing one of the most severe crises of all time in the form of COVID-19, and like all other sectors, the education sector has been badly hit, with schools and colleges closed since mid-March 2020. The intensity of the spread has forced harsh measures to be imposed in more than 22 countries, affecting over 290 million students, according to a UNESCO report. The report estimates that about 32 crore school and college students are affected in India. The big challenge that emerged was to maintain the much needed continuity of teaching and learning. The transition towards online education was rather swift, as not much time was available to contemplate the potential possibilities and threats of this mode of teaching–learning. Continuity in education, which was the biggest issue, was handled through online teaching, and the students did not face a loss of studies.

Online learning is a network-enabled transfer of skills and knowledge using computers and other hardware devices, facilitated through an Internet connection. According to Shaik et al. [1], instructional service quality and management and administrative services are two important dimensions of online distance learning programmes. Instructional quality refers to the classroom experience, where the role and knowledge of the instructor matter. Management and administrative services refer to the services of the help-desk, advisors, administrative staff and university management. Unlike the chalk-and-talk method, where the physical presence of teachers and students is mandatory, this method of teaching is possible with remotely located students who are present virtually. It has wide applicability in the present times, where social distancing is the accepted norm. The transition to online classes was unplanned for all stakeholders in the system, including teachers, in a country like India where the online learning infrastructure is at an infancy stage. But the transition was very important, as it ensured that students do not suffer any loss of studies while their progress is tracked through an online evaluation system.
The challenge was heavy for teachers, most of whom were new to online teaching platforms such as Zoom, Webex, Google Classroom and a number of related platforms. There was strong apprehension that the learning outcomes might not be achieved in such a scenario, and that students were merely being engaged over digital platforms rather than being taught. The transition and its challenges also carried an underlying opportunity: this transformation paved the way for technology-based learning, the need for which had been felt in higher education for a number of years. It is going to be a new and important form of teaching pedagogy and harbours the potential to revolutionize the complete landscape of education in the post-pandemic years. Learning management systems also started to be used and are set to become the new norm in universities and schools. This led to improvement in processes and easy dissemination of learning material, and brought much needed transparency into the education process. The pandemic has proven to be a blessing in disguise for the education system, which has been able to expand and test platforms that were seldom used before. In changing times, only those industries and individuals that are adaptable and able to adjust to the paradigm shift will have a higher rate of survival. It is realized that the use of technology can improve the quality of education, but at the same time the need for improving the infrastructure is also felt. In pandemic times, when technology is catalysing teaching and research in an unprecedented manner, the rise of alternatives such as teleconferencing is going to lower the cost burden of travel and thus save valuable time in the future. Private schools that had already adopted smart classes quickly seized the opportunity and tied up with online learning platforms to conduct classes and mitigate the disruption in teaching, at least in urban areas.
2 Literature Review
Yang et al. [2] identified credibility, security, attentiveness, reliability, accessibility and ease of use as important dimensions for measuring e-service quality. Lee [3] studied satisfaction with the online learning experience of Korean and American students and concluded that the quality of online support services correlated with online learning acceptance and satisfaction. Martinez et al. [4] identified service quality as the core of e-learning. The other important factors in online education are the administrative services that facilitate teaching–learning, support services and the user interface. Ozkan and Koselar [5] proposed an assessment model of e-learning in the UK comprising six factors, including supportive services, attitude of the instructor, quality of content, service quality and system quality. Goh et al. [6] explored the e-learning environment and considered the effect of course design, interaction with the instructor and interaction with peer students on learning outcomes and satisfaction. The study concluded that interaction with peer students was the most important factor in determining satisfaction. Pham et al. [7] examined the relationship among e-learning service quality attributes, e-learning student satisfaction and e-learning student loyalty in Vietnam. The results indicate that the quality of e-learning services is based on e-learning system quality, instructor and course material quality, and administrative and support service quality. The literature covering the online education scenario in India is very limited. However, the future of online education seems to be very bright according to a KPMG report, which projects the online education market to reach 1964 million USD by 2021 and online higher education to grow by 41%. This platform is set to become one of the most preferred methods of learning.
3 Research Methodology This research initiative is undertaken to assess the impact of online teaching facilitated through ubiquitous Internet platform on students enrolled for higher education. As the transition towards online education was quick and there was no time for preparation for a smooth transition, it is imperative to understand the impact of this mode of teaching on students and learning. The study is conducted among students enrolled for undergraduate and postgraduate courses at Graphic Era Deemed to be University and Graphic Era Hill University.
The objectives of the research are as follows:
• To assess the impact of online teaching on students
• To identify the benefits of online teaching
• To understand the impact of online teaching on marks scored
• To identify the factors that contribute to enhancing teaching–learning.
Based on various reports and news feeds, the authors' own experience as university professors and the available literature, a detailed questionnaire was prepared and circulated among the students through a digital platform. Convenience sampling was adopted. All the surveyed students are enrolled in regular, full-time courses and are currently studying through the online mode. SPSS was used for data analysis with simple descriptive techniques.
4 Data Analysis
Six hundred and forty-six fully completed questionnaires were considered for the study. Data collection was completed by mid-July 2020. Most respondents (69.2%) are between 18 and 20 years old, and 30.8% are in the 21–25 age group. The gender-wise break-up shows that 59.6% of the respondents are female and 40.4% are male. To ensure representation across streams, the questionnaire was circulated among students enrolled in different undergraduate and postgraduate programmes. The sample comprises 27.6% science, 12.2% management, 41.6% commerce, 11.8% engineering, 2.5% agriculture, 2.3% law, and 2% humanities and fine arts students (Table 1). English is the medium of instruction for 67.5% of the teachers, Hindi for 1.1% and a combination of both for 31.4% (Table 2).

Table 1 Stream of the students surveyed
Stream                    Frequency  Per cent  Valid per cent  Cumulative per cent
Management                       79      12.2            12.2                 12.2
Commerce                        269      41.6            41.6                 53.9
Engineering                      76      11.8            11.8                 65.6
Agriculture                      16       2.5             2.5                 68.1
Humanities and fine arts         13       2.0             2.0                 70.1
Law                              15       2.3             2.3                 72.4
Science                         178      27.6            27.6                100.0
Total                           646     100.0           100.0
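The per cent and cumulative per cent columns of Table 1 follow directly from the frequencies. The sketch below is only an illustration of that arithmetic, not the SPSS procedure used in the study; the function name `frequency_table` is hypothetical.

```python
from itertools import accumulate

# Stream frequencies as reported in Table 1.
counts = {
    "Management": 79,
    "Commerce": 269,
    "Engineering": 76,
    "Agriculture": 16,
    "Humanities and fine arts": 13,
    "Law": 15,
    "Science": 178,
}


def frequency_table(counts):
    """Return (label, frequency, per cent, cumulative per cent) rows."""
    total = sum(counts.values())
    running = accumulate(counts.values())  # running sum of the frequencies
    return [
        (label, n, 100.0 * n / total, 100.0 * cum / total)
        for (label, n), cum in zip(counts.items(), running)
    ]


rows = frequency_table(counts)
for label, n, pct, cum in rows:
    print(f"{label:<26}{n:>5}{pct:>8.1f}{cum:>8.1f}")
```

Computing the cumulative column from running counts rather than from rounded percentages avoids accumulating rounding error.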
Online Teaching During COVID-19: Empirical Evidence …
Table 2 Medium of instruction by teachers
Medium                 Frequency  Per cent  Valid per cent  Cumulative per cent
Hindi                          7       1.1             1.1                  1.1
English                      436      67.5            67.5                 68.6
Combination of both          203      31.4            31.4                100.0
Total                        646     100.0           100.0
Table 3 Platforms used by students for online learning
Platform               Frequency  Per cent  Valid per cent  Cumulative per cent
Google classroom             569      88.1            88.1                 88.1
Zoom                          43       6.7             6.7                 94.7
Others                        34       5.3             5.3                100.0
Total                        646     100.0           100.0
Of the sampled students, 88.1% used Google Classroom as the platform for attending online lectures, 6.7% used Zoom and 5.3% used other platforms such as Webex (Table 3). The variables involved in online teaching include hardware, software, cybersecurity issues, information and communication technology (ICT), mindset to study and motivation towards online classes (Table 4). We studied how these factors vary with the stream pursued by the students. The highest mean, 3.63, is observed among agriculture students for ICT, and a mean of 3.62 is observed among humanities and fine arts students for software support. The highest standard deviation, 1.17, is observed for management students towards ICT, which shows that the responses of management students towards ICT are very heterogeneous.
The issues existing at the student’s end are assessed on three parameters: attitude towards online study, overall online engagement and attentiveness in online classes. Mean values were extracted, and the highest mean, 3.5000, is found for engineering students towards online study and attentiveness in online classes.
Quality of instruction in online pedagogy is assessed using parameters such as interaction with teachers, personal touch, problem solving, query handling, pace of teaching, flexibility, knowledge and attitude of the instructor and study material support. These factors are rated stream-wise. Among management students, the highest mean, 3.55, is for knowledge of the instructor. Among commerce students, the highest mean, 3.55, is for attitude of the instructor, followed by 3.52 for knowledge of the instructor. Among engineering students, the highest mean, 3.50, is for study material support. Among agriculture students, the mean for attitude of the instructor is 4.00. These values show the importance of these variables.
Table 4 Mean values across stream pursued and factors affecting online teaching
Streams pursuing          Statistic       Hardware used  Software support  Cybersecurity issues    ICT  Mindset to study  Motivation towards online classes
Management                Mean                     3.00              2.94                  2.92   3.06              3.04                             2.9873
                          N                          79                79                    79     79                79                                 79
                          Std. deviation          1.038             1.102                 1.107  1.170             1.115                            1.05604
Commerce                  Mean                     3.00              3.04                  2.94   3.09              3.05                             2.8885
                          N                         269               269                   269    269               269                                269
                          Std. deviation          1.051             1.099                 1.135  1.161             1.112                            1.18551
Engineering               Mean                     3.16              3.25                  2.91   3.12              3.32                             2.9474
                          N                          76                76                    76     76                76                                 76
                          Std. deviation          1.120             1.008                 1.009  1.045             1.098                            1.16499
Agriculture               Mean                     3.31              3.13                  3.13   3.63              3.25                             3.1250
                          N                          16                16                    16     16                16                                 16
                          Std. deviation          1.014             0.957                 1.088  0.719             0.856                            1.02470
Humanities and fine arts  Mean                     2.92              3.62                  3.15   3.31              3.08                             2.9231
                          N                          13                13                    13     13                13                                 13
                          Std. deviation          1.038             1.193                 1.144  0.947             1.115                            0.86232
Law                       Mean                     3.00              3.07                  3.13   3.27              2.87                             3.1333
                          N                          15                15                    15     15                15                                 15
                          Std. deviation          0.655             0.884                 1.060  1.033             1.060                            0.91548
Science                   Mean                     3.11              3.15                  2.99   3.12              3.10                             2.9831
                          N                         178               178                   178    178               178                                178
                          Std. deviation          1.066             1.110                 1.087  1.168             1.155                            1.15702
Total                     Mean                     3.05              3.10                  2.96   3.12              3.10                             2.9458
                          N                         646               646                   646    646               646                                646
                          Std. deviation          1.051             1.088                 1.098  1.134             1.115                            1.14194
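The Total row of Table 4 can be cross-checked against the stream rows, because the overall mean is the N-weighted average of the stream means. A minimal sketch for the software support column follows; the values are the stream-wise N and mean figures reported in Table 4, and the helper name `pooled_mean` is hypothetical.

```python
# (N, mean) per stream for "Software support", as reported in Table 4.
software_support = {
    "Management": (79, 2.94),
    "Commerce": (269, 3.04),
    "Engineering": (76, 3.25),
    "Agriculture": (16, 3.13),
    "Humanities and fine arts": (13, 3.62),
    "Law": (15, 3.07),
    "Science": (178, 3.15),
}


def pooled_mean(groups):
    """N-weighted average of the group means."""
    total_n = sum(n for n, _ in groups.values())
    weighted_sum = sum(n * mean for n, mean in groups.values())
    return weighted_sum / total_n


overall = pooled_mean(software_support)
print(f"{overall:.2f}")  # compare with the Total mean of 3.10 in Table 4
```

Because the stream means are themselves rounded to two decimals, the pooled value can differ from the reported total in the third decimal place.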
As the switch to the online mode was rapid and forced by the lockdown conditions prevailing during the pandemic, it is important to identify the potential problem areas. The hindering factors identified are Internet, audio, video, power, hardware, software and problems related to submissions. The highest mean value, 3.05, is for Internet, as reported by commerce students. The universities where this research was carried out conducted their examinations on time, so students across streams were asked to rate their satisfaction with the marks scored. Of the respondents, 45.8% reported that online education enabled them to secure more marks, 34.5% reported that it made no change to their marks and 19.7% reported that it lowered their marks (Table 5). Most of the surveyed students (91.9%) expressed satisfaction with the online mode adopted for conducting classes. Internet-related issues were the most common problem expressed by the students. As Table 6 shows, 46.9% of students across diverse streams wish to opt for blended teaching as the preferred mode of study in the future. Only 8% of the total respondents opted for online teaching, and 45% still prefer conventional classroom teaching.

Table 5 Marks or results
Response                                               Frequency  Per cent  Valid per cent  Cumulative per cent
Online education helped me secure more marks                 296      45.8            45.8                 45.8
Online education did not make any change in my marks         223      34.5            34.5                 80.3
Online education lowered my marks                            127      19.7            19.7                100.0
Total                                                        646     100.0           100.0

Table 6 Teaching mode you prefer post-COVID-19
Mode                 Frequency  Per cent  Valid per cent  Cumulative per cent
Classroom teaching         291      45.0            45.0                 45.0
Online teaching             52       8.0             8.0                 53.1
Blended teaching           303      46.9            46.9                100.0
Total                      646     100.0           100.0

KMO and Bartlett’s test
Kaiser–Meyer–Olkin measure of sampling adequacy         0.931
Bartlett’s test of sphericity    Approx. chi-square  7285.866
                                 Df                       171
                                 Sig.                   0.000

The KMO value is 0.931, which indicates that the data are adequate for factor analysis. The factor matrix below shows that three factors have emerged.

Factor matrix
Variable                                    Factor loading
Hardware used                                        0.615
Software support                                     0.653
Cybersecurity issues                                 0.592
Information and communication technology             0.634
Mindset to study                                     0.694
Motivation towards online classes                    0.652
Attitude towards online study                        0.637
Overall online engagement                            0.672
Attentiveness in online classes                      0.649
Quality of online classes                            0.670
Interaction with teachers                            0.622
Personal touch                                       0.578
Problem solving                                      0.687
Query handling                                       0.685
Pace of teaching                                     0.695
Flexibility to study                                 0.699
Knowledge of the instructor                          0.526
Attitude of the instructor                           0.510
Study material provided                              0.550
Rate the quality of device                           0.564
Rate the quality of hardware                         0.525
Rate the quality of Internet                         0.454
Audio quality                                        0.203
Software used                                        0.304
Extraction method: principal axis factoring. Three factors extracted; seven iterations required.
Three factors have been identified. The first is named “interface factors” and comprises hardware, software, cybersecurity, ICT issues, quality of device and quality of Internet. The second factor is “student centric” and covers mindset to study, motivation towards online classes, attitude towards online classes, overall online engagement, attentiveness and quality of online classes. The third set of factors is “instructor centric” and comprises interaction, personal touch, problem solving, query handling, pace of teaching, flexibility, knowledge, attitude and study material provided. Figure 1 shows the factors affecting online teaching. Online teaching–learning is facilitated through interface-related factors. In online teaching–learning, the conventional proximity between the teacher and the student is absent, and the quality of interaction depends on the quality of the interface. These interface-related factors, such as hardware, software and Internet connectivity, are extremely important for facilitating online teaching and learning.

Fig. 1 Dimensions to online teaching
5 Discussions
Online teaching emerged as the need of the hour under the prevailing conditions, when everything was under lockdown, because it made it possible to maintain the much-needed continuity in teaching–learning. However, it was realized that it is equally important and extremely useful under normal conditions as a supplement to the conventional mode of teaching. The inhibition associated with this pedagogy has been overcome under the prevailing conditions, and this form of teaching has been shown to hold immense potential, which is recognized by both teachers and students. What is needed is the willingness to accept it and use it in the right format to facilitate the teaching–learning process.
6 Conclusion
Digital learning has a number of advantages, the biggest being the absence of physical boundaries. Remote reach and easy accessibility are other benefits. This form of learning involves no physical presence, which is proving advantageous at a time when social distancing is the norm to be adhered to in order to curtail the spread of the virus. At present, when work from home (WFH) is the prevalent culture and online teaching–learning platforms are facilitating learning for students, online education is proving very beneficial, as it keeps students productively occupied and prevents the loss of academic sessions. The other benefits that emerged are that the pedagogy is self-paced, allows personalized content, is remotely accessible, and is convenient and safe. Blended learning is set to become the new format of teaching and learning. The blended form will combine the advantages of classroom teaching, with the physical presence of teacher and student, and the supplementary role of ICT, making recorded sessions and online submissions a usual part of course coverage.
7 Limitations and Scope for Future Studies
This study does not cover the opinions of students enrolled in online initiatives such as SWAYAM, Skill India or other courses imparted through MOOC platforms. The study is conducted among university students who are enrolled in regular programmes and are attending classes on online or digital platforms because of the COVID-19 pandemic. A similar study can be conducted among students pursuing online or distance learning courses, over a larger and more diversified sample.
References
1. Shaik, N., Lowe, S., Pinegar, K.: DL-sQUAL: a multiple-item scale for measuring service quality of online distance learning programs. Online J. Distance Learn. Adm. 9(2), 201–214 (2006)
2. Yang, Z., Jun, M., Peterson, R.T.: Measuring customer perceived online service quality: scale development and managerial implications. Int. J. Oper. Prod. Manage. 24(11), 1149–1174 (2004)
3. Lee, W.J.: Online support service quality, online learning acceptance, and student satisfaction. Internet High. Educ. 13, 227–283 (2010)
4. Martinez-Arguelles, J.M., Callejo, B.M., Farrero, M.C.J.: Dimensions of perceived service quality in higher education virtual learning environments. Univ. Knowl. Soc. J. 10(1), 268–285 (2013)
5. Ozkan, S., Koseler, R.: Multi-dimensional students’ evaluation of e-learning systems in the higher education context: an empirical investigation. Comput. Educ. (2009)
6. Goh, F.C., Leong, M.C., Kasmin, K., Hii, K.P., Tan, K.O.: Students’ experiences, learning outcomes and satisfaction in e-learning. J. E-learn. Knowl. Soc. 13(2), 117–128 (2017)
7. Pham, L., Limbu, Y.B., Bui, T.K., Nguyen, H.T., Pham, H.T.: Does e-learning service quality influence e-learning student satisfaction and loyalty? Evidence from Vietnam. Int. J. Educ. Technol. High. Educ. 16(7), 1–26 (2019)
8. https://assets.kpmg/content/dam/kpmg/in/pdf/2017/05/Online-Education-in-India-2021.pdf. Last accessed on 12 Oct 2020
9. https://en.unesco.org/covid19/educationresponse. Last accessed on 14 Oct 2020
An Ensemble-Based Method for Predicting Facebook Check-ins Shobhana Kashyap and Avtar Singh
Abstract The world of technology continues to grow, and the use of digital information is giving rise to massive datasets. Finding the most popular locations around the globe is very challenging when the places have to be searched for in chunks. Social media provides a huge dataset, and researchers can use it to predict, based on popularity, the places where individuals can establish their businesses. In this research work, an ensemble classification strategy has been applied to several classification methods, namely Naïve Bayes, support vector machine, and multilayer perceptron. Two test methods, k-fold cross-validation and a training/test split, are used to predict the results. The prediction results show that 10-fold cross-validation gives the highest accuracy compared with the training/test split. The ensemble method improves the overall efficiency of location prediction.
Keywords Machine learning · Supervised learning · Ensemble approach · Location prediction
1 Introduction
In today’s world, millions of users are online on the World Wide Web and electronic platforms. Further, the use of cell phones has led to increased activity on social platforms such as Facebook, LinkedIn, and Twitter, which allow users to share their surveys, ratings, videos, sound, photos, registrations, and so forth [1]. The geographical data of users located by cell phones bridge the gap between the physical and digital worlds. With the rapid growth in the number of mobile subscribers, region-based data sharing has become a prominent pattern in location-based social networks [1].

S. Kashyap · A. Singh (B)
Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, Punjab 144001, India
e-mail: [email protected]
S. Kashyap
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_23
The analysis and prediction of the most popular locations play a significant role in determining which locations are the most likely to increase sales for seasonal, temporary, and mobile businesses [2]. To improve the location-service experience, it is necessary to know the location of the user beforehand. Various researchers have worked in this direction to improve location-based service experiences, for example the GeoLife social networking service [3], iGeoRec location recommendations using geographical influence [4], a context model for location prediction based on user mobility behavior [5, 6], Hoodsquare, which allows users to explore activities and neighborhoods in cities around the world [7], and Facebook check-ins [1, 8, 9].
In this paper, we propose a machine learning model that predicts the most popular locations that are likely to be visited by users. Location-based check-ins have been collected from the Kaggle dataset [10]. In the first step, the dataset is modified using a data preprocessing method. Then, four machine learning methods, Naïve Bayes (NB), multilayer perceptron (MLP), support vector machine (SVM), and an ensemble of Naïve Bayes and support vector machine (NB-SVM), are compared with respect to their ability to predict the most popular locations. The proposed model can help predict the accuracy of a local area, which can assist users in deciding which locations are best suited to purposes such as businesses, malls, and historical places.
The rest of the paper is organized as follows: Sect. 2 discusses the related works and a comparative study of location prediction techniques. Section 3 presents the proposed system model. Experimental results and comparative analysis are elaborated in Sects. 4 and 5, respectively. Section 6 concludes the paper.
2 Related Work
Facebook check-in is an important feature that helps users identify the most popular locations for business, entertainment, travel, and many other purposes. Combining area-based services with social services improves customer understanding and the public activities of individuals [11]. An ideal physical location is one that is already popular for business purposes, as this encourages individuals to establish a flourishing business there [12]. Chanthaweethip et al. [11] analyzed the limitations of Facebook users’ expectations using various implicit and explicit field-related characteristics. They found that, for clients interested in opening a business, the current city strongly coincided with other field-related characteristics from client-driven and client–partner perspectives. Their work addressed the “current city” field on Facebook for broadly customer-oriented applications and developed a current city prediction model using an artificial neural network learning structure. Bigwood et al. [13] studied the prediction of location-sharing privacy preferences in order to reduce the burden on users. They demonstrated that users’ day-to-day preferences for sharing their locations do not always match their static default settings. Two approaches, a simple approach and a machine learning
approach, were introduced in that paper. Lin et al. [12] showed that the popularity of neighboring businesses is one of the major contributing characteristics in making accurate estimates. Sridhar Raj and Nandhini [14] analyzed different strategies for predicting successive human movements and their limitations. They built a new classifier, the Apriori-based Probability Tree Classifier (APTC), which predicts the sequence pattern of human movement indoors. The APTC classifier was integrated with the J48 machine learning algorithm, resulting in an ensemble model that anticipates future patterns of human movement.
In August 2010, Facebook introduced a location-based service that lets users see their friends’ check-ins in their news feeds and like or comment on them. Thottakkara et al. and Chang and Sun [2, 15] examined Facebook datasets to understand users’ check-ins, including information on past check-ins and similar check-in areas, together with the time spent there by their friends. They demonstrated that these parameters can be used to develop future models with which users will be able to examine different locations. They also examined how users responded to their friends’ check-ins and which factors led users to like or comment on them. This research explained how the examination of these factors can be used to improve the check-in experience. They also developed models that predict friendship from check-in counts and reported that co-check-ins significantly affect friendship. In contrast with the proposed work, Table 1 sheds light on the methodology, the datasets used for analysis, the proposed model, and the future scope.
3 Proposed System Model The proposed model is divided into four phases, namely data collection phase, data preprocessing and splitting phase, training phase, and prediction improvement phase. The implemented model is shown in Fig. 1.
3.1 Data Collection Phase
The Facebook check-ins dataset is available on the Kaggle website [10]. It consists of five features: event_Id, x-coordinate, y-coordinate, accuracy, and timestamp. The target value is location_Id. The features of the dataset, each with a one-line description, are given in Table 2.

Dataset Preprocessing and Splitting Phase
In any machine learning process, data preprocessing is the step in which the data are encoded or transformed into a form that the machine can parse directly. In other words, the features of the data can then be easily interpreted by the algorithm. Data preprocessing is the first phase and regenerates a new dataset by
Table 1 Different approaches to finding accessible locations using prediction models

Author’s name: Chang and Sun [15]
Methodology:
• Allowed users to collect point-of-interest (POI) from check-ins
• Analyzed how stories rankings improve, it also ensured that users see only the most relevant updates from their friends and also saw how businesses improve their installations with maximum value
• A model was created to predict friendship based on check-in counts, showing how co-check-ins has an effect on the friendship
Dataset used: Check-ins and POI data from San Francisco and California collected between August 2010 and January 2011
Technique used: Logistic regression
Future scope:
• They may analyze after effects of various urban communities over the US and over the world to see whether they can imitate these outcomes in districts where technological insightfulness, security mentalities, and density of both clients and POIs might be considered unique

Author’s name: Cheng et al. [16]
Methodology:
• Investigated 22 million check-ins among 220,000 subscribers and reported a quantitative assessment of human versatility design by examining spatial, transient, social, and literary angles related to impressions
• Reported that users follow a “Levy Flight” portability design and achieve intermittent practices
• It also analyzed the content and sentiment-based investigation of posts related to check-ins provided a rich key filing of the setting for a better understanding of how these administrations attract customers
Dataset used: Location-sharing status from Twitter
Technique used:
• Studying the radius of gyration, Human Mobility Patterns-displacement, and returning probability
• Exploring Factors that Influence Mobility-geography and economic status
Future scope:
• They can additionally investigate the social structure characteristic in the area sharing services to study flock behavior
• They can likewise customized area suggestions dependent on check-ins history and companion based social mining

Author’s name: Chen et al. [17]
Methodology:
• Investigate and analyze the interaction between social activities and urban space
• The research reflects the relating urban spatial structure and gives experiences in regards to a superior understanding of the information on social activities
Dataset used: Hong Kong City dataset
Technique used: –
Future scope:
• Urban users using location information from cell phones can consider top to bottom investigation of spatial structure

Author’s name: Vernack [18]
Methodology:
• Split the variable group of friends into three subgroups: local friends, distant friends, and local non-friends
• Create two different models: a simple model and complex model
• Evaluate the performance of both models based on their AUC and Accuracy
• Select the best performing model for further analysis
Dataset used: Facebook Check-ins dataset
Technique used: Random Forest (CART)
Future scope:
• This research is limited due to selection effects
• Check-ins of friend’s data that is not available

Author’s name: Kumar et al. [8]
Methodology:
• Designed an appropriate generative model for the data
• Proposed a discriminative approach to learn the parameters
• Compared the performance of all models using mean, precision and accuracy
Dataset used: Facebook Check-ins Dataset from the Kaggle
Technique used: K-Nearest Neighbors, Discriminative Model, Generative Model, Ensemble
Future scope: –

Author’s name: Proposed work (NB-SVM)
Methodology:
• The proposed model helps in the location prediction of infrequent patterns
• Initially, dataset clusters are created on accuracy basis using candidate locations
• Secondly, data splitting techniques and various models are applied for predicting the locations
• The optimization classifiers have been achieved by checking the highest accuracy model
Dataset used: Facebook Check-ins Dataset from the Kaggle
Technique used: NB, MLP, SVM, and NB-SVM (an ensemble approach)
Future scope: For intelligent prediction and handle large datasets, a cloud environment can be used
Fig. 1 Data flow diagram of proposed model
Table 2 Dataset description
Name of the feature  Missing values  One-line description of the feature
event_Id                          0  Record number entered for an event
x-coordinate                      0  First coordinate, bounded in the range [0–10]
y-coordinate                      0  Second coordinate, bounded in the range [0–10]
Accuracy                          0  Accuracy (exactness) of the x and y coordinates, in the range [166–1019]
Timestamp                         0  Elapsed time in minutes
location_Id                       0  Place at which the registration was made
applying some transformations. It includes a data cleaning step that removes duplicate entries from the original dataset. Ranker’s algorithm [15, 19] has been used to separate the relevant features from the irrelevant ones. The accuracy feature received the highest rank, so the dataset is sorted by this feature. The event_Id feature received zero rank; because it carries no relevant information, it has been removed from the dataset. After preprocessing, a new dataset containing 51,468 instances is generated. These instances are divided into 10 classes based on their accuracy ranges, as shown in Table 3. The dataset has been divided into two parts, with 70% of the observations as the training part and 30% as the testing part [20]. The division is random in order to ensure consistency between the testing set and the training set.

Table 3 Formation of classes
Accuracy range  Class name   Count
166–200         Class A       5435
201–300         Class B       3405
301–400         Class C       9862
401–500         Class D     15,668
501–600         Class E       3260
601–700         Class F       3720
701–800         Class G       2993
801–900         Class H       4018
901–1000        Class I        138
1001–1019       Class J       2969

Training Phase
This section discusses the classification models used for finding the most popular locations. Initially, three different models, Naïve Bayes (NB), multilayer perceptron (MLP), and support vector machine (SVM), are compared in terms of prediction accuracy, generalization ability, and reliability. The benefits of these algorithms are as follows: their results are easy to understand and interpret, they learn quickly from data, and they do not require as much training data and can work well even if the fit to the data is not perfect [33].

Naïve Bayes: Naïve Bayes is a classification method based on Bayes’s theorem [20]. The Naïve Bayes classifier relies on a strong independence assumption, which implies that the likelihood of one feature does not influence the likelihood of another. The model is easy to build and especially useful for large datasets. In this method, the features should be independent of one another. It predicts the classes of test datasets quickly and in a straightforward way. The technique works well for multiclass prediction [21], and the classifier performs well when the independence assumption holds [22]. One of its advantages is that it gives better results for categorical variables than for numerical variables.

Multilayer Perceptron: The multilayer perceptron is inspired by the human brain and is a type of artificial neural network [11, 23]. This field examines how simple models of the biological brain can be used to solve difficult computational tasks, such as the predictive modeling work we do in machine learning. The objective is not to build realistic models of the brain but to develop robust algorithms and data structures that can be used to model difficult problems.
The power of neural networks comes from their ability to learn the representation in the training data and how best to relate it to the output variable to be predicted. In this sense, neural networks learn a mapping. Mathematically, they are capable of learning any mapping function and have been proven to be universal approximators.

Support Vector Machine (SVM): SVM is an interesting algorithm whose underlying concept is fairly simple. It works in a distinctive way compared with other machine learning algorithms and is very popular because of its capability to handle numerous continuous and categorical variables [21, 24]. The objective of an SVM is to train a model that assigns new, unseen objects to a specific class. It is a handy tool for high-dimensional spaces that helps in ranking. The training points are used to build the decision function and are stored in memory. The method is recommended for separating highly nonlinear classes. Multiclass SVM assigns labels to instances using support vector machines, where the labels are drawn from a finite set of several elements. The one-versus-all approach builds binary classifiers that distinguish one of the labels from the rest [20].

Prediction Improvement Phase
Ensemble methods help to improve prediction results by combining several models. Their advantage is that they can adapt to changes in the monitored data stream more accurately than a single-model procedure [25, 26]. Ensemble methods [21] are meta-algorithms that combine several machine learning techniques into one predictive model in order to reduce variance (bagging) [14], reduce bias (boosting), or improve predictions (stacking) [27].

Naïve Bayes—Support Vector Machine (NB-SVM): We used an NB-SVM-based ensemble method for improved results. First, the two algorithms yielding the best results were combined, and then we extracted the results based on their accuracy. 
We found that this new scheme gave better results. The NB-SVM algorithm is shown in Fig. 2.

Start [Input: Facebook check-ins dataset]
Step 1. Generate a dataset after preprocessing.
Step 2. Split the data into two sets: a training set and a testing set.
Step 3. Apply the conventional models to it.
Step 4. Rank the models and extract those with the highest accuracy.
Step 5. Aggregate the best two models based on the accuracy evaluation parameter.
Step 6. Calculate the experimental results.
Stop [End of the algorithm]

Fig. 2 NB-SVM algorithm
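Steps 4 and 5 of Fig. 2 can be sketched as follows. This is only an illustration under stated assumptions: the paper runs its experiments in WEKA and does not say how the two best models are aggregated, so averaging class probabilities (soft voting) is assumed here, and all function names and the toy probability outputs are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def predict(probabilities):
    """Pick the most probable class for each instance."""
    return [max(row, key=row.get) for row in probabilities]


def best_two(model_probs, actual):
    """Step 4: rank the models by accuracy and keep the top two names."""
    ranked = sorted(
        model_probs,
        key=lambda name: accuracy(predict(model_probs[name]), actual),
        reverse=True,
    )
    return ranked[:2]


def average_probabilities(probs_a, probs_b):
    """Step 5 (assumed rule): average the class probabilities of two models."""
    return [{c: (a[c] + b[c]) / 2 for c in a} for a, b in zip(probs_a, probs_b)]


# Hypothetical class-probability outputs of three already-trained models.
actual = ["A", "B", "A", "B"]
model_probs = {
    "NB": [{"A": 0.9, "B": 0.1}, {"A": 0.6, "B": 0.4},
           {"A": 0.7, "B": 0.3}, {"A": 0.2, "B": 0.8}],
    "SVM": [{"A": 0.8, "B": 0.2}, {"A": 0.1, "B": 0.9},
            {"A": 0.45, "B": 0.55}, {"A": 0.3, "B": 0.7}],
    "MLP": [{"A": 0.4, "B": 0.6}, {"A": 0.7, "B": 0.3},
            {"A": 0.6, "B": 0.4}, {"A": 0.45, "B": 0.55}],
}

top_two = best_two(model_probs, actual)
combined = average_probabilities(model_probs[top_two[0]], model_probs[top_two[1]])
ensemble_pred = predict(combined)
print(top_two, accuracy(ensemble_pred, actual))
```

In this toy data the two selected models err on different instances, which is the situation in which averaging their outputs can beat either model alone.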
4 Experimental Results
The proposed work on the Facebook check-ins dataset has been implemented using WEKA [28, 29]. In this work, we selected the relevant features using Ranker’s approach, then applied three classification algorithms, namely NB, MLP, and SVM, and evaluated their performance. In the confusion matrix, also called the error matrix, each row represents the actual values and each column represents the predicted values. Table 4 shows a sample format of a confusion matrix with n classes [30]. Total false negative (TFN), total false positive (TFP), and total true negative (TTN) for each class i can be calculated using Eqs. 1, 2, and 3, respectively [30].

\[ \mathrm{TFN}_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} x_{ij} \tag{1} \]

\[ \mathrm{TFP}_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} x_{ji} \tag{2} \]

\[ \mathrm{TTN}_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{\substack{k=1 \\ k \neq i}}^{n} x_{jk} \tag{3} \]

Overall, total true positive (TTP) can be obtained through Eq. 4.

\[ \mathrm{TTP}_{\mathrm{all}} = \sum_{j=1}^{n} x_{jj} \tag{4} \]
Table 4 Confusion matrix (rows: actual class; columns: predicted class)
Actual \ Predicted   A1        A2        …   An
A1                   x_{1,1}   x_{1,2}   …   x_{1,n}
A2                   x_{2,1}   x_{2,2}   …   x_{2,n}
⋮                    ⋮         ⋮         ⋱   ⋮
An                   x_{n,1}   x_{n,2}   …   x_{n,n}
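Eqs. 1 to 4 translate directly into a few sums over the confusion matrix. The sketch below is an illustration in Python rather than the WEKA implementation used in the paper; the function name and the 3-class example matrix are made up for the example.

```python
def per_class_counts(x):
    """Compute TTP_all and, for every class i, TFN_i, TFP_i and TTN_i.

    x is a square confusion matrix: x[i][j] is the number of instances of
    actual class i that were predicted as class j.
    """
    n = len(x)
    ttp_all = sum(x[j][j] for j in range(n))                      # Eq. 4
    counts = []
    for i in range(n):
        tfn = sum(x[i][j] for j in range(n) if j != i)            # Eq. 1
        tfp = sum(x[j][i] for j in range(n) if j != i)            # Eq. 2
        ttn = sum(x[j][k] for j in range(n) if j != i
                  for k in range(n) if k != i)                    # Eq. 3
        counts.append({"TFN": tfn, "TFP": tfp, "TTN": ttn})
    return ttp_all, counts

# Hypothetical 3-class confusion matrix with 25 test instances.
matrix = [[5, 1, 0],
          [2, 6, 1],
          [0, 3, 7]]
ttp_all, counts = per_class_counts(matrix)
```

For every class, the diagonal entry plus TFN, TFP and TTN adds up to the total number of test instances, which is a quick sanity check on the indices.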
S. Kashyap and A. Singh
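To compute the generalized precision (P), recall (R), and specificity (S) for each class i, Eqs. 5, 6, and 7 are used.

\[ P_i = \frac{\mathrm{TTP}_{\mathrm{all}}}{\mathrm{TTP}_{\mathrm{all}} + \mathrm{TFP}_i} \tag{5} \]

\[ R_i = \frac{\mathrm{TTP}_{\mathrm{all}}}{\mathrm{TTP}_{\mathrm{all}} + \mathrm{TFN}_i} \tag{6} \]

\[ S_i = \frac{\mathrm{TTN}_{\mathrm{all}}}{\mathrm{TTN}_{\mathrm{all}} + \mathrm{TFP}_i} \tag{7} \]

The overall accuracy of the given confusion matrix is given in Eq. 8.

\[ \text{Overall Accuracy} = \frac{\mathrm{TTP}_{\mathrm{all}}}{\text{Total number of Testing Entries}} \tag{8} \]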
Another classification measure used to assess the results is the Kappa statistic. The interpretation of the Kappa statistic is given in Table 5, and its formula is given by Eq. 9.

\[ \mathrm{Kappa} = \frac{\Pr(O) - \Pr(E)}{1 - \Pr(E)} \tag{9} \]
where Pr(O) represents the observed probability of agreement, and Pr(E) represents the expected probability of agreement. The data layout is shown in Fig. 3, where p and s show agreement between the two observers, while q and r show disagreement between the two observers. The value of q and r will be zero if there is no disagreement between
Table 5 Interpretation of Kappa
Fig. 3 Data layout
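Eq. 9 can be evaluated from any square agreement table. The Python sketch below uses the usual definitions, with Pr(O) as the share of diagonal entries and Pr(E) as the sum over classes of the products of the row and column marginal proportions. Mapping the 2 x 2 layout of Fig. 3 to [[p, q], [r, s]] is an assumption, since the figure itself is not reproduced here, and the counts are invented.

```python
def kappa(table):
    """Kappa statistic (Eq. 9) for a square agreement table."""
    n = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of entries on the diagonal.
    pr_o = sum(table[i][i] for i in range(n)) / total
    # Expected agreement: product of row and column marginals, summed over classes.
    pr_e = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(n)
    ) / total ** 2
    return (pr_o - pr_e) / (1 - pr_e)

# Hypothetical 2 x 2 layout [[p, q], [r, s]] with p = 20, q = 5, r = 10, s = 15.
k = kappa([[20, 5], [10, 15]])
```

Here Pr(O) = 35/50 = 0.7 and Pr(E) = (25·30 + 25·20)/2500 = 0.5, so Kappa = 0.4. When q and r are both zero, Pr(O) is 1 and Kappa is 1.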
Kappa agreement α4, job allocation is done by the following:
Step 3: Initialize all the parameters of the particle swarm. The size of the particle swarm (N) depends on the experiment, and its value is given before the start of the algorithm. The parameter values are taken as q = 0.8, q1 = 2, q2 = 1.3.
(3.1) Initialize the position of each particle; this initializes the population set. Random matrices are taken as the positions of the particles and are then normalized.
(3.2) For the first case, take the random velocity as a trapezoidal matrix, i.e. every element of the matrix is a trapezoidal fuzzy number.
(3.3) For the second case, take the random velocity as a pentagonal matrix, i.e. every element of the matrix is a pentagonal fuzzy number.
(3.3.1) t = t + 1 (the iteration process runs from t = 1 to the maximum iteration, which can be changed in the program code as required).
(3.3.2) Select a leader set from the population set.
(3.3.3) Update the velocity and position of each particle.
(3.3.4) Calculate the makespan value and flowtime value of each particle.
(3.3.5) Update the personal best and global best of each particle.
(3.3.6) Add non-dominated particles to the non-dominating front.
(3.3.7) Determine domination of the new non-dominating front members.
D. Dutta and S. Rath
(3.3.8) Keep only non-dominated members in the non-dominating front.
(3.3.9) Normalize the position matrix of each particle.
(3.4) Continue the iteration process until the maximum iteration is reached.
Step 4: Repeat the process as long as the grid is active.
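The velocity and position update of step (3.3.3) and the normalization of step (3.3.9) can be sketched as follows. This is a crisp (non-fuzzy) simplification in Python: the trapezoidal and pentagonal fuzzy velocities of the paper are replaced by plain real numbers, and the normalization rule (clamp negative entries to zero, then rescale each job column to sum to 1) is an assumption borrowed from common fuzzy PSO schedulers, not a rule stated in this section. Only q, q1, and q2 come from the text.

```python
import random

Q, Q1, Q2 = 0.8, 2.0, 1.3  # inertia weight and acceleration coefficients from the text

def normalize_columns(pos):
    """Clamp negative entries to 0, then rescale each column to sum to 1 (assumed rule)."""
    rows, cols = len(pos), len(pos[0])
    out = [[max(v, 0.0) for v in row] for row in pos]
    for j in range(cols):
        total = sum(out[i][j] for i in range(rows))
        for i in range(rows):
            # An all-zero column falls back to a uniform split across grids.
            out[i][j] = out[i][j] / total if total > 0 else 1.0 / rows
    return out

def pso_step(pos, vel, pbest, gbest, rng=random):
    """One crisp velocity and position update for a single particle."""
    r1, r2 = rng.random(), rng.random()
    rows, cols = len(pos), len(pos[0])
    new_vel = [
        [Q * vel[i][j]
         + Q1 * r1 * (pbest[i][j] - pos[i][j])
         + Q2 * r2 * (gbest[i][j] - pos[i][j])
         for j in range(cols)]
        for i in range(rows)
    ]
    new_pos = [[pos[i][j] + new_vel[i][j] for j in range(cols)] for i in range(rows)]
    return normalize_columns(new_pos), new_vel
```

Each row of the position matrix stands for a grid node and each column for a job, so after normalization every column can be read as a membership distribution of that job over the grid nodes.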
4 Experiment
The parameters used to solve the problem are as follows: inertia weight q = 0.8, and acceleration coefficients q1 = 2 and q2 = 1.3. The two random numbers are generated automatically. In the first case, each element of the velocity matrix is a trapezoidal fuzzy number; in the second case, each element is a pentagonal fuzzy number. In the optimal schedule tables, grids are represented row-wise and jobs column-wise; ‘1’ indicates that the job is assigned to the grid, and ‘0’ indicates that it is not [13]. Experiment 1: Number of grid nodes = 3 and number of jobs = 7. Total number of particles (N) = 20.
4.1 Optimal Schedule with Trapezoidal Fuzzy Number
The grid speeds taken in this problem are 19.09, 27.017, and 29.45, and the times required for the jobs are 69.73, 115.73, 19.34, 89.86, 128.99, 99.90, and 82.96, respectively. Table 1 shows all the dominant members with their makespan and flowtime values. In this experiment, there are nine dominant members (Fig. 1).
Table 1 All dominant members
Dominant member   Cost [makespan value; flowtime value]
1                 [10.6384352009706;3.10803175651822]
2                 [8.45546803157101;3.19041893765804]
3                 [4.77418728604309;89.7328665437899]
4                 [13.5468650334225;2.93336151112409]
5                 [19.0438984110504;1.84856186548972]
6                 [13.7370847810460;2.74174725677385]
7                 [38.0400344789890;1.73688320878820]
8                 [8.71542080902895;3.12813671335807]
9                 [6.93210063560654;5.50810020029984]
Job Scheduling on Computational Grids Using …
Table 2 Optimal schedule of dominant member 2
     J1  J2  J3  J4  J5  J6  J7
G1   1   1   0   0   0   1   0
G2   0   0   0   1   1   0   0
G3   0   0   1   0   0   0   1
Fig. 1 Red-coloured points are dominant members, and joining all the dominant points will form a Pareto-optimal curve. Other points present in the graph are dominated points
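The notion of a dominant (non-dominated) member can be made concrete with a small Python sketch. For two objectives that are both minimized, a point dominates another if it is no worse in both objectives and strictly better in at least one. The filter below is a generic illustration, not the authors' code; the cost pairs are the Table 1 values rounded to four decimals, and the extra point (20.0, 5.0) is invented to show a dominated member being discarded.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (makespan, flowtime) pairs from Table 1, rounded to four decimals.
table1 = [
    (10.6384, 3.1080), (8.4555, 3.1904), (4.7742, 89.7329),
    (13.5469, 2.9334), (19.0439, 1.8486), (13.7371, 2.7417),
    (38.0400, 1.7369), (8.7154, 3.1281), (6.9321, 5.5081),
]
front = non_dominated(table1 + [(20.0, 5.0)])
```

All nine Table 1 members survive the filter, because a lower makespan always comes with a higher flowtime among them, while the added point is removed since member 5 is better in both objectives.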
Since this is a multi-objective optimization problem, each dominant member gives a solution, that is, an optimal schedule. Table 2 is the optimal schedule given by dominant member 2: job 1 is assigned to grid 1, job 2 to grid 1, job 3 to grid 3, job 4 to grid 2, job 5 to grid 2, job 6 to grid 1, and job 7 to grid 3. As the number of iterations increases, we obtain particles with lower makespan and flowtime values, until the maximum iteration is reached.
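Reading a 0/1 schedule table as a job-to-grid assignment is mechanical, as the Python sketch below shows for Table 2. The finish-time calculation is only an assumption for illustration: it treats each listed job value as a crisp workload that a grid processes at its listed speed, so it is not expected to reproduce the fuzzy cost values of Table 1. The function names are hypothetical.

```python
# Table 2: rows are grids G1..G3, columns are jobs J1..J7.
SCHEDULE = [
    [1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1],
]
SPEEDS = [19.09, 27.017, 29.45]
JOBS = [69.73, 115.73, 19.34, 89.86, 128.99, 99.90, 82.96]

def job_to_grid(schedule):
    """Return, for each job, the 1-based index of the grid it is assigned to."""
    n_jobs = len(schedule[0])
    return [
        next(g + 1 for g, row in enumerate(schedule) if row[j] == 1)
        for j in range(n_jobs)
    ]

def grid_finish_times(schedule, speeds, jobs):
    """Assumed crisp model: total workload on a grid divided by its speed."""
    return [
        sum(load for load, assigned in zip(jobs, row) if assigned) / speed
        for row, speed in zip(schedule, speeds)
    ]

assignment = job_to_grid(SCHEDULE)
crisp_makespan = max(grid_finish_times(SCHEDULE, SPEEDS, JOBS))
```

The decoded assignment matches the job-by-job listing in the text, and under the assumed crisp model the most heavily loaded grid is G1.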
4.2 Optimal Schedule with Pentagonal Fuzzy Number
The grid speeds are 47.66, 8.08, and 39.89, and the times required for the jobs are 28.13, 17.84, 64.98, 135.90, 92.86, 14.33, and 24.04, respectively. Here ‘1’ indicates that the job is assigned to the grid, and ‘0’ indicates that it is not.
Table 3 All dominant members
Dominant member   Cost [makespan value; flowtime value]
1                 [5.85399744399483;138.489278987881]
2                 [8.58119864313159;3.57772926813010]
3                 [8.12025230972839;3.84565824356580]
4                 [10.1821154633630;2.29411268862426]
5                 [7.28998249534641;11.8570166150832]
6                 [22.6523082812147;2.08788255096531]
7                 [115.157452009187;1.43545972167154]
Table 4 Dominant member 4
     J1  J2  J3  J4  J5  J6  J7
G1   1   0   0   0   1   0   0
G2   0   1   0   1   0   1   0
G3   0   0   1   0   0   0   1
Table 3 shows all the dominant members with their makespan and flowtime values. In this experiment, there are seven dominant members (Fig. 2). Since this is a multi-objective optimization problem, each dominant member gives a solution, that is, an optimal schedule. Table 4 is the optimal schedule given by dominant member 4: job 1 is assigned to grid 1, job 2 to grid 2, job 3 to grid 3, job 4 to grid 2, job 5 to grid 1, job 6 to grid 2, and job 7 to grid 3. As the number of iterations increases, we obtain particles with lower makespan and flowtime values, until the maximum iteration is reached. Experiment 2: Number of grid nodes = 4 and number of jobs = 19. Here ‘1’ indicates that the job is assigned to the grid, and ‘0’ indicates that it is not. Total number of particles (N) = 20.
4.3 Optimal Schedule with Trapezoidal Fuzzy Number Here the grid speed is as following—23.82, 35.64, 28.83, 5.67, and the time required for each job is as following—104.33, 119.61, 65.37, 81.28, 74.86, 36.39, 132.58, 141.25, 7.02, 108.82, 52.76, 17.62, 147.10, 13.067, 22.19, 73.99, 30.33, 104.16, 31.83 respectively. Here ‘1’ represents job assigned to the grid, and ‘0’ represents no job assigned to the grid. In Table 5, all the dominant members with their makespan value and flowtime value are shown. Here in this experiment, there are 10 dominant members. Since it is a multi-objective optimization problem, each dominant member gives a solution, that is each dominant member gives an optimal schedule. Table 6 is the optimal schedule. Here job 1 assigned to grid 4, job 2 assigned to grid 3, job 3 is
Fig. 2 Red-coloured points are dominant members, and joining all the dominant points will form a Pareto-optimal curve. Other points present in the graph are dominated points
Table 5 All dominant members
Dominant member   Cost [makespan value; flowtime value]
1                 [6.946936547201050;16.739297916442670]
2                 [14.659486704830455;2.563739712601547]
3                 [6.969653592728622;7.014856721631054]
4                 [9.02753715803962;5.08822223531613]
5                 [9.52664677862213;3.94375239367574]
6                 [31.1399729253217;2.00853013886245]
7                 [7.81660807354773;5.14579857433650]
8                 [19.6973273093506;2.25310503441691]
9                 [44.6164723913943;1.92980761529184]
10                [13.3058849665680;3.29327694067716]
Table 6 Dominant member 5
     J1 J2 J3 J4 J5 J6 J7 J8 J9 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19
G1   0  0  0  0  0  1  1  0  0  0   0   0   0   1   0   1   0   0   0
G2   0  0  0  0  0  0  0  0  0  0   1   1   1   0   0   0   0   0   1
G3   0  1  1  0  1  0  0  0  1  1   0   0   0   0   1   0   0   0   0
G4   1  0  0  1  0  0  0  1  0  0   0   0   0   0   0   0   1   1   0
scheduled on grid 3, job 4 is scheduled on grid 4, job 5 is scheduled on grid 3, job 6 is assigned to grid 1, job 7 to grid 1, job 8 to grid 4, job 9 to grid 3, job 10 to grid 3, job 11 to grid 2, job 12 to grid 2, job 13 to grid 2, job 14 to grid 1, job 15 to grid 3, job 16 to grid 1, job 17 to grid 4, job 18 to grid 4, and job 19 to grid 2. As the number of iterations increases, we obtain particles with lower makespan and flowtime values, until the maximum iteration is reached.
4.4 Optimal Schedule with Pentagonal Fuzzy Number
The grid speeds are 40.11, 35.88, 47.73, and 47.88, and the times required for the jobs are 121.7649, 50.02, 19.94, 125.62, 55.22, 130.92, 54.20, 95.65, 57.75, 26.74, 32.62, 134.42, 74.42, 37.24, 42.13, 35.81, 11.42, 66.08, respectively. Here ‘1’ indicates that the job is assigned to the grid, and ‘0’ indicates that it is not. Table 7 shows all the dominant members with their makespan and flowtime values. In this experiment, there are eight dominant members (Fig. 3). Since this is a multi-objective optimization problem, each dominant member gives a solution, that is, an optimal schedule. Table 8 is the optimal schedule: job 1 is assigned to grid 4, job 2 to grid 3, job 3 to grid 2, job 4 to grid 2, job 5 to grid 1, job 6 to grid 3, job 7 to grid 1, job 8 to grid 2, job 9 to grid 3, job 10 to grid 1, job 11 to grid 4, job 12 to grid 3, job 13 to grid 3, job 14 to grid 1, job 15 to grid 1, job 16 to grid 2, job 17 to grid 3, job 18 to grid 3, and job 19 to grid 3. As the number of iterations increases, we obtain particles with lower makespan and flowtime values, until the maximum iteration is reached. We have also taken examples with more grid nodes and jobs, i.e. 10 grid nodes and 50 jobs, and 40 grid nodes and 100 jobs, and obtained similar results. Here the termination criterion is the maximum iteration.
Table 7 All dominant members
Dominant member   Cost [makespan value; flowtime value]
1                 [16.2611411181259;1.97575917382047]
2                 [43.3420287205458;1.61351170570663]
3                 [57.5141488889569;1.47959210997574]
4                 [6.56397554910838;10.0771996593249]
5                 [7.71117323867537;2.00973620596447]
6                 [21.7383936806210;1.73657564817898]
7                 [25.8650672452948;1.63369347892493]
8                 [7.41135422349121;4.34876614318110]
Fig. 3 Red-coloured points are dominant members and joining all the dominant points will form a Pareto-optimal curve. Other points present in the graph are dominated points
Table 8 Dominant member 5
     J1 J2 J3 J4 J5 J6 J7 J8 J9 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19
G1   0  0  0  0  1  0  1  0  0  1   0   0   0   1   1   0   0   0   0
G2   0  0  1  1  0  0  0  1  0  0   0   0   0   0   0   1   0   0   0
G3   0  1  0  0  0  1  0  0  1  0   0   1   1   0   0   0   1   1   1
G4   1  0  0  0  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0
5 Limitation
In this algorithm, every job can be allocated to one and only one grid, which is a limitation of our work. We would like to resolve this limitation in future work.
6 Conclusion
In this paper, a multi-objective job scheduling problem on a computational grid is solved by fuzzy particle swarm optimization with trapezoidal and pentagonal fuzzy numbers. The optimality criteria are the makespan and flowtime values. From the set of dominant members, we have taken a particle that minimizes both makespan and flowtime values in each of the two cases. We observe that even though the schedules are different, the objective values remain more or less similar. We have also taken examples with more grid nodes and jobs, i.e. 10 grid nodes and 50 jobs, and 40 grid nodes and 100 jobs, and obtained similar results. Hence, when the objective values of fuzzy PSO using trapezoidal fuzzy numbers are compared with those of fuzzy PSO using pentagonal fuzzy numbers, the two give similar results. For future work, other fuzzy numbers can be used in this process and the results compared.
References
1. Foster, I., Kesselman, C. (eds.): The Grid 2: Blueprint for a New Computing Infrastructure, 2nd edn. Morgan Kaufmann (2003)
2. Abraham, A., Liu, H., Zhao, M.: Particle swarm scheduling for work-flow applications in distributed computing environments. In: Metaheuristics for Scheduling: Industrial and Manufacturing Applications. Studies in Computational Intelligence, pp. 327–342. Springer Verlag, Germany (2008)
3. Dong, F., Akl, S.G.: Scheduling algorithms for grid computing: state of the art and open problems. Technical Report, No. 2006-504. Queen’s University (2006)
4. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP Completeness. Freeman, CA (1979)
5. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Service Center, Piscataway, NJ (1995). Davids, D.V.: Recovery effects in binary aluminum alloys. Ph.D. Thesis, Harvard University (1998)
6. Boindala, S.P., Arunachalam, V.: Concrete mix design optimization using a multi-objective cuckoo search algorithm. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053, pp. 119–126
7. Zaheer, H., Pant, M.: Solution of multi-objective portfolio optimization problem using multiobjective synergetic differential evolution (MO-SDE). In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 584
8. Rajkumar, A., Helen, D.: New arithmetic operations of Triskaidecagonal fuzzy number using alpha cut. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 583
9. Kennedy, J., Eberhart, R.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, CA (2001). Smith, C.D., Jones, E.F.: Load-cycling in cubic press. In: Furnish, M.D., et al. (eds.) Shock Compression of Condensed Matter-2001. AIP Conference Proceedings, vol. 620, pp. 651–654. American Institute of Physics, Melville, NY (2002)
10. Clerc, M.: Particle Swarm Optimization. ISTE Publishing Company, London (2006)
11. Agarwal, M., Srivastava, G.M.S.: A PSO algorithm-based task scheduling in cloud computing. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 742
12. Liu, H., Abraham, A.: An hybrid fuzzy variable neighborhood particle swarm optimization algorithm for solving quadratic assignment problems. J. Univ. Comput. Sci. 13(7), 1032–1054 (2007)
13. Nayak, S.K., Padhy, S.K., Panda, C.S.: Efficient multiprocessor scheduling using water cycle algorithm. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 583
Analysis of Network Performance for Background Data Transfer Using Congestion Control Protocol Jaspreet Kaur, Taranjeet Singh, and Rijwan Khan
Abstract In this paper, our aim is to present an efficient congestion control mechanism and an analysis of network performance for background data transfer with minimum impact on short TCP flows. The study focuses on one congestion control mechanism, random early detection (RED), evaluated against selected performance metrics. Throughput, delay, and packet loss are considered in order to reach a steady state with respect to these key performance metrics. The major objective of the proposed work is to achieve a good average throughput over time. RED is implemented in NS2 to obtain minimum delay impact on background data transfer and maximum throughput for better network performance. Keywords Throughput · Delay · Bandwidth · TCP · NS2
J. Kaur, AGI, Rampur, Uttar Pradesh, India
T. Singh (B), MIET, Greater Noida, Uttar Pradesh, India
R. Khan, ABESIT, Ghaziabad, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_29
1 Introduction
The Internet is still evolving, and its future lies in the development and innovation taking place in every aspect of today's world. Internet congestion occurs when the aggregate demand for a resource (e.g., link bandwidth) exceeds the available capacity of the resource. The resulting effects of such congestion include long delays in data delivery, wasted resources due to lost or dropped packets, and even possible congestion collapse, in which all communication in the entire network ceases. Ideally, maximum traffic should be sent through a link to maximize the throughput. On the other hand, if transmission rates are excessive, congestion will occur; the throughput will decrease significantly, and a collapse can happen. To avoid such a situation, congestion control is necessary. Hence, the need for a technology-independent and network-independent congestion control algorithm is greater than ever. As user requirements for more bandwidth and quality of service become critical, networks must be upgraded at the transport protocol level as well as at the physical level. Accordingly, to maintain good network performance, certain mechanisms must be provided to prevent the network from being congested for any significant period of time. Two approaches for handling congestion are congestion control (or recovery) and congestion avoidance. The former is reactive: congestion control typically comes into play after the network is overloaded, i.e., after congestion is detected. Congestion control involves the design of mechanisms and algorithms to limit the demand-capacity mismatch, or to dynamically control traffic sources when such a mismatch occurs [1]. It has been shown that static solutions, such as allocating more buffers or providing faster links or faster processors, are not effective for congestion control purposes. In practice, RFC 2581 defines four algorithms used at the end hosts for congestion control: slow start, congestion avoidance, fast retransmit, and fast recovery. Several researchers have attempted to solve the congestion problem in different ways; some treated the problem in a control-theoretic manner, while others dealt with designing different routing techniques [2]. In the current study, the congestion avoidance mechanism is used such that the link capacity is divided among the sources to share the available bandwidth in the desired manner [1]. The networks in which wired cables and devices are used to provide communication and data transmission are known as wired networks.
Physical cables are used here to transfer data among devices and computer systems. Ethernet cables are most commonly used today to transfer data among connected PCs. When only a few computers need to be connected, a single router may suffice and the wired network is small; larger networks use several routers and switches to connect devices. Internet access is provided to all the devices in the network through a cable modem or another type of Internet connection attached to one of the devices. Wired networks have been deployed on several application platforms to support high-speed data communication and powerful distributed computing for personal and business applications [3]. Lately, there has been a need to provide security mechanisms in networks as part of their design and development, since networks are vulnerable to cyber-attacks. Attackers develop various hacking techniques to exploit the network. Cyber-attacks have become a great threat to society, and there is thus a need to prevent them in wired networks.
2 Methodology
A wired network is a network in which hosts communicate with each other through wired links. Each host in the network has a different data sending rate, so the chance of congestion in the network is very high [4]. In this research work, a neural network technique is proposed for congestion control. The neural network learns from previous experience and derives new values. It takes the buffer size and host number as input. The actual output is derived, the error is reduced after each iteration, and the optimal level is reached, which gives the congestion window expansion as output [5]. With the growing number of high-speed networks, it becomes important to have techniques that deliver high throughput and low average queue sizes [6]. In this work, our major goal is to avoid congestion by controlling the average queue size. Avoiding global synchronization is another major task that is performed here.
2.1 RED Algorithm Process
RED aims to control the average queue size by informing end hosts to slow down the transmission of packets. It monitors the average queue size and drops or marks packets based on statistical probabilities [5] (Fig. 1). The following algorithm is performed:
• The average queue length (avgr) is calculated for each incoming packet.
• If avgr < min length threshold, the packet is placed in the queue.
• If min < avgr < max queue length threshold, the dropping probability is checked: if the probability is high, the packet is dropped; if it is low, the packet is placed in the queue.
• If avgr > max, the packet is dropped.
• The probability depends upon the maximum threshold, minimum threshold, and mark probability denominator.
In NS2, constant bit rate (CBR) provides quality of service (QOS) and network traffic. CBR is used for network streaming applications as the content can easily be transferred with the limited channel capacity [7]. Figure 2 displays the process of the RED algorithm with constant bit rate.
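The RED decision steps listed above can be sketched in Python as follows. This is a simplified illustration, not the NS2 queue implementation: it uses the classic linear drop probability between the two thresholds, takes the maximum drop probability as 1 divided by the mark probability denominator, and updates the average queue length with an exponentially weighted moving average. The weight value 0.002 and all function and parameter names are assumptions made for the example.

```python
import random

def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length (weight is assumed)."""
    return (1 - weight) * avg + weight * queue_len

def red_decision(avg, min_th, max_th, mark_prob_denominator, rng=random.random):
    """Return "enqueue" or "drop" for an arriving packet.

    avg is the current average queue length; min_th and max_th are the
    minimum and maximum thresholds.
    """
    if avg < min_th:
        return "enqueue"
    if avg >= max_th:
        return "drop"
    # Between the thresholds the drop probability rises linearly up to max_p.
    max_p = 1.0 / mark_prob_denominator
    drop_p = max_p * (avg - min_th) / (max_th - min_th)
    return "drop" if rng() < drop_p else "enqueue"
```

For example, with thresholds 10 and 30 and a mark probability denominator of 10, an average queue length of 20 gives a drop probability of 0.05, so about one arriving packet in twenty is dropped early, before the queue overflows.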
Fig. 1 Proposed methodology using the RED algorithm
2.2 Backpropagation
Backpropagation is the technique used in artificial neural networks (ANN) to calculate the gradient that is needed to update the weights of the network [8]. This method is also called the backward propagation of errors, because the error is computed at the output and propagated back through the layers of the network. The technique has been rediscovered several times and is equivalent to automatic differentiation in reverse accumulation mode. It requires the derivative of the loss function with respect to the network output to be known; for this reason it is usually regarded as a supervised machine learning method, although it can also be used in some unsupervised networks. Backpropagation generalizes the delta rule to multi-layered feed-forward networks (FFN) by using the chain rule to compute the gradients layer by layer [3]. The major goal of supervised learning is to compute a function which best maps the inputs to the desired correct output [9].
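The chain rule at the heart of backpropagation can be shown on the smallest possible network. The Python sketch below uses one sigmoid hidden unit and one linear output with a squared-error loss; this toy network, its weights, and the function names are illustrative assumptions and are unrelated to the congestion control model of the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, x, target):
    """Squared error of a network with one sigmoid hidden unit and a linear output."""
    return 0.5 * (w2 * sigmoid(w1 * x) - target) ** 2

def gradients(w1, w2, x, target):
    """Backpropagate the output error to both weights using the chain rule."""
    h = sigmoid(w1 * x)               # forward pass: hidden activation
    y = w2 * h                        # forward pass: network output
    dl_dy = y - target                # error at the output
    dl_dw2 = dl_dy * h                # output-layer weight gradient
    dl_dh = dl_dy * w2                # error propagated back to the hidden unit
    dl_dw1 = dl_dh * h * (1 - h) * x  # hidden-layer weight gradient
    return dl_dw1, dl_dw2

g1, g2 = gradients(0.5, -1.2, 2.0, 1.0)
```

A gradient descent step then subtracts a small multiple of each gradient from the corresponding weight, which is how the error is reduced after each iteration.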
Fig. 2 Process of red algorithm constant bit rate
2.3 Tools Used
NS stands for network simulator, where simulation is defined as the technique used to determine the performance of a model by implementing it in a real-time environment. There are two kinds of simulators: event-based and time-based. Network Simulator version 2 (NS2) is an event-based simulator, as it triggers the created events at particular times [10]. The network models are simulated using NS2. The simulator runs on Linux, and programs can be run on distributions such as Fedora and Red Hat. The architecture of NS2 is complex: Tool Command Language (Tcl) is used for the front end and C++ for the back end. NS2 is used together with various performance analysis tools, such as xgraph and ngraphs. When Tcl and C++ run together, the combination is known as Object-oriented Tool Command Language (OTcl) [10]. Both text-based and animation-based simulations are performed within this simulator. Executing an OTcl script creates two outputs. The first output is the .tr file, also known as the trace file, in which the text-based simulation results are saved. The second output is the .nam file, which gives results as an animation of the simulation. Various other simulators are available, but this one is widely used because it provides both text-based and animation-based simulations for different applications [4].
2.4 Workflow
The workflow of implementing the simulation is subdivided into a few steps, described below:
• Topology definition. To ease the creation of basic facilities and define their interrelationships, NS 2 has a system of containers and helpers that facilitates this process [1].
• Model development. Models are added to the simulation (for example, UDP, IPv4, point-to-point devices and links, applications); most of the time this is done using helpers [1].
• Node and link configuration. Models set their default values (for example, the size of packets sent by an application or the MTU of a point-to-point link); most of the time this is done using the attribute system [1].
• Execution. Simulation facilities generate events, and the data requested by the user is logged [1].
• Performance analysis. After the simulation is finished, data is available as a time-stamped event trace. This data can then be statistically analyzed with tools like R to draw conclusions [1].
• Graphical visualization. Processed data or information gathered during the simulation can be represented graphically using several tools; examples of such tools are matplotlib, Gnuplot, and XGRAPH [1].
3 Results
This work is based on queue management using a RED queue. Two wired networks are shown below. The network named old.tcl suffers from congestion, and the packets sent by the sources are dropped. To resolve this issue, a new network named new.tcl is used, which shows much lower packet loss than the old network topology. In the simulator, the layout of the dynamic network topology can be changed by using the re-layout button in nam.
3.1 Old Network Topology
• Source nodes. Source nodes act as users in the network topology; a source transfers data to a particular destination. The source nodes in this network topology are nodes 0, 1, 2 and 4.
• Destination node. Node 14 is the destination node. The destination node is drawn as a blue pentagon.
• Attacker nodes. Node 0 and node 1 are attacker nodes in this network because of the large data rate of the packets sent through source nodes 0 and 1.
• Router. Routers essentially act as checkpoints for the data. The data sent goes through a pathway of routers to reach the destination. If a certain router goes down, the pathway is updated so that the broken router is not used. In essence, routers make up the path the data travels along and determine the path the data must follow. The routers in this topology are node 11, node 12 and node 13 (Fig. 3).
Fig. 3 Network topology named old.tcl
3.2 New Network Topology
In the new network topology, dynamic topologies help in reducing packet loss. The new network topology gives better network performance than the old network topology, as shown in the graphs of delay, packet loss and throughput. UDP packets are used here for the traffic; a constant bit rate (CBR) source attached to UDP sets the packet size and data rate. The traffic generated at the source is of CBR type, and the UDP agent runs on the source [11]. The RED algorithm is used so that the queue is managed and packet drops are reduced. The previous network topology uses the same algorithm; the difference here is the use of backpropagation. When a particular node in the network is loaded, this is propagated back to the previous node; the error increases at this point and then decreases once the previous node lowers its data rate. This method therefore helps in reducing the error and in adjusting the data rate for the transmission of data from source to destination (Fig. 4).
Fig. 4 Network topology named new.tcl
All the data from the simulations were collected using the loss monitor class included in NS2. The loss monitor class can be attached to a sink to retrieve the last packet arrival time, the number of packets, the number of lost packets and the number of bytes received for that specific sink [12]. The data retrieved from the loss monitor were stored in the variables LastPktTime_ (last packet time), npkts_ (number of packets), nlost_ (number of lost packets) and bytes_ (number of received bytes), respectively [13]. Throughput can be defined as the successful packet delivery rate [14]. Packet loss occurs when sent packets of data fail to reach their intended destinations [8]. In our case, packet loss happens when the data from one node does not reach the node it was sent to [12]. End-to-end delay is the amount of time it takes for a packet to travel from its source to its destination [12]. Since our record function runs at intervals of time, we are only able to calculate an average delay over the packets inside each interval [15].
Fig. 5 Comparison of throughputs
Figure 5 compares the throughputs of both approaches, Fig. 6 compares the packet losses and Fig. 7 compares the delays of both approaches.
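The per-interval metrics described above can be derived from the loss monitor counters with a few lines of arithmetic. The Python sketch below is an illustration of that post-processing, not NS2 code: it assumes that the byte and packet counters are reset at the start of each interval and that the packet counter holds received packets, so that received plus lost packets equals the packets sent. The function and parameter names are hypothetical.

```python
def interval_metrics(bytes_received, npkts, nlost, interval_s):
    """Return (throughput in bit/s, packet loss ratio) for one recording interval.

    bytes_received, npkts and nlost are the counter values read from the
    sink for an interval of interval_s seconds.
    """
    throughput_bps = bytes_received * 8 / interval_s
    sent = npkts + nlost
    loss_ratio = nlost / sent if sent else 0.0
    return throughput_bps, loss_ratio

# Hypothetical counters for a one-second interval.
throughput, loss = interval_metrics(125000, 90, 10, 1.0)
```

With these invented counters the sink received 125,000 bytes in one second, i.e. 1 Mbit/s, and 10 of 100 packets were lost.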
4 Conclusion and Future Work The objective of this work is to improve an end-to-end congestion control algorithm in a wired environment. In heterogeneous distributed systems, congestion arises from insufficient resources, and throughput is taken to be the lowest data transfer rate among the links. Current routing techniques that use the hop count as the routing metric do not work well with mobile nodes. Heterogeneous wireless networks with overlapped coverage areas can serve as a congestion prevention method without losing
J. Kaur et al.
Fig. 6 Comparison of packet loss
service quality. So, there is a strong need for congestion-aware routing metrics that take into account reliability, transmission capability and congestion in the link. The relationship among the available networks is used to increase the quality of the system, particularly in a congestion scenario. In the implementation, we have developed a hop-by-hop congestion-aware protocol that uses a combined weight value as the routing metric, based on the data rate, delay, congestion rate, throughput and channel capacity. In this paper, the cooperation of heterogeneous wireless networks with multi-homing capability is considered for end-to-end congestion control. The simulation results show that the extended protocol achieves high throughput and channel capacity while reducing delay and packet drops.
Fig. 7 Delay comparison
References
1. Baweja, R., Gupta, R., Bhagat, N.K.: Improved congestion avoidance and resource allocation algorithm. IEEE (2014). ISBN: 978-1-4799-6986-9
2. Courcoubetis, C.A., Dimakis, A., Kanakakis, M.: Congestion control for background data transfers with minimal delay impact. IEEE/ACM Trans. Netw. (2017)
3. Karnik, A., Kumar, A.: Performance of TCP congestion control with explicit rate feedback. IEEE/ACM Trans. Netw. 108–120 (2005)
4. Deb, S., Ganesh, A., Key, P.: Resource allocation between persistent and transient flows. IEEE/ACM Trans. Netw. 13(2), 302–315 (2005)
5. https://www.networkfashion.net/
6. Karnik, A., Kumar, A.: Performance of TCP congestion control with explicit rate feedback. IEEE/ACM Trans. Netw. 13(1), 108–120 (2005)
7. Intarapanich, S., Pattaramalai, S.: Increasing TCP performances on wired and wireless networks by using TCP freeze. IEEE (2016)
8. Jacobson, V.: Congestion avoidance and control. In: Proceedings of ACM SIGCOMM'88, pp. 314–329 (1988)
9. Torkey, H., Attiya, G., Morsi, I.Z.: Modified fast recovery algorithm for performance enhancement of TCP-NewReno. Int. J. Comput. Appl. (2012)
10. Ho, C.-Y., Chen, Y.-C., Chan, Y.-C., Ho, C.-Y.: Fast retransmit and fast recovery schemes of transport protocols: a survey and taxonomy. Comput. Netw. 1308–1327 (2008)
11. Madhuri, D., Chenna Reddy, P.: Performance comparison of TCP, UDP and SCTP in a wired network. IEEE (2016)
12. Kushwaha, V., Gupta, R.: Congestion Control for High-Speed Wired Network: A Systematic Literature Review. Elsevier Ltd (2014)
13. Cai, L., Shen, X., Pan, J., Mark, J.W.: Performance analysis of TCP-friendly AIMD algorithms for multimedia applications. IEEE Trans. Multimedia 7(2), 339–355 (2005)
14. Roman, D., Yevgeni, K., Jarmo, H.: TCP NewReno throughput in the presence of correlated losses: the slow-but-steady variant. In: IEEE International Conference on Computer Communications INFOCOM, pp. 1–6 (April 2006)
15. Ledbat, W.G., Shalunov, S., Kuehlewind, M.: Low extra delay background transport (LEDBAT) draft-ietf-ledbat-congestion-03.txt. (2011)
16. Chiu, M., Jain, R.: Analysis of the increase and decrease algorithms for congestion avoidance in computer networks. J. Comput. Netw. ISDN Syst. 17(1), 1–14 (1989)
17. Singh, A., Hussain, R.: Congestion control algorithm with MATLAB simulation for agricultural model employing wireless sensor networks. Int. J. Eng. Technol. Comput. Res. (IJETCR) 3(5), 01–04 (2015)
18. Carlucci, G., Cicco, L.D.: Congestion control for web real-time communication. IEEE/ACM Trans. Netw. 01–14 (2017)
19. Hsu, P.-M., Lin, C.-L.: Congestion Control for Large-Scale Wired Network Using Time-Delay Compensator. IEEE (2010)
20. Jain, R.: Congestion control in computer networks: issues and trends. IEEE Netw. Mag. 27–30 (1990)
21. Venkataramani, R., Kokku, R., Dahlin, M.: TCP Nice: a mechanism for background transfers. In: Proceedings of 5th Symposium on Operational System Design Implementation (OSDI), pp. 329–344. Boston, MA, USA (2002)
22. Torkey, H., Attiya, G., Nabi, A.A.: An efficient congestion control protocol for wired/wireless networks. Int. J. Electron. Commun. Comput. Eng. 5(1) (2014). ISSN (Online): 2249-071X, ISSN (Print): 2278-4209
23. Ramdev, M.S., Singh, T., Kandpal, S.: Economic Viability and Implementation of Optimized Commercial Wireless Network in Nigeria
24. Sharma, T.K., Pant, M.: Opposition-based learning embedded shuffled frog-leaping algorithm. In: Soft Computing: Theories and Applications, pp. 853–861. Springer, Singapore (2018)
25. Pathak, P., Singhal, P.K.: Design and analysis of broadband microstrip antenna using LTCC for wireless applications. In: Soft Computing: Theories and Applications, pp. 265–271. Springer, Singapore (2018)
26. Mandal, S., Mandal, K.K., Tudu, B.: Loss and cost minimization with enhancement of voltage profile of distribution systems using a modified differential evolution technique. In: Soft Computing: Theories and Applications, pp. 137–146. Springer, Singapore (2018)
27. Yadav, B., Kumar, A., Kumar, Y.: A robust digital image watermarking algorithm using DWT and SVD. In: Soft Computing: Theories and Applications, pp. 25–36. Springer, Singapore (2018)
28. Pattepu, S.: Performance analysis of two multi-antennas relays cooperation with hybrid relaying scheme in the absence of direct link for cooperative wireless networks. In: Soft Computing: Theories and Applications, pp. 147–160. Springer, Singapore (2018)
Validation and Analysis of Metabolic Pathways Using Petri Nets Sakshi Gupta, Sunita Kumawat, and Gajendra Pratap Singh
Abstract Petri nets (PNs) have been widely utilized for modelling and analysing different expert systems which are characterized by concurrency, parallelism and conflicts. Due to the complex and dense nature of metabolic networks, modelling of these systems is needed to understand their topology. The Petri net is one of the extensively used mathematical tools for modelling and studying metabolic pathways. Here, we explain how the analysis techniques of PNs can be exploited to simulate, validate and study the behaviour of metabolic pathways. To illustrate the validation and analysis process, we consider the biosynthesis of polyhydroxyalkanoates (PHAs) in the presence of the Krebs cycle. PHAs are synthesized by bacteria under nutrient-deficient conditions and are stored as energy storage materials. The Krebs cycle is the competing pathway when glucose is the main source of carbon in PHA biosynthesis. The obtained results provide a valid mathematical model that confirms some of the known properties of this metabolic pathway and also show how the system will behave in the presence of the Krebs cycle. The model can be used in further studies to gain new insights into this pathway along with other competing pathways. Analysis of metabolic pathways can be more useful if this Petri net approach is combined with an experimental approach.
Keywords Petri net · Systems biology · Metabolic pathway · Model validation · Structural and invariant analysis
S. Gupta · S. Kumawat (B) Department of Applied Mathematics, Amity School of Applied Sciences, Amity University Haryana, Gurugram, India e-mail: [email protected] S. Gupta e-mail: [email protected] G. P. Singh Mathematical Sciences and Interdisciplinary Research Lab, School of Computational and Integrative Sciences, JNU, New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_30
S. Gupta et al.
1 Introduction Metabolism is an essential part of an organism's life. The complexity of metabolic networks makes their modelling a big challenge for scientists and researchers. Since metabolic networks are very large and dense, to understand the processes occurring within a living organism it is useful first to represent the metabolic pathways and then to analyse the behaviour of these subsystems in order to gain a deeper understanding of the metabolic networks. A metabolic pathway is considered a sub-network of the metabolic network. It consists of an interconnected series of biochemical reactions converting a substrate (input compound) into a product (output compound). The input (reactant) of a reaction can be the product of a previous reaction. These reactions are usually catalysed by enzymes, which are not consumed during a reaction. Metabolic pathways are usually represented by graphs showing the relation between metabolites, enzymes and biochemical reactions [2]. An evident question that arises after modelling the metabolic pathways is 'How can we use these models to qualitatively analyse and study the behaviour of the metabolic pathways?' Qualitative analysis of these pathways checks the model for consistency and correctness and gives insights into the behaviour of the system. Several approaches have been proposed to model and quantitatively analyse biological systems, such as Boolean networks, ordinary differential equations (ODE) and Bayesian networks [2, 3, 6, 19, 26, 29]. The ODE-based approach is mostly used when the exact kinetic parameters are known. But due to the complex nature and growing database of metabolic pathways, the exact kinetic data is mostly missing or incomplete [14]. Moreover, only quantitative analysis of networks is possible using these approaches. But, in order to obtain complete information about how the system is behaving, qualitative analysis of the system is also necessary.
In that case, qualitative models are the best choice. Petri nets have proven to be an efficient mathematical tool to model and qualitatively analyse the behaviour of metabolic pathways. They came into existence in 1962, when the doctoral thesis 'Communication with Automata' of Carl Adam Petri described them as a means to simulate networks involving parallel activities [25]. Since then, several systems such as discrete-event systems, distributed systems, computing systems and many more have been modelled successfully using Petri nets [22]. In 1993, Reddy et al. [28] first showed that it is possible to model a metabolic network as a discrete-event system, and hence they used PNs to represent and study metabolic pathways. They proposed a PN model of the metabolic pathway of fructose in the liver and then analysed it qualitatively. Hardy & Robillard [10] have presented a review of the modelling, simulation and analysis of various molecular biological systems using different types of PNs. Heiner et al. [11] have presented an integrating methodology, using apoptosis as an example, for representing and analysing biological pathways with existing Petri net analysis technologies. Thereafter, different biological systems have been represented and successfully analysed using Petri nets [4, 7, 9, 14, 17, 20, 23]. Due to the similarities between the
metabolic pathway elements and PN elements, PNs are considered one of the best tools available to represent and study metabolic pathways. In this paper, the metabolic pathway of PHAs is used to explain the techniques of PNs. Polyhydroxyalkanoates are polyesters synthesized by microorganisms when nutrient supplies are lacking [21]. Bacteria are the main producers of PHA, which is stored as an energy and carbon storage material within bacterial cells [33]. The properties of PHAs are quite similar to those of the widely used non-biodegradable plastics, whose accumulation is becoming an increasing danger to the environment. Due to their non-toxic and biocompatible nature, PHAs are being utilized in several applications, including the medical field, nanotechnology, biofuels and many more. There are many metabolites which compete with the biosynthesis of PHA. In order to increase PHA synthesis, it is necessary to direct more resources into the pathway leading to PHA synthesis by either deleting or weakening the competing pathways. In this paper, glucose is considered as the sole source of carbon. In this scenario, the Krebs cycle is the main competing pathway [12, 18]. But this pathway cannot be removed, only weakened, as it is an essential pathway. In the PN model of PHA, the Krebs cycle is weakened by directing acetyl-CoA towards the PHA synthesis pathway instead of the Krebs cycle, which results in accumulation of PHA.
2 Biological Background of Polyhydroxyalkanoates (PHAs) Polyhydroxyalkanoates are the only bio-polymers that are completely bio-synthesized by micro-organisms [5]. They are stored as energy and carbon storage compounds inside bacterial cells under nutrient-imbalanced conditions such as excess carbon and a lack of nitrogen, oxygen or phosphorus [33]. They accumulate within the cytoplasm of bacteria [27]. PHAs were first observed inside bacteria in 1888 by the Dutch microbiologist Beijerinck. A lot of research has been done on PHAs, discussing their production methods, properties and applications and highlighting the main challenges of producing them on an industrial scale [5, 21, 27, 30, 32, 33]. PHAs are polyesters of hydroxyalkanoates (HAs), i.e. polymers containing monomers of hydroxy acids connected by ester bonds. They are renewable, biodegradable, biocompatible, non-toxic and non-carcinogenic agents [1]. Because their material properties are similar to those of petroleum-based plastics like polypropylene, they are becoming the best candidate to replace non-biodegradable plastics [31]. The environment-friendly nature of PHAs makes them suitable for various applications. They are used as packaging materials, in feminine hygiene products, in the medical industry, in agricultural applications and in the fuel industry as biofuels [5, 33]. In the medical field, they have found extensive applications as tissue engineering materials, drug delivery carriers and cardiovascular materials like stents, heart valves
Fig. 1 PHA metabolic pathway (diagram labels: Sugars; Acetyl-CoA; Acetoacetyl-CoA; (R)-3-hydroxybutyryl-CoA; PHA; enzymes PhaA, PhaB and PhaC; Krebs cycle; Coenzyme A; nutrient-rich and nutrient-lack conditions)
[1, 21]. The usage of PHA-based nanoparticles in medicinal applications like nanocapsules as controlled drug delivering systems is of high importance when delivering the highly toxic medicines such as chemotherapy drugs [27, 30].
2.1 Metabolic Pathway of PHAs The biosynthetic pathways of PHAs are connected with bacterial metabolic pathways [32]. Under unbalanced nutrient conditions, PHAs help long-term bacterial survival by acting as an energy and carbon reservoir. Here, we discuss one of the several metabolic pathways of PHA (Fig. 1). The key enzymes involved in this pathway are 3-ketothiolase (PhaA), NADPH-dependent acetoacetyl-CoA reductase (PhaB) and PHA synthase (PhaC). Under nutrient-lack conditions, acetyl-CoA is directed towards the synthesis of polyhydroxyalkanoate, while under balanced nutrient conditions, coenzyme A coming from the Krebs cycle blocks the synthesis of PHA by inhibiting PhaA. For more details on this metabolic pathway of PHA, see [7, 33].
3 Methodology In this section, the basics of PNs, their connection with metabolic pathways, structural and behavioural properties used for the qualitative analysis of the system are discussed.
3.1 Petri Nets (PNs) Petri nets are a computational and mathematical tool used to model and study discrete-event systems [22]. Boolean PNs or 1-safe PNs are an emerging area which can be used in various practical problems involving gene regulatory networks,
designing of multi-functional circuits and many more [8, 13]. The important concepts of Petri nets related to our study are discussed below. A Petri net is a directed, weighted, bipartite graph [15, 22] with two kinds of vertices: (1) places (p), which are denoted by circles, and (2) transitions (t), which are denoted by rectangular boxes or bars. Such nets are also called place–transition nets [16]. The directed edges connect places and transitions. For any transition t and place p, $O(t, p)$ (resp. $I(t, p)$) indicates the number of edges connecting transition t to place p (resp. place p to transition t). This number is represented by assigning a positive integer (weight) to the directed edge, and it is 0 if no such edge exists. Each place is assigned a non-negative number of tokens (represented by black dots), which represents resource availability. The marking vector $\mu$ of a PN gives the number of tokens assigned to each place. At the initial stage, it is called the initial marking vector ($\mu_0$). Mathematically, a PN is a 5-tuple $(P, T, I, O, \mu_0)$ [13]. A transition t is enabled if each of its input places $p_i$ has a number of tokens greater than or equal to the corresponding edge weight, i.e. if $\mu(p_i) \geq I(t, p_i)$. At a given marking $\mu_i$, when an enabled transition t fires, tokens flow from its input places towards its output places according to the edge weights. This token flow changes the marking vector of the system as follows:
$\mu_{i+1}(p) = \mu_i(p) - I(t, p) + O(t, p) \quad \forall p \in P$
This is written as $\mu_i \xrightarrow{t} \mu_{i+1}$ and read as: marking $\mu_{i+1}$ is directly reachable from marking $\mu_i$. A transition t that has no output place (input place) is called a sink transition (source transition). A source transition is always enabled and hence can always fire. Inhibitory action of a resource is represented by inhibitor arcs, which have a small circle at the end and connect a place to a transition.
A transition linked with an inhibitor arc is enabled only when the corresponding place has no tokens. For a complete review and applications of Petri nets, refer to [22, 24]. When modelling metabolic pathways, places model chemical compounds such as enzymes and metabolites, and the biochemical reactions between these compounds are represented by transitions. The directed edges indicate the substrates and products of a biochemical reaction. The stoichiometry of a reaction is modelled by the arc weights. The number of molecules of a chemical compound present is shown by the tokens in its place. As enzymes are catalysts and remain unconsumed when a reaction takes place, they are modelled by two directed edges between the enzyme place and the transition representing the reaction: one directed towards the enzyme place and the other directed towards the reaction transition. The Petri net model of the synthesis reaction for sodium chloride, its firing and the corresponding reachability tree are shown in Fig. 2. Here $P = \{\mathrm{Na}, \mathrm{Cl_2}, \mathrm{NaCl}\}$ and $T = \{t_1\}$. In the initial state, 3 molecules of Na and 1 molecule of $\mathrm{Cl_2}$ are present. Hence, $\mu_0(\mathrm{Na}) = 3$, $\mu_0(\mathrm{Cl_2}) = 1$ and $\mu_0(\mathrm{NaCl}) = 0$, or $\mu_0 = (3, 1, 0)$. As $\mu_0(\mathrm{Na}) = 3 \geq 2 = I(t_1, \mathrm{Na})$ and $\mu_0(\mathrm{Cl_2}) = 1 \geq 1 = I(t_1, \mathrm{Cl_2})$, $t_1$ is enabled and can fire. The system's new marking is calculated as:
Fig. 2 a Petri net representation of the synthesis reaction and b reachability tree
$\mu_1(\mathrm{Na}) = \mu_0(\mathrm{Na}) - I(t_1, \mathrm{Na}) + O(t_1, \mathrm{Na}) = 3 - 2 + 0 = 1$
$\mu_1(\mathrm{Cl_2}) = \mu_0(\mathrm{Cl_2}) - I(t_1, \mathrm{Cl_2}) + O(t_1, \mathrm{Cl_2}) = 1 - 1 + 0 = 0$
$\mu_1(\mathrm{NaCl}) = \mu_0(\mathrm{NaCl}) - I(t_1, \mathrm{NaCl}) + O(t_1, \mathrm{NaCl}) = 0 - 0 + 2 = 2$
Hence, $\mu_1 = (1, 0, 2)$. Now, since $\mu_1(\mathrm{Na}) = 1 < 2 = I(t_1, \mathrm{Na})$, $t_1$ is disabled and a deadlock state is reached. As there are no source transitions and the number of tokens in any place does not exceed a particular number, this PN is bounded.
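The enabling and firing rules used in this example can be checked with a short Python sketch. It is only an illustration of the definitions above for the sodium chloride net; the dictionaries `I` and `O` and the function names are our own choices, and the breadth-first enumeration terminates only for bounded nets.

```python
# Places are indexed in the order (Na, Cl2, NaCl); t1 models 2 Na + Cl2 -> 2 NaCl.
PLACES = ("Na", "Cl2", "NaCl")
I = {"t1": (2, 1, 0)}  # input arc weights I(t, p)
O = {"t1": (0, 0, 2)}  # output arc weights O(t, p)


def is_enabled(marking, t):
    """t is enabled if every place holds at least I(t, p) tokens."""
    return all(m >= w for m, w in zip(marking, I[t]))


def fire(marking, t):
    """Apply mu'(p) = mu(p) - I(t, p) + O(t, p) for every place p."""
    if not is_enabled(marking, t):
        raise ValueError(f"{t} is not enabled at {marking}")
    return tuple(m - i + o for m, i, o in zip(marking, I[t], O[t]))


def reachable_markings(m0):
    """Breadth-first enumeration of the markings reachable from m0."""
    seen, frontier = [m0], [m0]
    while frontier:
        next_frontier = []
        for m in frontier:
            for t in I:
                if is_enabled(m, t):
                    m_new = fire(m, t)
                    if m_new not in seen:
                        seen.append(m_new)
                        next_frontier.append(m_new)
        frontier = next_frontier
    return seen


mu0 = (3, 1, 0)
mu1 = fire(mu0, "t1")
print(mu1)                      # (1, 0, 2)
print(is_enabled(mu1, "t1"))    # False: deadlock, no transition is enabled
print(reachable_markings(mu0))  # [(3, 1, 0), (1, 0, 2)]
```

The two reachable markings correspond to the root and the single child of the reachability tree in Fig. 2b.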
3.2 Structural and Behavioural Analysis Structural properties, such as whether the PN is bounded or ordinary and what its siphons and traps are, are independent of the initial marking [23]. Behavioural properties, such as liveness and the coverability (reachability) tree, depend on the initial marking and are also known as marking-dependent properties. A PN is ordinary if all its arc weights have value one. In a k-bounded PN, the number of tokens in a place never exceeds a particular number k. If the value of k is unknown, we simply say that the PN is bounded. 1-bounded PNs are called safe PNs. Biologically, boundedness helps to identify any accumulation of toxic metabolites, which may be harmful for the system under study. Metabolic Petri net models never stop executing if there are source transitions, i.e. if the quantity of input reactants is sufficient. So, they often possess infinite net behaviour and are hence unbounded [14]. Thus, in metabolic PN models, a sufficient quantity of input substrates ensures that the net is always working. Traps and structural deadlocks (siphons) are other significant structural properties. A structural deadlock is a set of places N such that ${}^{\bullet}N \subseteq N^{\bullet}$, where $N^{\bullet}$ and ${}^{\bullet}N$ represent its sets of output and input transitions, respectively. Hence, if this place set has insufficient marking, then none of its output transitions will be live and the model will reach a deadlock state. Biologically, a siphon indicates a set of metabolites which must
be available in sufficient amount in order to make the modelled network live [14]. A trap R, by contrast, is the opposite of a siphon: $R^{\bullet} \subseteq {}^{\bullet}R$, where $R^{\bullet}$ and ${}^{\bullet}R$ represent its sets of output and input transitions, respectively. Hence, once a trap receives at least one token in any marking, it will always contain a token. Biologically, traps indicate metabolic compounds stored during an organism's growth [2]. A PN is said to be live for a given initial marking $\mu_0$ if there does not exist any marking $\mu$ reachable from $\mu_0$ in which no transition is enabled. Biologically, liveness confirms the execution of all the modelled processes [2]. In a PN, if there exists a marking $\mu'$ such that $\mu(p) \leq \mu'(p) \;\; \forall p \in P$, then the marking $\mu$ is said to be coverable by $\mu'$. In a PN with initial marking $\mu_0$, the tree representation of all possible coverable markings is called the coverability tree. If a PN is bounded, the coverability tree is known as the reachability tree. The root of the tree is $\mu_0$, the markings generated from it form the nodes, and each transition firing is represented by a directed arc. The coverability tree represents the different states of a system. A state is said to be reachable from another state if there exists a directed path between these two states. Biologically, tracing a reachability tree helps to determine whether a specific reaction is executed or whether a particular state is reachable from another state.
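To make the siphon and trap conditions concrete, the following Python sketch tests them by brute force on the sodium chloride net of Fig. 2. The dictionary `NET` and the function names are illustrative assumptions; enumerating all subsets of places is feasible only for very small nets and is not how dedicated tools compute minimal siphons and traps.

```python
from itertools import combinations

# Each transition maps to (set of input places, set of output places).
NET = {"t1": ({"Na", "Cl2"}, {"NaCl"})}
PLACES = ("Na", "Cl2", "NaCl")


def input_transitions(places):
    """Transitions that put tokens into the place set (its preset)."""
    return {t for t, (_, outs) in NET.items() if outs & places}


def output_transitions(places):
    """Transitions that take tokens from the place set (its postset)."""
    return {t for t, (ins, _) in NET.items() if ins & places}


def is_siphon(places):
    return bool(places) and input_transitions(places) <= output_transitions(places)


def is_trap(places):
    return bool(places) and output_transitions(places) <= input_transitions(places)


def minimal_sets(predicate):
    """All non-empty place sets satisfying the predicate that contain no smaller such set."""
    found = [frozenset(c)
             for k in range(1, len(PLACES) + 1)
             for c in combinations(PLACES, k)
             if predicate(set(c))]
    return [s for s in found if not any(other < s for other in found)]


print(minimal_sets(is_siphon))  # the sets {Na} and {Cl2}
print(minimal_sets(is_trap))    # the set {NaCl}
```

The result agrees with the firing example: the reactant places Na and Cl2 are siphons that can run empty and block $t_1$, while the product place NaCl is a trap that keeps its tokens once marked.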
3.3 Invariant Analysis Invariant analysis depends on the incidence matrix, which is an $n \times m$ matrix of places and transitions, where n is the number of places and m is the number of transitions. The entries of the $n \times m$ incidence matrix A are given by $a_{ij} = O(i, j) - I(i, j)$, where $I(i, j)$ and $O(i, j)$ represent the weights of the arcs between transition j and place i when i is an input place and an output place of j, respectively. This matrix is equivalent to the stoichiometric matrix of a metabolic pathway. Invariant analysis of a PN includes determining the place invariants and transition invariants. Both kinds of invariants play an important part in the model validation and analysis of metabolic pathways. A T-invariant or transition invariant is a multiset of transitions whose firing reproduces a given state of the system. It is a non-negative, non-trivial $m \times 1$ solution vector $y = (y_1, y_2, \ldots, y_m)^T$ of the linear system of equations $A \cdot y = 0$. Biologically, transition invariants are of great interest as they may represent cyclic behaviour, and an analyst can get an idea of the chemical reactions which reproduce a given state [14]. A P-invariant or place invariant is a set of places for which the weighted sum of tokens remains constant, independent of the transition firing sequence. A place invariant is a non-negative, non-trivial $n \times 1$ solution vector $x = (x_1, x_2, \ldots, x_n)^T$ of the linear system of equations $A^T \cdot x = 0$. In biological terms, P-invariants are associated with the conservation laws of chemistry and tell us which compounds are conserved [2]. Thus, if there exists a place invariant covering all the places, then the net is bounded [14].
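As a small worked illustration of these definitions, the sketch below builds the incidence matrix of the sodium chloride net of Fig. 2 and searches for invariants by brute force. The function names and the search bound are our own assumptions; exhaustive search over small integer vectors is workable only for tiny nets, and analysis tools rely on more efficient algorithms.

```python
from itertools import product

PLACES = ("Na", "Cl2", "NaCl")
TRANSITIONS = ("t1",)
I = {("Na", "t1"): 2, ("Cl2", "t1"): 1}  # I(i, j): place i is an input of transition j
O = {("NaCl", "t1"): 2}                  # O(i, j): place i is an output of transition j

# Incidence matrix with entries a_ij = O(i, j) - I(i, j); rows are places.
A = [[O.get((p, t), 0) - I.get((p, t), 0) for t in TRANSITIONS] for p in PLACES]


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def small_invariants(matrix, bound=2):
    """Non-trivial non-negative integer solutions v of matrix * v = 0 with
    entries up to `bound`, keeping only solutions not dominated by another one."""
    n = len(matrix[0])
    solutions = [v for v in product(range(bound + 1), repeat=n)
                 if any(v) and not any(mat_vec(matrix, v))]
    return [v for v in solutions
            if not any(w != v and all(a <= b for a, b in zip(w, v)) for w in solutions)]


t_invariants = small_invariants(A)             # solutions of A . y = 0
p_invariants = small_invariants(transpose(A))  # solutions of A^T . x = 0

print(A)             # [[-2], [-1], [2]]
print(t_invariants)  # []
print(p_invariants)  # [(0, 2, 1), (1, 0, 1)]
```

The P-invariant (1, 0, 1) states that the token sum of Na and NaCl is constant (3 in both markings of the firing example), which is the conservation of sodium atoms; (0, 2, 1) plays the same role for chlorine. The absence of T-invariants reflects that firing $t_1$ can never reproduce an earlier marking.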
Fig. 3 Petri net model of PHA metabolic pathway which leads to synthesis of PHA during deficiency of nutrients (diagram labels: places Sug, A-CoA, AA-CoA, HyB-CoA, PHA, CoA, PhaA, PhaB, PhaC; transitions gSug, Oxi, Cond, Rd, PolyM; Krebs cycle subnet)
4 Results and Discussion 4.1 Petri Net Representation of Metabolic Pathway of Polyhydroxyalkanoates The structural, behavioural and invariant analysis of the modelled metabolic pathway has been done using the Petri Net Toolbox (PN Toolbox) of MATLAB and PIPE v4.3.0. The PN representation of the PHA metabolic pathway under conditions of nutrient deficiency is shown in Fig. 3. Acetyl-CoA is directed towards the PHA biosynthesis pathway and is inhibited from entering the Krebs cycle. This model consists of 6 transitions and 9 places. In this model, a source transition (gSug) inputs sugar into the modelled system. Since source transitions are always enabled, it is assumed while modelling that a sufficient amount of sugar is available. The Krebs cycle (TCA cycle) has been treated as a subnet. For modelling enzymes (green places), two directed arcs have been used. Initially, the places Sug, A-CoA, PhaA, PhaB and PhaC have one token each. For detailed information on this PN model, please refer to [7]. A description of the places and transitions used in the PN representation is given in Table 1.
4.2 Model Validation Validation of the constructed model aims at checking the obtained PN model for any inconsistency or deadlock. It reflects how the system will work in reality. Model validation is the most important step before analysing the properties of the pathway. After firing the Petri net model of the metabolic pathway of PHA, we obtain a token in the PHA place (Fig. 4c). This confirms that all the biochemical reactions have
Table 1 Description of places and transitions used in PN model of PHA metabolic pathway
Place   | Compound name             | Transition | Reaction
Sug     | Sugars                    | gSug       | Generating Sugar
CoA     | Coenzyme A                | Oxi        | Oxidation
A-CoA   | Acetyl-CoA                | Cond       | Condensation
PhaA    | 3-Ketothiolase            | Rd         | Reduction
AA-CoA  | Acetoacetyl-CoA           | PolyM      | Polymerization
PhaB    | Acetoacetyl-CoA reductase | Dace       | Deacetylation
HyB-CoA | (R)-3-hydroxybutyryl-CoA  | Cons       | Consumption
PhaC    | PHA synthase              |            |
PHA     | Polyhydroxyalkanoate      |            |
Fig. 4 PN model of PHA metabolic pathway a before firing, b during firing and c after firing
executed; i.e. our PN model is deadlock-free, which further confirms the validity of our model. Simulation results of our Petri net model before, during and after firing in the PN Toolbox are shown in Fig. 4.
Fig. 5 State space analysis of PN model
4.3 Analysis of the PN Model Since not all arc weights are equal to one, the net is not ordinary. Also, the net is not bounded (Fig. 5); i.e. during the execution of the modelled processes, the amount of resources is not fixed and keeps on increasing. Thus, directing acetyl-CoA towards the pathway leading to PHA synthesis results in increasing accumulation of PHA. The minimal siphons and minimal traps obtained for this net are shown in Fig. 6. The set of minimal siphons shows that enzymes must always be present in sufficient amount in order to make the whole process live. The set of traps indicates the PHA stored during the execution of the pathway. The PN model is deadlock-free (Fig. 5) and hence live with respect to the assigned initial marking. This ensures that there is no metabolic block and that all the modelled biochemical processes have taken place. The coverability tree obtained for this relatively small pathway has a huge number of possible states. Such a vast coverability tree is difficult to capture on screen, so only some of the possible states of the PN model have been captured (Fig. 7). The coverability tree shows the different markings of the system reachable from the initial marking. The markings where the transition PolyM is fired (like M19(PolyM), M21(PolyM), M25(PolyM), etc.) lead to PHA synthesis. Figure 8 shows the incidence matrix of the modelled system. The PN model contains three P-invariants, PhaA, PhaB and PhaC (Fig. 9); i.e. the number of tokens in these enzyme places remains unchanged during PHA synthesis. Since P-invariants are related to conservation laws and enzymes are known not to be consumed during a reaction, our result coincides with the known facts about the metabolic pathway. There are no T-invariants in our model (Fig. 9), i.e. a given state cannot be reproduced or the reactions involved in this pathway are not reversible.
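Two of these observations, unboundedness caused by a source transition and the conservation of enzyme tokens, can be reproduced on a deliberately tiny net. The sketch below is a hypothetical three-place example (substrate S, enzyme E, product P) written for illustration only; it is not the PHA model of Fig. 3, and its places, transitions and arc weights are assumptions.

```python
# Hypothetical toy net, NOT the PHA model: a source transition "gen" produces
# substrate S, and reaction "rx" converts S into product P using enzyme E.
# The enzyme is both an input and an output of "rx" (two directed arcs).
PLACES = ("S", "E", "P")
I = {"gen": (0, 0, 0), "rx": (1, 1, 0)}  # input arc weights
O = {"gen": (1, 0, 0), "rx": (0, 1, 1)}  # output arc weights


def is_enabled(marking, t):
    return all(m >= w for m, w in zip(marking, I[t]))


def fire(marking, t):
    if not is_enabled(marking, t):
        raise ValueError(f"{t} is not enabled at {marking}")
    return tuple(m - i + o for m, i, o in zip(marking, I[t], O[t]))


def run(marking, rounds):
    """Fire rx and then gen, `rounds` times, recording every marking."""
    history = [marking]
    for _ in range(rounds):
        for t in ("rx", "gen"):
            marking = fire(marking, t)
            history.append(marking)
    return history


history = run((1, 1, 0), rounds=3)
print(history[-1])                   # (1, 1, 3): the product keeps accumulating
print({m[1] for m in history})       # {1}: the enzyme token count never changes
```

Because the source transition can always fire, the product count grows with every round, so no bound k exists for place P, while the enzyme place keeps exactly one token, which is the behaviour of a P-invariant.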
Fig. 6 Minimal siphons and minimal traps
Fig. 7 Coverability tree
5 Conclusions and Scope This paper discusses how Petri net theory can be helpful in validating and analysing metabolic pathways, taking the biosynthesis of polyhydroxyalkanoates as an illustration. This PN approach can be applied to qualitatively analyse a model before starting its quantitative analysis and is very helpful in describing complex
Fig. 8 Incidence matrix
Fig. 9 P-invariants and T-invariants
metabolic pathways. Analysing the PN representation of a metabolic pathway both structurally and behaviourally can indicate whether there is any metabolic block during its operation. As a result, the PN model can be revised to enhance the behaviour of the pathway. Invariant analysis is a beneficial technique for getting information about a pathway and its processes. T-invariants will arise if there are reversible reactions involved in the pathway, and P-invariants give information about the conserved compounds. In this paper, glucose is considered as the main available source of carbon, and thus, the Krebs cycle becomes the pathway competing with the PHA synthesis pathway. The analysis of the developed model of the PHA metabolic pathway shows PHA accumulation when acetyl-CoA is directed towards PHA biosynthesis instead of entering the Krebs cycle. As the production cost of PHA is quite high, studying its competing pathways with the help of Petri nets can give an idea of how the system will behave or what the role of a specific reaction is. The properties may indicate the preliminary outcomes of possible experimental results. The developed model has no metabolic block. The set of minimal siphons shows that a sufficient quantity of enzymes must always be present in order to make the whole process live. The set of minimal traps
indicates the PHA stored during the execution of the pathway. As PHAs are biodegradable–biocompatible–green materials and have wide applications in different fields, their different metabolic pathways along with competing pathways can be further investigated to gain insights into open questions. For instance, if fatty acids instead of glucose are considered as precursors of PHA biosynthesis, then the beta-oxidation cycle becomes the competing pathway. This scenario can be represented and studied with the help of Petri nets. Also, the obtained qualitative model could be further extended to a quantitative one by using the different extensions of PNs such as Time PN, Stochastic PN, Continuous PN and Hybrid PN. Analysis of complex metabolic pathways can be more useful if the Petri net approach is related to an experimental approach.
Acknowledgements The first and second authors would like to acknowledge the support provided under the DST-FIST Grant No.SR/FST/PS-I/2018/48 of Government of India. The third author is grateful to DST PURSE and UPOE-II 257 for granting research facility.
References 1. Ali, I., Jamil, N.: Polyhydroxyalkanoates: current applications in the medical field. Front. Biol. 11(1), 19–27 (2016) 2. Baldan, P., Cocco, N., Marin, A., Simeoni, M.: Petri nets for modelling metabolic pathways: a survey. Nat. Comput. 9(4), 955–989 (2010) 3. Chai, L.E., Loh, S.K., Low, S.T., Mohamad, M.S., Deris, S., Zakaria, Z.: A review on the computational approaches for gene regulatory network construction. Comput. Biol. Med. 48, 55–65 (2014) 4. Chaouiya, C.: Petri net modelling of biological networks. Brief. Bioinform. 8(4), 210–219 (2007) 5. Chen, G.Q.: Plastics completely synthesized by bacteria: polyhydroxyalkanoates. In: Plastics from Bacteria, pp. 17–37. Springer, Berlin, Heidelberg (2010) 6. Davidich, M.I., Bornholdt, S.: Boolean network model predicts cell cycle sequence of fission yeast. PLoS ONE 3(2), e1672 (2008) 7. Gupta, S., Singh, G.P., Kumawat, S.: Petri net recommender system to model metabolic pathway of polyhydroxyalkanoates. Int. J. Knowl. Syst. Sci. (IJKSS) 10(2), 42–59 (2019) 8. Gupta, S., Kumawat, S., Singh, G.P.: Fuzzy petri net representation of fuzzy production propositions of a rule based system. In: International Conference on Advances in Computing and Data Sciences, pp. 197–210. Springer, Singapore (2019) 9. Hamed, R.I., Ahson, S.I., Parveen, R.: Designing genetic regulatory networks using fuzzy Petri nets approach. Int. J. Autom. Comput. 7(3), 403–412 (2010) 10. Hardy, S., Robillard, P.N.: Modeling and simulation of molecular biology systems using Petri nets: modeling goals of various approaches. J. Bioinform. Comput. Biol. 2(04), 619–637 (2004) 11. Heiner, M., Koch, I., Will, J.: Model validation of biological pathways using Petri nets— demonstrated for apoptosis. Biosystems 75(1–3), 15–28 (2004) 12. Heinrich, D., Raberg, M., Steinbüchel, A.: Synthesis of poly(3-hydroxybutyrate-co-3hydroxyvalerate) from unrelated carbon sources in engineered Rhodospirillum rubrum. FEMS Microbiol. Lett. 362(8), (2015) 13. 
Kansal, S., Acharya, M., Singh, G.P.: Boolean petri nets. In: Pawel Pawlewski (ed.) Petri nets—Manufacturing and Computer Science, pp. 381–406 (2012)
14. Koch, I., Junker, B.H., Heiner, M.: Application of Petri net theory for modelling and validation of the sucrose breakdown pathway in the potato tuber. Bioinformatics 21(7), 1219–1226 (2005) 15. Kumawat, S., Purohit, G.N.: Total span of farm work flow using Petri net with resource sharing. Int. J. Bus. Process Integr. Manage. 8(3), 160–171 (2017) 16. Kumawat, S.: Weighted directed graph: a Petri net-based method of extraction of closed weighted directed Euler trail. Int. J. Serv. Econ. Manage. 4(3), 252–264 (2012) 17. Li, C., Suzuki, S., Ge, Q.W., Nakata, M., Matsuno, H., Miyano, S.: Structural modeling and analysis of signaling pathways based on Petri nets. J. Bioinform. Comput. Biol. 4(05), 1119– 1140 (2006) 18. Lütke-Eversloh, T., Steinbüchel, A.: Biochemical and molecular characterization of a succinate semialdehyde dehydrogenase involved in the catabolism of 4-hydroxybutyric acid in Ralstonia eutropha. FEMS Microbiol. Lett. 181(1), 63–71 (1999) 19. Mandel, J., Palfreyman, N.M., Lopez, J.A., Dubitzky, W.: Representing bioinformatics causality. Brief. Bioinform. 5(3), 270–283 (2004) 20. Marwan, W., Wagler, A., Weismantel, R.: Petri nets as a framework for the reconstruction and analysis of signal transduction pathways and regulatory networks. Nat. Comput. 10(2), 639–654 (2011) 21. Mo˙zejko-Ciesielska, J., Kiewisz, R.: Bacterial polyhydroxyalkanoates: Still fabulous? Microbiol. Res. 192, 271–282 (2016) 22. Murata, T.: Petri nets: Properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989) 23. Oyelade, J., Isewon, I., Rotimi, S., Okunoren, I.: Modeling of the glycolysis pathway in plasmodium falciparum using petri nets. Bioinform. Biol. Insights 10, BBI-S37296 (2016) 24. Peterson, J.L.: Petri net theory and the modeling of systems. Prentice Hall PTR (1981) 25. Petri, C.A.: Communication with automata (1966) 26. Punase, S., Rout, R.K.: Isomorphic subgraph for identification of singleton attractors in Boolean networks. 
In: Soft Computing: Theories and Applications, pp. 1381–1390. Springer, Singapore (2020) 27. Raza, Z.A., Abid, S., Banat, I.M.: Polyhydroxyalkanoates: characteristics, production, recent developments and applications. Int. Biodeterior. Biodegradation 126, 45–56 (2018) 28. Reddy, V.N., Mavrovouniotis, M.L., Liebman, M.N.: Petri net representations in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 93, 328–336 (1993) 29. Shinde, S., Iyer, B.: IoT-enabled early prediction system for epileptic seizure in human being. In: Soft Computing: Theories and Applications, pp. 37–46. Springer, Singapore (2020) 30. Shrivastav, A., Kim, H.Y., Kim, Y.R.: Advances in the applications of polyhydroxyalkanoate nanoparticles for novel drug delivery system. Biomed. Res. Int. 2013, (2013) 31. Singh, M., Kumar, P., Ray, S., Kalia, V.C.: Challenges and opportunities for customizing polyhydroxyalkanoates. Indian J. Microbiol. 55(3), 235–249 (2015) 32. Tan, G.Y.A., Chen, C.L., Li, L., Ge, L., Wang, L., Razaad, I.M.N., ... Wang, J.Y.: Start a research on biopolymer polyhydroxyalkanoate (PHA): a review. Polymers 6(3), 706–754 (2014) 33. Verlinden, R.A., Hill, D.J., Kenward, M.A., Williams, C.D., Radecka, I.: Bacterial synthesis of biodegradable polyhydroxyalkanoates. J. Appl. Microbiol. 102(6), 1437–1449 (2007)
Approach of Machine Learning Algorithms to Deal with Challenges in Wireless Sensor Network Sudha, Yudhvir Singh, Harkesh Sehrawat, and Vivek Jaglan
Abstract The primary goal of a wireless sensor network (WSN) is to deal with real-world issues by providing a network that is feasible and efficient for applications such as monitoring and surveillance of people, machines, structures and natural phenomena. Topological changes are inevitable due to the dynamic nature of WSN. As the network dynamics change, all functional and non-functional operations of the wireless sensor network are affected. Traditional approaches used in other networks are incapable of responding and learning dynamically. At present, WSN is integrated with recent technologies such as the Internet of things and cyber-physical systems to facilitate scalability for providing common services. It is imperative that wireless sensor networks are energy-efficient and self-configurable and can operate independently with minimum human intervention. In order to instil these properties, a lot of recent work has explored machine learning algorithms to tackle the issues and challenges of WSN. This paper gives a basic introduction to machine learning algorithms and covers their application to various domains of wireless sensor networks.
Keywords Wireless sensor network (WSN) · Machine learning (ML) algorithms · Routing · Localization · Clustering · Data aggregation
1 Introduction
Machine learning (ML) is a tool for turning information into knowledge. In the past few decades, the volume of data has increased considerably. These heaps of data have patterns hidden within them. ML techniques are used to analyse and decode the underlying patterns hidden within the data. The hidden patterns and knowledge can be used to predict future events and can be helpful for decision making. Besides this, ML has the ability to constantly learn and improve with every interaction with the system [1].

Sudha (B) · Y. Singh · H. Sehrawat: Computer Science Engineering, UIET, MDU, Rohtak, India
V. Jaglan: Computer Science Engineering, Graphic Era Hill University, Dehradun, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_31
A traditional algorithm combines data with hand-crafted rules, as in a rule-based expert system, to obtain answers about a phenomenon. An ML approach, on the other hand, discovers the rules behind the problem, provided that the data and the answers are available. To learn the rules governing the phenomenon, an ML algorithm has to go through a learning process. ML also offers models that are computationally light and less complex [2]. Machine learning approaches generally consist of two phases: a training phase and a testing phase [2]. Solving a problem using ML follows four basic steps:
i. Identify the features and classes from the training set.
ii. Apply dimensionality reduction techniques.
iii. Learn and train the model.
iv. Test the model using an unknown dataset.
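The four steps can be sketched in a few lines of Python. This is only an illustrative sketch: the sensor readings, the variance-based feature filter used as a stand-in for dimensionality reduction, and the nearest-centroid model are assumptions made for the example and are not taken from the surveyed works.

```python
from statistics import mean, pvariance

# Hypothetical readings: (temperature, humidity, constant flag) -> class label
train = [((20.0, 30.0, 1.0), "normal"), ((21.0, 32.0, 1.0), "normal"),
         ((35.0, 80.0, 1.0), "alert"), ((36.0, 78.0, 1.0), "alert")]
test = [((22.0, 31.0, 1.0), "normal"), ((34.0, 79.0, 1.0), "alert")]

# Step i: identify the features and classes from the training set.
features = [x for x, _ in train]
classes = sorted({y for _, y in train})

# Step ii: crude dimensionality reduction that drops zero-variance features.
keep = [j for j in range(len(features[0]))
        if pvariance([x[j] for x in features]) > 0.0]

def reduce_dims(x):
    return tuple(x[j] for j in keep)

# Step iii: train a nearest-centroid model (one mean vector per class).
centroids = {}
for c in classes:
    rows = [reduce_dims(x) for x, y in train if y == c]
    centroids[c] = tuple(mean(col) for col in zip(*rows))

def predict(x):
    z = reduce_dims(x)
    return min(classes,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c])))

# Step iv: test the model on data that was not used for training.
accuracy = mean([1.0 if predict(x) == y else 0.0 for x, y in test])
```

The constant third feature carries no information, so step ii removes it before the model is trained.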
In recent years, machine learning has been preferred over conventional methods for solving diverse engineering problems. The reason for this success lies in the fact that ML algorithms can integrate domain knowledge into the learning process [3]. ML is suitable for dealing with issues and challenges in WSN for the following reasons:
• The ability of ML algorithms to interact and learn can prove beneficial for WSN, as nodes are deployed in harsh terrain and left unattended.
• Dynamic topology: WSN has a dynamic topology due to node death, failure of nodes and network partitioning. Routing and localization are difficult and complex in such dynamic scenarios, and ML can provide solutions that are simple and efficient.
• Large dataset: Owing to the vast and diverse applications of WSN, a large amount of data is available. The analysis of this vast amount of complex data can be done easily using ML algorithms.
• Integration with new technologies/heterogeneous network environments: Current applications of WSN are integrated with new environments such as cyber-physical systems, Internet of things (IoT) and M2M communications [4]. This integration raises new challenges, as it requires sensor nodes to have access to the Internet. The benefit of integration is a scalable, heterogeneous and collaborative approach to providing common services.
Machine learning is useful in dealing with different challenges and issues in wireless sensor networks. A few of them are listed below [1, 2].
• Nodes may change their location during network operation; ML algorithms can be used to find the exact position of nodes in WSN.
• ML algorithms can help in dealing with the dynamic behaviour of sensor nodes by providing an effective routing algorithm.
• Finding the optimum number of clusters improves network lifetime, and the target coverage problem requires the sensing area to be covered; ML algorithms can be used to find the optimal network size for a specified network area.
• Improving the efficiency of the network by identification of faulty nodes in the network.
• Machine learning algorithms can also be utilized to predict the amount of energy harvested within a given time period [5]. This information can be used to enhance the performance of the network.
Machine learning algorithms have been applied to localization, routing, data aggregation, QoS and anomaly detection [5]. Different WSN applications require different resources and quality parameters; hence, it becomes necessary to choose the most appropriate ML algorithm. Machine learning algorithms require large datasets and considerable computation time. If an application requires high accuracy, the computation and energy consumption will also be higher [4].
2 Machine Learning Algorithms
This section introduces some machine learning algorithms, grouped by their learning procedure. In this work, ML algorithms are categorized by mode of learning into supervised, unsupervised, reinforcement, semi-supervised and evolutionary computing.
2.1 Supervised Learning
In supervised learning, the problem at hand has labelled data, i.e. both input and output parameters are available [5]. The output can be categorical or continuous in nature. Training involves mapping the input data to the available output data. The intent is to derive the mapping function from the training set, such that when a new input is given to the algorithm, it can predict the output. Supervised learning can be used for regression and classification problems. The problem is a classification problem if the output variable is a category; otherwise, it is a regression problem. Some of the supervised learning methods are as follows.
2.1.1 Decision Tree (DT)
It is a classification method used to predict data labels based on the learning acquired by iterating through the dataset. The data labels can be categorical or continuous. The problem is represented in the form of a tree, with attributes as the root and internal nodes; each branch represents a decision rule, and the leaf nodes are the outcomes or class labels. The inference drawn from the training set is used for prediction of the class or data label. Each decision rule tests a single attribute, so the approach works best when the classes can be separated by such attribute-wise splits. The main challenge is to select the attributes at the root and at each node level. The method has been successfully applied to identify link quality and its metrics [4]. The decision tree algorithm is well suited for medical applications, as the model is interpretable and can be constructed directly by a domain expert.
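One common way to address the attribute selection challenge is to pick the attribute with the highest information gain, as in ID3-style trees. The text does not name a criterion, so the sketch below is an assumption, and the link-quality samples and attribute names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical link-quality samples described by two categorical attributes.
rows = [{"rssi": "high", "load": "low"}, {"rssi": "high", "load": "high"},
        {"rssi": "low", "load": "low"}, {"rssi": "low", "load": "high"}]
labels = ["good", "good", "bad", "bad"]

gains = {a: information_gain(rows, labels, a) for a in ("rssi", "load")}
root = max(gains, key=gains.get)  # attribute chosen for the root node
```

Here the signal-strength attribute separates the two classes perfectly, so it is selected as the root, while the load attribute provides no gain.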
2.1.2 Regression
It is a simple approach with good prediction accuracy. Simple linear regression can be characterized by the equation Z = f(k) + c, where Z is the dependent variable and k is the independent variable. The dependent variable is continuous, and the independent variables can be discrete or continuous. Various regression techniques are possible depending on the nature of the dependent variable, the number of independent variables and the shape of the regression line. It is essential to choose the best-suited technique depending upon the number and type of variables and the dimensionality and characteristics of the dataset [3]. Regression can solve various WSN issues such as connectivity, data aggregation and localization.
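As a minimal sketch, assume the linear case f(k) = m·k, so that Z = m·k + c with c as the intercept. The ordinary least-squares estimates of m and c can then be computed directly. The hop-count and distance values are hypothetical and only illustrate how regression could support a localization task.

```python
def fit_line(ks, zs):
    """Ordinary least-squares fit of Z = m * k + c."""
    n = len(ks)
    k_mean = sum(ks) / n
    z_mean = sum(zs) / n
    sxy = sum((k - k_mean) * (z - z_mean) for k, z in zip(ks, zs))
    sxx = sum((k - k_mean) ** 2 for k in ks)
    m = sxy / sxx
    c = z_mean - m * k_mean
    return m, c

# Hypothetical data: hop count to an anchor node vs. measured distance (metres).
hops = [1, 2, 3, 4]
distance = [12.0, 22.0, 32.0, 42.0]

m, c = fit_line(hops, distance)
estimate = m * 5 + c  # predicted distance for a node five hops away
```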
2.1.3 K-Nearest Neighbour (KNN)
It is a learning algorithm that classifies a data sample based on the labels of the nearest data samples. It is applicable to both classification and regression problems. The nearest neighbours are determined using a distance function, the most widely preferred being the Euclidean distance. For a given sample, the algorithm measures the distance to the stored samples and selects the k nearest ones. The algorithm is suitable for query processing, as it is simple and does not require high computation resources [4]. In WSN, KNN can be used to predict missing or unknown data values by using the average value of the k nearest neighbours.
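Both uses of KNN described above, majority-vote classification and filling in a missing reading with the neighbours' average, can be sketched as follows. The node coordinates, labels and readings are hypothetical.

```python
from collections import Counter
from math import dist

def k_nearest(samples, query, k):
    """Return the k samples whose feature vectors are closest to the query."""
    return sorted(samples, key=lambda s: dist(s[0], query))[:k]

def knn_classify(samples, query, k=3):
    """Majority vote over the labels of the k nearest samples."""
    votes = Counter(label for _, label in k_nearest(samples, query, k))
    return votes.most_common(1)[0][0]

def knn_impute(samples, query, k=3):
    """Estimate a missing value as the mean of the k nearest samples' values."""
    neighbours = k_nearest(samples, query, k)
    return sum(value for _, value in neighbours) / len(neighbours)

# Hypothetical node positions with a status label, and with a temperature reading.
labelled = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
            ((5, 5), "faulty"), ((5, 6), "faulty")]
readings = [((0, 0), 20.0), ((0, 1), 22.0), ((1, 0), 21.0), ((9, 9), 40.0)]

status = knn_classify(labelled, (0.5, 0.5))
missing = knn_impute(readings, (0.4, 0.4))
```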
2.1.4 Random Forest (RF)
It can be used for both regression and classification and is based on decision trees. Random samples are drawn from the dataset, and a tree is built for each sample. Each decision tree makes a prediction, and then voting is performed [6, 7]. The prediction that receives the maximum number of votes is taken as the final output. The random forest algorithm has high accuracy, but it is complex and requires high computational time and resources.
2.1.5 Artificial Neural Network (ANN)
ANN consists of artificial neurons that imitate the biological neurons of the human brain. A basic ANN model comprises three layers: an input layer, an output layer and a hidden layer between them. Each layer has a set of neurons, and these neurons are connected through weighted links. The basic processing unit of an ANN is the artificial neuron; its structure is shown in Fig. 1.

Fig. 1 Model of artificial neuron

In Fig. 1, $q_1$, $q_2$ and $q_3$ represent the inputs to the neuron (in general, $q_1, \ldots, q_n$), $w_1, w_2, \ldots, w_n$ are the weights assigned to the inputs, and the variable $Y$ is the weighted sum of all inputs:

$$Y = \sum_{i=1}^{n} q_i w_i$$

The output is generated by applying an activation function to $Y$; the activation function captures the nonlinear relationship between the inputs and the output. Feedforward and feedback neural networks are the two types of neural network. The key advantage of ANN is its ability to learn complex and nonlinear relationships and generalize them to unseen data. ANN is useful for various classification and forecasting problems. It has been applied efficiently to intrusion detection, routing, localization and event detection in WSN.
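The computation performed by a single artificial neuron can be sketched as below. The sigmoid is used as the activation function purely as an assumption, since the text does not specify one, and the input and weight values are hypothetical.

```python
from math import exp

def weighted_sum(inputs, weights):
    """Y = sum of q_i * w_i over all inputs."""
    return sum(q * w for q, w in zip(inputs, weights))

def neuron_output(inputs, weights):
    """Apply a sigmoid activation (an assumed choice) to the weighted sum."""
    y = weighted_sum(inputs, weights)
    return 1.0 / (1.0 + exp(-y))

# Hypothetical inputs q1..q3 and weights w1..w3.
q = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.0]

y = weighted_sum(q, w)      # 0.5 - 0.5 + 0.0
out = neuron_output(q, w)   # sigmoid of the weighted sum
```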
2.1.6 Support Vector Machines (SVM)
It can be used for both classification and regression problems. Training an SVM is formulated as a convex optimization problem. The algorithm selects an optimal hyperplane that separates the data points of different classes with the maximum margin [7]. SVM can be useful for finding the spatial–temporal correlation in WSN data and is used in localization, intrusion detection, security and detecting malicious activities of the nodes.
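The maximum-margin idea can be illustrated without a full SVM solver. The sketch below only computes the geometric margin of two candidate hyperplanes on hypothetical, linearly separable points and picks the one with the larger margin; an actual SVM would search over all hyperplanes by solving the optimization problem.

```python
from math import sqrt

def geometric_margin(w, b, points, labels):
    """Smallest signed distance of any point to the hyperplane w.x + b = 0."""
    norm = sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))

# Hypothetical two-class data, with labels -1 and +1.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 1.0), (5.0, 1.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating hyperplanes, each given as (w, b).
candidates = {"x1 = 3": ((1.0, 0.0), -3.0), "x1 = 2.5": ((1.0, 0.0), -2.5)}
margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
```

Both candidates classify every point correctly, but the hyperplane halfway between the two classes leaves the larger margin and is therefore preferred.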
2.1.7 Bayesian
It is a probabilistic machine learning technique, which treats parameters as either discrete or continuous random variables. The objective is to calculate the posterior conditional probability distribution of an unknown cause given the observed evidence [8]. Several variations of Bayesian learning are used for better learning, such as the hidden Markov model, conditional random fields and the Gaussian mixture model. Bayesian inference requires fewer training samples than other ML algorithms [4].
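For discrete causes, the posterior follows directly from Bayes' rule, P(cause | evidence) = P(evidence | cause) P(cause) / P(evidence). The sketch below assumes a hypothetical event detection scenario; the prior and likelihood values are invented for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of causes.

    prior[c] is P(c); likelihood[c] is P(evidence | c).
    """
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(evidence), the normalizing constant
    return {c: p / evidence for c, p in joint.items()}

# Hypothetical numbers: how likely is a fire, given a high temperature reading?
prior = {"fire": 0.01, "no_fire": 0.99}
likelihood = {"fire": 0.9, "no_fire": 0.1}

post = posterior(prior, likelihood)
```

Even with a reliable sensor, the posterior probability of the rare event stays small, which shows how strongly the result depends on the chosen prior.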
2.2 Unsupervised Learning
In unsupervised learning, the dataset consists of only input parameters, and the algorithm needs to learn the inherent structure of the data from them. The goal is to learn the underlying distribution of the data. Unsupervised learning algorithms can be grouped into clustering and dimensionality reduction methods [6]. Some of the unsupervised learning algorithms are:
2.2.1 K Means
K-means clustering partitions the data into k non-overlapping clusters. Each data point is assigned to its nearest centroid, and the algorithm iteratively computes new centroid values until there is no change in the centroid values and the data points assigned to them [7]. K-means is a widely used algorithm for node clustering, as it is simple and linear in complexity.
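The iteration can be sketched as follows. The initial centroids are passed in explicitly to keep the example deterministic, which is an assumption of this sketch (they are usually chosen at random), and the node coordinates are hypothetical.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop changing."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: a centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical node coordinates forming two groups.
nodes = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centres, groups = k_means(nodes, [(0.0, 0.0), (10.0, 0.0)])
```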
2.2.2 Fuzzy C Means (FCM)
It is a soft clustering approach, where a data point can be assigned to more than one cluster. The algorithm is based on fuzzy set theory and requires the number of clusters to be predefined. The data points are grouped considering various similarity measures such as intensity, distance and connectivity [6]. Each data point is assigned a membership value for each cluster according to the similarity measure used. Fuzzy C means can be used for WSN issues such as mobile sink, connectivity and localization.
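The membership values can be sketched with the standard FCM membership formula, using the Euclidean distance as the similarity measure and a fuzzifier m = 2. Both choices, as well as the cluster centres and the node position, are assumptions made for this example.

```python
from math import dist

def memberships(point, centres, m=2.0):
    """Standard FCM membership of one point in each cluster (centres assumed distinct)."""
    d = [dist(point, c) for c in centres]
    if 0.0 in d:  # the point coincides with a centre: full membership there
        return [1.0 if di == 0.0 else 0.0 for di in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]

# Hypothetical cluster centres and a node lying closer to the first one.
centres = [(0.0, 0.0), (4.0, 0.0)]
u = memberships((1.0, 0.0), centres)
```

The memberships sum to one, and the node belongs mostly, but not exclusively, to the nearer cluster.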
2.2.3 Hierarchical Clustering
This technique groups similar objects in a distinct bottom-up or top-down order. The top-down approach, also known as divisive clustering, partitions the whole dataset recursively into clusters until each individual data point is a singleton cluster. The agglomerative or bottom-up approach considers each data point as a singleton cluster and merges clusters recursively until a single cluster consisting of all data points is formed [6].
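The bottom-up variant can be sketched as below. Single linkage (the distance between the closest members of two clusters) is assumed as the merge criterion, since the text does not fix one, and the one-dimensional points are hypothetical.

```python
from math import dist

def agglomerative(points, target_clusters=1):
    """Repeatedly merge the two closest clusters (single linkage)."""
    clusters = [[p] for p in points]  # every point starts as a singleton cluster
    merges = []
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))  # record the merge history
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

# Hypothetical one-dimensional sensor positions.
positions = [(0.0,), (1.0,), (10.0,), (11.0,)]
final, history = agglomerative(positions, target_clusters=2)
```

Stopping at one cluster instead of two would yield the complete merge history, which corresponds to the full hierarchy.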
2.2.4 Principal Component Analysis (PCA)
It is a multivariate method for dimensionality reduction and data compression. The role of PCA is to extract information from the data and present it as a set of new orthogonal variables called principal components [9]. PCA works on two basic principles: first, the new set of orthogonal variables must maximize the variance, and second, it should minimize the error [9].
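The direction of maximum variance, i.e. the first principal component, can be sketched with a covariance matrix and power iteration. Power iteration is only one of several ways to obtain the leading eigenvector and is an assumption of this sketch; the two-feature readings are hypothetical.

```python
from math import sqrt

def first_principal_component(data, iterations=100):
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, dims = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dims)]
    centred = [[row[j] - means[j] for j in range(dims)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    # Starting guess; it must not be orthogonal to the leading eigenvector.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centred

# Hypothetical readings in which the second feature is twice the first.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
component, centred = first_principal_component(data)

# Projecting onto the component compresses each sample to a single number.
scores = [sum(x * c for x, c in zip(row, component)) for row in centred]
```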
2.3 Semi-Supervised
Semi-supervised learning is a suitable learning technique for real-world data, which often consist of both labelled and unlabelled samples. The training dataset is only partially labelled. The objective is to predict the labels of the unlabelled samples in the training dataset and to learn a mapping function during training that can predict labels for future data [6]. It can be used for applications such as spam filtering, video surveillance, localization and fault detection.
2.4 Reinforcement Learning (RL)
It consists of an environment and an agent, where the environment refers to the object or problem and the agent represents the algorithm. RL is a method of interactive learning, where the environment sends a reward to the agent (algorithm) for each positive learning step [3]. The agent updates its knowledge based on the last state and the reward obtained from the environment. The most common RL technique is Q-learning [4]. The advantage of Q-learning is that it can be applied easily to distributed architectures such as WSN.
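The Q-learning update can be sketched on a small chain of states, which may be read as hops towards a sink. The environment, the reward of 1 for reaching the last state, and the learning parameters are hypothetical. To keep the example deterministic, every state-action pair is updated in repeated sweeps, whereas an actual agent would explore by interacting with the environment.

```python
def q_learning_chain(n_states=5, alpha=0.5, gamma=0.9, sweeps=50):
    """Tabular Q-learning on a chain; the last state is the rewarding goal."""
    actions = (-1, +1)  # move one state to the left or to the right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(sweeps):
        for s in range(goal):  # the goal state is terminal
            for a in actions:
                nxt = min(max(s + a, 0), goal)
                reward = 1.0 if nxt == goal else 0.0
                future = 0.0 if nxt == goal else max(q[(nxt, b)] for b in actions)
                # Q-learning update: move Q(s, a) towards reward + gamma * max Q(s', .)
                q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in range(goal)}
    return q, policy

q, policy = q_learning_chain()
```

The learned policy moves towards the goal from every state, and the Q values decay with the discount factor as the distance to the goal grows.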
2.5 Evolutionary Computation
It is an approach inspired by nature and biological evolution. It includes the genetic algorithm, artificial bee colony optimization, particle swarm optimization, ant colony optimization, evolutionary programming, differential evolution, the firefly algorithm and the memetic algorithm. The general approach is to start with a random set of solutions, or population. The population evolves over a number of iterations, and new solutions are generated. The task is to select the best-fit options based on the fitness function in each iteration. This iterative process ends when an optimal solution is achieved. Evolutionary computing algorithms can be applied to localization, target tracking, routing, mobile sink and coverage problems in WSN [6] (Fig. 2; Table 1).

Fig. 2 Summary of ML algorithms used for various WSN applications
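The general loop of population, fitness-based selection and generation of new solutions can be sketched as follows. The real-valued encoding, the averaging crossover, the Gaussian mutation, the fixed number of generations and the toy fitness function are all assumptions of this sketch and do not correspond to any specific algorithm listed above.

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=40, seed=0):
    """Minimal evolutionary loop that maximizes a one-dimensional fitness function."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]  # random start
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))       # best fitness so far
        parents = population[: pop_size // 2]        # select the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2 + rng.gauss(0.0, 0.1)  # crossover plus mutation
            children.append(min(max(child, lo), hi))   # keep within bounds
        population = parents + children              # parents survive (elitism)
    return max(population, key=fitness), history

def fitness(x):
    # Hypothetical objective with its maximum at x = 3.
    return -(x - 3.0) ** 2

best, history = evolve(fitness, (0.0, 10.0))
```

Because the fittest half always survives, the best fitness recorded in each generation never decreases.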
Table 1 Machine learning algorithms and their applications in WSN

| Algorithm | Learning type | Advantages | Disadvantages | Application in WSN |
| --- | --- | --- | --- | --- |
| Linear regression | Supervised | Easy to understand; fast to train; suitable for small classification problems | Less accurate; not suited for nonlinear and complex data | Connectivity, data aggregation and localization |
| Bayesian | Supervised | Does not require a large training dataset | Sensitive to the prior probabilities chosen; high computational cost | Activity recognition, event detection, outlier detection, data aggregation, path selection of mobile sink |
| Decision tree | Supervised | Simple; does not require scaling or normalization of data values; can handle missing values | Attribute selection affects accuracy; any change in data requires a change in the whole structure of the decision tree | Cluster head selection, localization, event detection |
| Support vector machine (SVM) | Supervised | Suitable for learning nonlinear and complex data; efficient learning system | High computational and memory requirements | Localization, intrusion detection, security and detecting malicious activities of the nodes |
| Random forest | Supervised | High accuracy; fast to execute; can handle large datasets with high dimensionality; capable of dealing with missing values | Prone to overfitting; not suitable for small data samples | Fault detection, intrusion detection, MAC layer, mobile sink |
| K-nearest neighbour (KNN) | Supervised | Simple and easy; accuracy is good; can handle missing data values | Slow in operation; large memory is required; selecting an appropriate distance measure is the key | Query processing subsystem, anomaly detection and outlier detection |
| K Means | Unsupervised | Good when clusters are almost of the same size; faster than hierarchical clustering; for best results an optimum value of k is essential | Sub-optimal performance for categorical data; not suitable for overlapping clusters; does not work well for irregular data points | Surveillance and object tracking, node clustering and data clustering |
| Fuzzy C Means | Unsupervised | Similar to K means, but uses a soft approach for clustering; can handle uncertainty; optimum value of k is required | Slower than K means; it performs better than K means when the dataset is incomplete | Node clustering and data aggregation, localization, coverage and connectivity |
| Hierarchical clustering | Unsupervised | Does not require prior knowledge about the number of clusters; results are consistent for multiple runs of the algorithm | Cannot handle big data; time complexity of the algorithm is quadratic in nature | Energy harvesting, mobile sink, node clustering and data aggregation |
| Principal component analysis | Unsupervised | High-dimensional data can be easily visualized; easy to understand | Large memory requirement to store the correlation matrix | Data aggregation and routing |
| Reinforcement learning (Q learning) | Semi-supervised | Flexible and adaptable to the process of decision making | Gets easily trapped in local minima | Routing problems, QoS and MAC layer |
| Neural network | Supervised/Unsupervised | Suitable for complex nonlinear problems; requires a large number of parameters for accurate prediction | Prone to overfitting; large dataset required for training; computation intensive; cannot handle missing values | Intrusion detection, routing, localization, event detection |

3 Machine Learning Algorithms for WSN
3.1 Routing
Routing is the process of setting up a valid, minimum-cost path from source to destination nodes. WSN has a dynamic topology owing to mobility, network partitioning, node death and failure. On the basis of the path establishment mechanism used, a routing protocol can be proactive, reactive or hybrid in nature. Routing can effectively improve the energy efficiency and network lifetime of WSN. With the increasing use of WSN in real-life applications, routing has become a multi-objective problem dealing with challenges such as load balancing, coverage and security. Machine learning can improve the routing mechanism by learning the optimal routing path from past data or by dividing the network area into sub-graphs. The ML approach reduces routing complexity and helps in predicting the full path.
SIR [10] uses a modified Dijkstra algorithm to form a backbone network. The shortest routing path from each node to the base station (BS) is obtained. Four parameters are used to measure the link quality of neighbours: latency, duty cycle, packet error rate and throughput. Each node tests the link quality by sending a packet named ping and obtains the mean values of all the metrics. A self-organizing map (SOM), an unsupervised learning technique, is used to find the optimal path. Each node calculates its distance based on the output of the SOM. The hybrid approach using Dijkstra's algorithm and SOM takes into account QoS parameters for routing. The overhead and complexity of the hybrid approach are high.
Reinforcement learning-based algorithms are also suitable for routing in WSN. The work in [11] uses a learning technique for multicast routing in wireless ad hoc networks. The routing is performed in two phases: the first phase involves discovering optimal routes and updating Q values, and the second phase creates the optimal path for multicast routing. RLGR [12] is a hierarchical geographic routing approach that uses energy and delay to formulate the reward function. The work in [13] enhances Q-learning-based routing for routing from multiple sources to multiple sinks. The algorithm assumes that nodes can communicate directly and share local information as feedback with their neighbours. As a result, the message overhead is high for this routing method. The Q-PR algorithm proposed in [14] is similar to [12], but it can learn from previous routing decisions and selects the routing path based on the delivery ratio. Q-PR uses Q-learning and a Bayesian decision model to select the optimal routes. Apart from these, algorithms based on ANN [15], K-means [16], SVM [17] and Bayesian statistics [18] are also used for routing to improve network lifetime in WSN.
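The minimum-cost path computation that underlies such routing schemes can be sketched with the standard Dijkstra algorithm. This is not the modified version used in SIR [10], whose details are not given here; the topology and link costs are hypothetical, and in a learning-based scheme the link costs could be derived from measured link-quality metrics.

```python
import heapq

def min_cost_paths(links, base_station):
    """Dijkstra's algorithm from the base station over symmetric, non-negative links.

    Returns the minimum cost of each node to the base station and the next hop
    each node should forward to.
    """
    cost = {base_station: 0.0}
    next_hop = {}
    queue = [(0.0, base_station)]
    while queue:
        c, node = heapq.heappop(queue)
        if c > cost.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, link_cost in links[node].items():
            new_cost = c + link_cost
            if new_cost < cost.get(neighbour, float("inf")):
                cost[neighbour] = new_cost
                next_hop[neighbour] = node  # forward towards the base station
                heapq.heappush(queue, (new_cost, neighbour))
    return cost, next_hop

# Hypothetical topology: link costs between the base station and three nodes.
links = {
    "BS": {"A": 1.0, "B": 4.0},
    "A": {"BS": 1.0, "B": 1.0, "C": 5.0},
    "B": {"BS": 4.0, "A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
}
cost, next_hop = min_cost_paths(links, "BS")
```

Node C reaches the base station most cheaply through B and A, even though it also has a direct link to A.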
3.2 Localization
Localization is the process of finding the actual coordinates of sensor nodes in the network. In a dense and large WSN, it becomes difficult for the base station (BS) to find the exact location of nodes. The nodes have to transmit their location to the BS. If the nodes are mobile, it is essential to locate them accurately to ensure precision in monitoring and tracking applications. Sensor nodes and the sink can both be either static or mobile. Most clustering and routing protocols require the distance and location of nodes for energy consumption calculations and for selecting the optimal path to the BS or cluster head (CH). Continuous programming and reconfiguration are required to cope with the dynamic topology of WSN [2]. Some applications require only relative distance measures, while others require absolute locations. ML algorithms can be helpful in converting relative locations into absolute ones using a few anchor nodes.
SVM can be used for localization of nodes when it is not feasible to use GPS devices. SVM-based localization is performed in [19]; the approach uses the received signal strength indicator (RSSI) for the initial node movement and later uses SVM to predict the node's new location. LSVM [20] uses distributed localization; it requires a training set based on connectivity information and indicators. The method is sensitive to outliers. The work in [21] proposes the RFSVM algorithm, which introduces a transmit matrix to map the relationship between hops and distances. Training is performed using the transmit matrix, and SVM is used to predict the location of unknown nodes. The work in [22] uses a combined approach of fuzzy C-means and K-means for clustering data. The network area is divided into clusters, and an RSSI dataset is obtained for each cluster. A dedicated ANN is trained for each cluster; the final data are collected from the selected ANNs and fused to predict the location of the sensor node. The approach works for both static and mobile nodes in real-world indoor environments. The LPSONN algorithm [23] uses an ANN-based approach for localization of static nodes and improves the error rate. The algorithm uses PSO for selecting the optimal number of hidden layers. The authors in [24] propose a localization algorithm for mobile nodes using ANN. In order to enhance the accuracy of localization, the optimal number of neurons is predicted using the PSO technique. A semi-supervised learning-based distributed localization approach for mobile nodes is presented in [25]. The algorithm can handle environmental changes and new nodes effectively. The work in [26] proposes a centralized localization approach for mobile sensor nodes using a hidden Markov model (HMM). The algorithm performs equally well in indoor and outdoor environments.
3.3 Medium Access Control Layer (MAC Layer) Protocol
The MAC layer is responsible for reliable and efficient message transmission and reception. It is also responsible for channel access policies, flow control, buffering and error control in WSN. WSN is a cooperative network, where a large number of sensor nodes cooperate and transmit data. The requirements of the MAC layer vary, as a WSN is optimized for a specific application and different QoS parameters. MAC layer protocols can be either schedule-based or contention-based. Energy conservation and latency are important in designing a MAC layer protocol for WSN. Contention-based MAC protocols use a sleep and awake cycle to save energy. ML algorithms can be utilized for allocating channels to the nodes; the work in [27] uses a Bayesian model for channel assignment to conserve network energy. ML algorithms are also helpful in including new nodes in the existing channel assignment scheme during network operation. In [28], the authors use a fuzzy Hopfield network to transmit the TDMA schedule. TDMA slots are created to reduce processing time and collisions. A novel CSMA-based approach is proposed in [29] to prevent denial-of-service attacks in WSN. Reinforcement learning is suitable for duty cycle management, as it supports distributed operation and requires few computational resources. Duty cycle management using reinforcement learning is done in RL-MAC [30]. Channel bandwidth and traffic are considered for determining the active transmission time and duty cycle. The work in [31] proposes a hybrid contention-based approach for the MAC protocol using the random forest algorithm. It uses RSS values to address MAC layer spoofing. This approach minimizes the authentication overhead and power consumption.
3.4 Event Detection and Query Processing

WSN is a cooperative network; the information collected is analysed, and the inferences drawn are used to make decisions. On the basis of the data reporting mechanism, WSN applications can be event-driven, continuous or query-driven in nature. ML algorithms can detect events effectively with a simple classifier and minimal memory and storage requirements [1]. In a query processing system, ML can restrict query searching to the desired area, reducing flooding and message overhead in the network. The authors in [32] use a KNN-based regression model to mine information obtained from sensor nodes. The model is suitable for detecting diverse events. A KNN-based approach was proposed in [33] to design a query processing system for event detection using stored data. The work [34] is a rule-based approach using fuzzy logic for event detection. The algorithm considers information from neighbouring nodes for event detection, which improves its speed and accuracy. The authors of [35] introduce an ANN-based approach for real-time water quality estimation. The work [36] proposes a self-adaptive reinforcement learning-based sleep/awake scheduling approach, where the time axis is partitioned into small time slots. Each time slot is independent, and every node autonomously decides its mode for each slot, i.e. sleep, transmit or wake up, in a decentralized manner.
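The KNN-based regression idea mentioned above can be sketched as follows. This is a generic k-nearest-neighbour estimate, not the model of [32] or [33]; the sensor positions, temperature readings and event threshold are made up for illustration.

```python
import math

# Illustrative sketch only; readings and threshold are hypothetical.

def knn_estimate(known, query, k=3):
    """Average the readings of the k sensors closest to `query`."""
    nearest = sorted(known, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def is_event(known, query, threshold, k=3):
    """Flag an event when the estimated reading at `query` exceeds the threshold."""
    return knn_estimate(known, query, k) > threshold


# ((x, y) position in metres, temperature in degrees Celsius)
readings = [
    ((0, 0), 21.0), ((0, 10), 22.0), ((10, 0), 23.0),
    ((50, 50), 60.0), ((55, 50), 62.0), ((50, 55), 64.0),
]
EVENT_THRESHOLD = 50.0  # hypothetical fire-alarm threshold

quiet_estimate = knn_estimate(readings, (2, 2))
hot_estimate = knn_estimate(readings, (52, 52))
```

Restricting the computation to the k nearest sensors is also what allows a query to be answered from a small region of the network instead of flooding every node.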
3.5 Node Clustering and Data Aggregation

An increase in network size also increases the volume of data sent to the sink. Hence, it is imperative to aggregate data before transmitting it to the BS. An effective clustering method and proper CH selection conserve network energy and enhance network lifetime [37]. ML algorithms can be used to remove correlated or duplicate data, so that fewer packets are transmitted and only useful information is passed to the BS. Machine learning strategies can enhance the clustering and data aggregation process by
• Identifying faulty nodes in the network and ensuring they are not eligible to become CH.
• Selecting an optimal CH for load balancing and minimizing energy consumption in the network.
• Removing redundancy from sensed data, then compressing and transmitting the data to the BS/sink.

A proficient data aggregation method enhances network lifetime by facilitating uniform energy utilization. Data aggregation methods can be categorized as tree-based, cluster-based, centralized or in-network. ML techniques adapt to the dynamic WSN environment and can work without reprogramming and reconfiguration [4]. The authors in [38] use a decision tree for CH selection. The parameters considered for CH selection include distance to the cluster centroid, available battery, degree of mobility and an indication of vulnerability. These parameters are used to form the decision tree. The algorithm has a start-up phase and a steady phase. The initial CHs are chosen arbitrarily. The BS transmits an enquiry message to each node, and in response, each node sends its control information to the BS. The BS uses the decision tree algorithm to choose the new CHs and transmits the list of new CHs to all nodes. Nodes join a CH based on their RSSI value. The process is repeated, and the role of CH is rotated to balance the load of the network. Reclustering can be performed after a fixed interval or based on a specified threshold battery value. The authors in [39] present an HMM-based prediction method for CH selection. The algorithm is centralized, and clustering is performed by particle swarm optimization. The work in [40] uses a Kohonen self-organizing map (KSOM) for congestion detection and avoidance in WSN. In this architecture, nodes classify aggregated data using SOM. The technique helps in dimensionality reduction. Due to aggregation, energy consumption and network traffic decrease. In [41], a k-means-based approach is used for classifying nodes into clusters. This algorithm requires no assumption regarding the distribution of nodes, and it improves network lifetime and packet delivery ratio. EECPK-means [42] uses a mathematical approach for selecting the optimal number of CHs.
The algorithm uses the midpoint method to select the initial CHs instead of selecting them randomly. This helps in forming more balanced clusters, with a nearly uniform number of members in each cluster. The CH is selected based on residual energy in addition to the Euclidean distance used in the k-means algorithm. The communication energy is taken into account by considering multi-hop transmission between the cluster head and the BS. The algorithm improves network lifetime and load balancing. EKMT [43] considers the distance between nodes and the distance to the BS for selecting the optimal CH. This method reduces delay and improves network lifetime by reselecting the CH dynamically. The authors of [44] present a self-managed cluster formation method using a neural network. The neural network forms clusters based on minimum weakly connected dominating sets (WCDS) in WSN. The work also shows the impact of transmission radii on stability and network lifetime. The algorithm is best suited for large WSNs with small transmission radii, where centralized schemes are not effective. Lin et al. [45] proposed a methodology that does not require awareness of the network topology. It uses an adaptive learning vector quantization technique for online data compression. The approach analyses the correlation between current and historical readings and retrieves the compressed sensor readings. The disadvantage of the methodology is that outliers and dead neurons are not considered. The work
[46] proposes a data aggregation model based on the correlation between neighbouring nodes. The correlation of every pair of adjacent nodes is calculated, and an aggregation tree is constructed using a shortest path algorithm. The limitation of this model is that it considers only spatial correlation and neglects temporal correlation among nodes. PCA is generally used in combination with compressive sensing (CS) and the EM algorithm for data aggregation in WSN. CS is an alternative to traditional sampling techniques; it estimates the original signal from a few linear incoherent measurements. It exploits the properties of sparsity and incoherence to reconstruct the original signal, saving on sensing resources, transmission and storage. The EM algorithm is an iterative algorithm comprising two steps: the expectation (E) step, which formulates the cost function using the system parameters, and the maximization (M) step, which computes new parameter values. Macua et al. [47] proposed a data compression method utilizing PCA and the maximum likelihood of the observed data. The consensus round parameter can be varied to balance accuracy against communication cost. Fenxiong et al. [48] used the PCA algorithm to remove data redundancy. The sensor nodes sense data and transmit it to the cluster heads. The PCA algorithm is applied at the cluster head nodes to eliminate duplicate readings. The data is compressed by ignoring the principal components with the least variance. C-FCM [49] is a centralized algorithm in which clustering and CH selection are performed in each round. Cluster formation is based on location, and the CH is selected depending on the remaining energy of the nodes within the cluster. Data gathering and aggregation are performed, and the combined data is sent to the BS. DFCM [50] follows a distributed approach for CH selection. Initially, the cluster structure is formed using the FCM approach; once the clusters are formed, the CH is selected locally in each round.
The protocol runs in rounds with two phases, i.e. the CH selection and data transmission phases. The decentralized approach leads to less network overhead and energy consumption than C-FCM. Forster and Murphy [51] discussed CLIQUE, a clustering method based on the Q-learning technique. In the CLIQUE algorithm, a node can decide on its own eligibility to become CH. The reward function depends on the number of hops to the sink and the residual energy. Every node is an independent learning agent. A node becomes CH only if it is the most economical in routing data to all the sinks. The control overhead is reduced as there is no CH selection process.
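Several of the schemes in this section pair k-means clustering with an energy-aware choice of cluster head. The sketch below illustrates that general pattern only; it is not EECPK-means [42] or EKMT [43]. The node data, the initial centroids and the rule "highest residual energy, ties broken by distance to the centroid" are assumptions made for the example.

```python
import math

# Illustrative sketch only; node data and the CH selection rule are assumptions.

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations; returns the last assignment and the updated centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for idx, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(idx)
        centroids = [
            tuple(sum(points[i][d] for i in members) / len(members) for d in range(2))
            if members else centroids[c]   # an empty cluster keeps its old centroid
            for c, members in enumerate(clusters)
        ]
    return clusters, centroids


def select_cluster_heads(nodes, clusters, centroids):
    """Per cluster, pick the node with most residual energy (ties: nearest to centroid)."""
    heads = []
    for members, centre in zip(clusters, centroids):
        if not members:
            continue
        best = max(members, key=lambda i: (nodes[i]["energy"],
                                           -math.dist(nodes[i]["pos"], centre)))
        heads.append(nodes[best]["id"])
    return heads


nodes = [
    {"id": "n1", "pos": (1, 1), "energy": 0.9},
    {"id": "n2", "pos": (2, 1), "energy": 0.4},
    {"id": "n3", "pos": (1, 2), "energy": 0.7},
    {"id": "n4", "pos": (9, 9), "energy": 0.3},
    {"id": "n5", "pos": (8, 9), "energy": 0.8},
    {"id": "n6", "pos": (9, 8), "energy": 0.6},
]
positions = [n["pos"] for n in nodes]
clusters, centroids = kmeans(positions, centroids=[positions[0], positions[-1]])
heads = select_cluster_heads(nodes, clusters, centroids)
```

Re-running the selection after each round with updated energy values rotates the CH role, which is how the surveyed protocols balance load across a cluster.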
3.6 Quality of Service (QOS)

QOS in WSN means guaranteed and efficient data delivery and event reporting. WSN is used in diverse applications with different QOS requirements. It is a challenging task to provide quality of service in a dynamic and resource-constrained WSN environment. Apart from this, unbalanced traffic, data redundancy, multiple sinks and scalability
create challenges for QOS. QOS can be defined from two perspectives: application-specific and network-specific [52]. The network perspective deals with energy and bandwidth management, while application-specific parameters include latency, delay, number of active sensors and coverage. ML algorithms can be used for detecting faulty nodes, balancing energy and estimating link quality. The authors in [53] use a fuzzy logic-based data fusion approach for clustered WSN. The sensor nodes are equipped with a fuzzy logic controller to ensure that only essential data is transmitted to the CH. The work [54] presents a link quality estimation mechanism using a wavelet neural network. In [55], a cross-layer communication protocol is proposed. The algorithm implements reinforcement learning for high data rate wireless multimedia communication applicable to emergency networking. The work [56] proposes an RL-based data dissemination and topology control protocol. The protocol is used to select active neighbour nodes for a reliable topology. The authors in [57] propose a dynamic fault detection model using a neural network. The backpropagation algorithm is used to train the ANN for fault detection and node identification. The DACR algorithm [58] uses reinforcement learning in a distributed adaptive cooperative routing protocol to provide reliable QOS for WSN. The algorithm decides on the next node and the optimal relay node to increase the reliability of the network. The work in [59] uses a neural network to introduce a new metric, dependability, for judging the QOS of a network. The dependability of a network is calculated considering availability, reliability, maintainability and survivability.
3.7 Mobile Sink

In a large wireless sensor network, the nodes near the sink or BS drain energy faster than other nodes. These nodes are exhausted and die early, causing the energy hole problem in WSN. To deal with the energy hole problem, the concept of the mobile sink was introduced. A mobile sink gathers data from nodes by visiting each node directly. In a large WSN, however, it is not feasible to visit every node; hence, to ease operation, a few nodes are selected as rendezvous points. The mobile sink visits only these points and collects data, while all other nodes transmit their information to the nearest rendezvous point [6]. Recently, machine learning has been used for selecting the optimal set of rendezvous points, selecting the optimal mobile sink path and finding data forwarding routes from nodes to rendezvous points [6]. The authors of [60] explored the problem of mobile sinks in event-driven and deadline-based applications. The work proposes an optimal deadline-based trajectory (ODT) algorithm for mobile sinks to collect data from active nodes. ODT is based on the decision tree algorithm; the parameters considered are the geographical position of nodes and the properties of captured events. The work [61] proposes a multiple mobile sink framework to minimize transmission latency and maximize throughput. The mobile sinks act as fog nodes to facilitate gathering data and storing it on the cloud. A naive Bayesian algorithm [62] is applied for predicting the connection status between the node and the mobile sink. The data is
transmitted depending on the connection status. The decision whether to transmit is taken independently by each sensor node. The results show that the naive Bayesian approach outperforms the traditional data gathering approach in WSN.
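The transmit-or-wait decision described above can be sketched with a small categorical naive Bayes classifier. This is not the model of [62]; the two features (signal level and sink distance), the observation history and the function names are hypothetical, and Laplace smoothing is an added assumption.

```python
from collections import Counter, defaultdict

# Illustrative sketch only; features and history are hypothetical.

def train_naive_bayes(samples):
    """Count class and per-feature value frequencies from (features, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)       # feature index -> distinct values seen
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, features):
    """Return the label with the highest Laplace-smoothed posterior score."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total               # class prior
        for i, value in enumerate(features):
            seen = feature_counts[(i, label)][value]
            score *= (seen + 1) / (count + len(feature_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# ((signal level, sink distance), was the node connected to the mobile sink?)
history = [
    (("strong", "near"), True), (("strong", "near"), True),
    (("strong", "far"), True), (("weak", "near"), True),
    (("weak", "far"), False), (("weak", "far"), False),
    (("strong", "far"), False), (("weak", "near"), False),
]
model = train_naive_bayes(history)

# A node transmits only when a connection to the sink is predicted.
should_transmit = predict(model, ("strong", "near"))
```

Each node can hold its own counts and update them after every contact with the sink, which keeps the decision local, as the text requires.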
3.8 Energy Harvesting

Energy harvesting is the term used for mechanisms that generate energy from the surroundings of the network. These mechanisms can power nodes either directly or by storing the generated energy in the nodes' batteries. Generally, the energy sources used for energy harvesting can be grouped into two categories: ambient environment sources and external sources. Environmental sources include wind, solar power, thermal and radio frequency; external sources can be mechanical or human-based [63]. Machine learning algorithms are used to predict the total amount of energy harvested within a given time period [3]. The authors of [64, 65] used linear regression for predicting the energy in both centralized and distributed networks. The work covers solar energy harvesting and predicts the harvested energy using historical data. A Q-learning approach, Q-SEP [66], is used to forecast the amount of energy produced in a particular time slot using past observations. ML can also be used for dynamic duty cycle adjustment in WSN, which works on the energy neutrality principle. Energy neutrality is implemented by decreasing the duty cycle when the harvested energy is low and increasing it when the harvested energy is high. The RLTDPM algorithm [67] monitors the status of the node and dynamically adjusts the duty cycle using this principle to achieve the required throughput. The work [68] presents a hierarchical clustering mechanism in which only the CHs use renewable energy sources and all other nodes are battery powered. The approach aims at finding the optimal location of the cluster heads to minimize energy consumption.
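Prediction of harvested energy and energy-neutral duty cycling can be combined in a short sketch. It is not the method of [64], [65] or [67]: it assumes a least-squares line fitted to equally spaced past slots, a hypothetical cost of 40 mJ per slot at full duty, and made-up harvest values.

```python
# Illustrative sketch only; energy values and the cost model are assumptions.

def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; needs at least two values."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def predict_next(values):
    """Extrapolate the fitted line one slot ahead; harvested energy cannot be negative."""
    slope, intercept = fit_line(values)
    return max(0.0, intercept + slope * len(values))


def energy_neutral_duty_cycle(predicted_mj, full_duty_cost_mj=40.0, lo=0.05, hi=1.0):
    """Choose the duty cycle whose consumption matches the predicted harvest, clipped to [lo, hi]."""
    return min(hi, max(lo, predicted_mj / full_duty_cost_mj))


sunny_history = [10.0, 12.0, 14.0, 16.0]   # mJ harvested in the last four slots
dusk_history = [4.0, 3.0, 2.0, 1.0]

sunny_duty = energy_neutral_duty_cycle(predict_next(sunny_history))
dusk_duty = energy_neutral_duty_cycle(predict_next(dusk_history))
```

The lower bound keeps the node reachable even when almost no energy is expected; whether such a floor is acceptable depends on the battery reserve of the deployment.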
4 Conclusion

WSN is a resource-constrained environment with its own unique challenges and issues. This work presents a survey of the ML algorithms applied to resolve various issues in WSN. Depending on the problem, the data and the quality of service parameters, an appropriate ML method can be selected. There are no specific criteria that define the effectiveness of one method over another. Researchers can choose one method over another on the basis of factors such as accuracy, time to classify a dataset using the trained model, training time and complexity. Future work involves comparing and analysing ML algorithms using real-world datasets.
References

1. Forster, A.: Machine learning techniques applied to wireless. In: 3rd International Conference on Intelligent Sensors, Sensor Networks and Information (2007)
2. Ayodele, T.O.: Introduction to machine learning. In: New Advances in Machine Learning. InTech (2010)
3. Simeone, O.: A very brief introduction to machine learning with applications to communication systems. IEEE Trans. Cogn. Commun. Netw. 4(4) (2018)
4. Alsheikh, M.A., Lin, S., Niyato, D., Tan, H.-P.: Machine learning in wireless sensor networks: algorithms, strategies, and applications. IEEE Commun. Surv. Tutorials 16(4), 1996–2018 (2014)
5. Khan, Z.A., Samad, A.: A study of machine learning in wireless sensor network. Int. J. Comput. Netw. Appl. (2017)
6. Praveen Kumar, D., Amgoth, T., Annavarapu, C.S.R.: Machine learning algorithms for wireless sensor networks: a survey. Inf. Fusion 49, 1–25 (2019)
7. Ayodele, O.: Types of machine learning algorithms. In: New Advances in Machine Learning. InTech (2010)
8. Horný, M.: Bayesian Networks. Boston University (2014)
9. Jolliffe, I.T.: Principal Component Analysis. Springer Verlag (2002)
10. Barbancho, J., León, C., Molina, F.J., Barbancho, A.: A new QoS routing algorithm based on self-organizing maps for wireless sensor. Telecommun. Syst. 36, 73–83 (2007)
11. Sun, R., Tatsumi, S., Zhao, G.: Q-MAP: a novel multicast routing method in wireless ad hoc networks with multiagent reinforcement learning. In: Conference on Computers, Communications, and Control Engineering (2002)
12. Dong, S., Agrawal, P., Sivalingam, K.: Reinforcement learning based geographic routing protocol for UWB wireless sensor network. In: Global Telecommunications Conference. IEEE (2007)
13. Forster, A., Murphy, A.L.: FROMS: feedback routing for optimizing multiple sinks in WSN with reinforcement learning. In: 3rd International Conference on Intelligent Sensors, Sensor Networks and Information, pp. 371–376. IEEE (2007)
14. Arroyo-Valles, R., Alaiz-Rodriguez, R., Guerrero-Curieses, A., Cid-Sueiro, J.: Q-probabilistic routing in wireless sensor networks. In: 3rd International Conference on Intelligent Sensors, Sensor Networks and Information, pp. 1–6. IEEE (2007)
15. Srivastava, J.R., Sudarshan, T.S.B.: A genetic fuzzy system based optimized zone based energy efficient routing protocol for mobile sensor networks (OZEEP). Appl. Soft Comput. 37, 863–886 (2015)
16. El Mezouary, R., Choukri, A., Kobbane, A., El Koutbi, M.: An energy-aware clustering approach based on the K-means method for wireless sensor networks. In: Advances in Ubiquitous Networking, pp. 325–337. Springer (2016)
17. Khan, F., Memon, S., Jokhio, S.H.: Support vector machine based energy aware routing in wireless sensor networks. In: Robotics and Artificial Intelligence (ICRAI) (2016)
18. Jafarizadeh, A.K.T.D.V.: Efficient cluster head selection using naive bayes classifier for wireless sensor networks. Wireless Netw. 3, 779–785 (2017)
19. Tran, D., Nguyen, T.: Localization in wireless sensor networks based on support vector machines. IEEE Trans. Parallel Distrib. Syst. 19(7), 981–994 (2008)
20. Yang, B., Yang, J., Xu, J., Yang, D.: Area localization algorithm for mobile nodes in wireless sensor networks based on support vector machines. In: Mobile Ad-Hoc and Sensor Networks, pp. 561–571. Springer (2007)
21. Tang, T., Liu, H., Song, H., Peng, B.: Support vector machine based range-free localization algorithm in wireless sensor network. In: Machine Learning and Intelligent Communications, pp. 150–158. Springer, Cham (2016)
22. Bernas, M., Placzek, B.: Fully connected neural networks ensemble with signal strength clustering for indoor localization in wireless sensor networks. Int. J. Distrib. Sens. Netw. 11(12) (2015)
23. Banihashemian, S.S., Adibnia, F., Sarram, M.A.: A new range-free and storage-efficient localization algorithm using neural networks in wireless sensor networks. Wirel. Pers. Commun. 98(1), 1547–1568 (2018)
24. El Assaf, A., Zaidi, S., Affes, S., Kandil, N.: Robust ANNs-based WSN localization in the presence of anisotropic signal attenuation. IEEE Wirel. Commun. Lett. 5(5), 504–507 (2016)
25. Gharghan, S.K., Nordin, R., Ismail, M., Ali, J.A.: Accurate wireless sensor localization technique based on hybrid PSO-ANN algorithm for indoor and outdoor track cycling. IEEE Sens. J. 16(2), 529–541 (2016)
26. Kumar, S., Tiwari, S.N., Hedge, R.M.: Sensor node tracking using semi-supervised hidden Markov models. Ad Hoc Netw. 33, 55–70 (2015)
27. Kim, M.H., Park, M.-G.: Bayesian statistical modeling of system energy saving effectiveness for MAC protocols of wireless sensor networks. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Studies in Computational Intelligence (2009)
28. Shen, Y.-J., Wang, M.-S.: Broadcast scheduling in wireless sensor networks using fuzzy hopfield neural network. Expert Syst. Appl. 34(2), 900–907 (2008)
29. Kulkarni, R.V., Venayagamoorthy, G.K.: Neural network based secure media access control protocol for wireless sensor networks. In: Proceedings of the 2009 International Joint Conference on Neural Networks, ser. IJCNN’09. IEEE, Piscataway, NJ, USA (2009)
30. Liu, Z., Elhanany, I.: RL-MAC: A reinforcement learning based MAC protocol for wireless sensor networks. Int. J. Sens. Netw. 1(3), 117–124 (2006)
31. Alotaibi, B., Elleithy, K.: A new MAC address spoofing detection technique based on random forests. Sensors 16(3) (2016)
32. Illiano, P., Lupu, E.C.: Detecting malicious data injections in event detection wireless sensor networks. IEEE Trans. Netw. Serv. Manage. 12(3), 496–510 (2015)
33. Li, Y., Chen, H., Lv, M., Li, Y.: Event-based k-nearest neighbors query processing over distributed sensory data using fuzzy sets. Soft Comput. 23(2), 483–495 (2019)
34. Han, Y., Tang, J., Zhou, Z., Xiao, M., Sun, L., Wang, Q.: Novel itinerary-based KNN query algorithm leveraging grid division routing in wireless sensor networks of skewness distribution. Pers. Ubiquitous Comput. 18(8), 1989–2001
35. Kılıçaslan, Y., Tuna, G., Gezer, G., Gulez, K., Arkoc, O., Potirakis, S.M.: ANN-based estimation of groundwater quality using a wireless water quality network. Int. J. Distrib. Sensor Netw. 10(4), 1–8 (2014)
36. Ye, D., Zhang, M.: A self-adaptive sleep/wake-up scheduling approach for wireless sensor networks. IEEE Trans. Cybernet. 1–14 (2017)
37. Bhatia, V., Kumavat, S., Jaglan, V.: Comparative study of cluster based routing protocols in WSN. Int. J. Eng. Technol. 7(1.2), 171–174 (2018)
38. Ahmed, G., Khan, N.M., Khalid, Z., Ramer, R.: Cluster head selection using decision trees for wireless sensor networks. In: IEEE International Conference on Intelligent Sensors, Sensor Networks and Information Processing (2008)
39. Bhatia, V., Jaglan, V., Kumavat, S., Kaswan, K.S.: A hidden Markov model based prediction mechanism for cluster head selection in WSN. Int. J. Adv. Sci. Technol. 28(15), 585–600 (2019)
40. Lee, S., Chung, T.C.: Data Aggregation for Wireless Sensor Networks Using Self-organizing Map. Springer-Verlag, Berlin Heidelberg (2005)
41. El Mezouary, R., Choukri, A., Kobbane, A., El Koutbi, M.: An energy-aware clustering approach based on the K-means method for wireless sensor networks. In: Advances in Ubiquitous Networking, pp. 325–337. Springer (2016)
42. Ray, D.D.A.: Energy efficient clustering protocol based on k-means (EECPK-means)-midpoint algorithm for enhanced network lifetime in wireless sensor network. IET Wirel. Sens. Syst. 6(6), 181–191 (2016)
43. Jain, B., Brar, G., Malhotra, J.: EKMT-k-means clustering algorithmic solution for low energy consumption for wireless sensor networks based on minimum mean distance from base station. In: Networking Communication and Data Knowledge Engineering, pp. 113–123. Springer (2018)
44. He, H., Zhu, Z., Makinen, E.: A neural network model to minimize the connected dominating set for self-configuration of wireless sensor networks. IEEE Trans. Neural Netw. 20(6), 973–982 (2009)
45. Lin, S., Kalogeraki, V., Gunopulos, D., Lonardi, S.: Online information compression in sensor networks. IEEE International Conference on Communications. Int. J. Pure Appl. Math. Special Issue 7(11), 3371–3376 (2006)
46. Liu, C., Luo, J., Song, Y.: Correlation-model based data aggregation in wireless sensor networks. In: 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) (2015)
47. Macua, S.V., Belanovic, P., Zazo, S.: Consensus-based distributed principal component analysis in wireless sensor networks. In: 11th International Workshop on Signal Processing Advances in Wireless Communications, p. 15 (2010)
48. Chen, F., Li, M., Wang, D., Tian, B.: Data compression through principal component analysis over wireless sensor networks. J. Comput. Inf. Syst. 9(5), 1809–1816 (2013)
49. Hoang, D.C., Kumar, R., Panda, S.K.: Realisation of a cluster-based protocol using fuzzy C-means algorithm for wireless sensor networks. IET Wirel. Sens. Syst. 3(3), 163–171 (2013)
50. Alia, O.M.: A decentralized fuzzy C-means-based energy-efficient routing protocol for wireless sensor networks. Sci. World J. (2014)
51. Forster, A., Murphy, A.L.: CLIQUE: role-free clustering with Q-learning for wireless sensor networks. In: 29th IEEE International Conference on Distributed Computing Systems (2009)
52. Bala, T., Bhatia, B., Kumawat, S., Jaglan, V.: A survey: issues and challenges in wireless sensor network. Int. J. Eng. Technol. 7(24), 53–55 (2018)
53. Collotta, M., Pau, G., Bobovich, A.V.: A fuzzy data fusion solution to enhance the QoS and the energy consumption in wireless sensor networks. Wirel. Commun. Mobile Comput. (2017)
54. Sun, W., Lu, W., Chen, L., Mu, D., Yuan, X.: WNN-LQE: wavelet-neural-network-based link quality estimation for smart grid WSN. IEEE Access 5, 12788–12797 (2017)
55. Lee, E.K., Viswanathan, H., Pompili, D.: RescueNet: reinforcement-learning-based communication framework for emergency networking. Comput. Netw. 98, 14–28 (2016)
56. Pravin Renold, A., Chandrakala, S.: MRL-SCSO: multi-agent reinforcement learning-based self-configuration and self-optimization protocol for unattended wireless sensor networks. Wirel. Pers. Commun. 96(4), 5061–5079 (2017)
57. Moustapha, A., Selmic, R.: Wireless sensor network modeling using modified recurrent neural networks: application to fault detection. IEEE Trans. Instrum. Meas. 57(5), 981–988 (2008)
58. Razzaque, M.A., Ahmed, M.H.U., Hong, C.S., Lee, S.: QoS-aware distributed adaptive cooperative routing in wireless sensor networks. Ad Hoc Netw. 19, 28–42 (2014)
59. Snow, A., Rastogi, P., Weckman, G.: Assessing dependability of wireless networks using neural networks. In: Military Communications Conference. IEEE (2005)
60. Tashtarian, F., Moghaddam, M.H.Y., Sohraby, K., Effati, S.: ODT: optimal deadline-based trajectory for mobile sinks in WSN: a decision tree and dynamic programming approach. Comput. Netw. 77, 128–143 (2015)
61. Wang, T., Zeng, J., Lai, Y., Cai, Y., Tian, H., Chen, Y., Wang, B.: Data collection from WSNs to the cloud based on mobile Fog elements. Future Gener. Comput. Syst. (2017)
62. Kim, S., Kim, D.Y.: Efficient data-forwarding method in delay-tolerant P2P networking for IoT services. Peer-to-Peer Netw. Appl. 11(6), 1176–1185 (2018)
63. Shaikh, S.F.K.: Energy harvesting in wireless sensor networks: A comprehensive review. Renew. Sustain. Energy Rev. 5, 1041–1054 (2016)
64. Sharma, A., Kakkar, A.: Forecasting daily global solar irradiance generation using machine learning. Renew. Sustain. Energy Rev. 82, 2254–2269 (2018)
65. Tan, W.M., Sullivan, P., Watson, H., Slota-Newson, J., Jarvis, S.A.: An indoor test methodology for solar-powered wireless sensor networks. ACM Trans. Embedded Comput. Syst. (TECS) 16(3), 1–25 (2017)
66. Kosunalp, S.: A new energy prediction algorithm for energy-harvesting wireless sensor networks with Q-learning. IEEE Access 4, 5755–5763 (2016)
67. Hsu, R.C., Liu, C.-T., Wang, H.-L.: A reinforcement learning-based ToD provisioning dynamic power management for sustainable operation of energy harvesting wireless sensor node. IEEE Trans. Emerg. Topics Comput. 2(2), 181–191 (2014)
68. Awan, S.W., Saleem, S.: Hierarchical clustering algorithms for heterogeneous energy harvesting wireless sensor networks. In: Wireless Communication Systems (ISWCS). IEEE (2016)
Cross-Domain Recommendation Approach Based on Topic Modeling and Ontology

Vikas, Bhawana Tyagi, Vinay Kumar, and Pawan Sharma
Abstract Single-domain recommendation systems lack cross-domain diversity of items. For a sparse user (one whose preferences are unknown), it is very difficult to recommend an item. However, the knowledge of other domains can be used to recommend an item to such a user, which is possible with the help of cross-domain recommendation systems. After watching movies, listening to songs or reading books, users review them with a specific rating. This user-generated data can be used for recommendation purposes. By using topic modeling techniques such as Latent Dirichlet Allocation (LDA) together with ontology methods, genres can be extracted from reviews. By passing user reviews to the model, a pattern of words is obtained, which is then mapped to genres using an ontological profile approach. A dictionary for the genres is obtained by crawling words related to each genre. The Book-Crossing dataset and the MovieLens dataset are used for our implementation. The evaluated results show better mean precision than the existing approach based on semantic clustering.

Keywords Cross-domain recommendation system · Topic modeling: Latent Dirichlet Allocation · Ontology
Vikas (✉) · V. Kumar: School of Information Technology, C-DAC Noida, India
B. Tyagi: Banasthali Vidyapith, Vanasthali, Rajasthan, India
P. Sharma: Dronacharya College of Engineering, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_32

1 Introduction

Today, a large number of people prefer online shopping; they also prefer to watch movies, web series and so on. Because the Internet holds a huge amount of data, it has become very difficult for users to pick the best from it. We therefore need a mechanism that can recommend the things a user may like. A recommender system [1] is used to recommend an item to a user. The recommendation performed by the engine is based on the calculation of a predicted rating, and the rating prediction is based on the preferences of the user. These systems play a major role in improving the productivity of e-commerce by widening the range of recommendations for a user. There are various types of recommendation systems, e.g. for movies, songs, books, articles, news and videos. Various sites have successfully implemented these systems, among them Netflix (for movie recommendation), YouTube and Shadi.com (for recommending grooms or brides based on the provided user information). By using user-generated data such as comments, reviews or ratings, the performance of a recommender engine can be enhanced. On e-commerce sites, users express their opinion of an item in the form of comments, reviews, ratings, etc. The recommender systems that are currently available recommend items for a limited or specific domain; for example, the popular Netflix recommends only web series, TV series and movies. These systems do not consider the information that can be extracted from other domains for better recommendation. A currently available recommender system can be improved by extracting information from diverse domains; such a system may be called a cross-domain recommender system. Some sites perform recommendation across different domains by using data such as comments, reviews and ratings. Hence, information from one domain can be used to recommend items in another domain [2]. Various approaches have been implemented to perform cross-domain recommendation, such as linking different domains by using auxiliary rich information [3]. Approaches like tag-based [4], ontological profiling [5], rating scenarios [6], etc.
have been implemented for performing recommendation across different domains. All of these approaches deal with the sparsity level to an extent. In this paper, we use Latent Dirichlet Allocation (LDA) along with an ontology technique to map the generated topic pattern to a genre. Genre-related words are held in a dictionary of words obtained by crawling semantically related words from the web. The paper is structured as follows. Section 2 describes the related work; Sect. 3 covers the problem formulation, the datasets used, the experimental setup and the proposed approach; Sect. 4 presents the evaluated results; and the conclusion is in Sect. 5.
2 Related Work

Shi et al. [7] surveyed information such as reviews, comments, multimedia information and tags, and identified and discussed the main challenges faced when developing a recommender system. Tobias et al. [5] developed a framework for cross-domain recommendation systems using semantic networks and ontology. Tang et al. [8] proposed a cross-domain topic learning method to address the challenges associated with cross-domain recommendation. Tan et al. [9] used a Bayesian approach for building a cross-domain recommender system.
Low et al. [10] proposed a system based on Dirichlet multinomial regression. Kumar et al. [11] used tensor decomposition with semantic similarity, using tags as a bridge between domains. Anil Kumar and Nitesh Kumar [12] implemented a common semantic space and topic modeling of semantically clustered libraries. Zhang and Wang [13] and Wang et al. [14] combined topic modeling with a latent factor model to make recommendations more personalized. Zhang et al. [15] proposed a cross-domain recommendation system based on semantic correlations, which capture the semantic relationship between two dissimilar tags and use it for recommendation. Cross-domain recommendation systems are well reviewed by Jin et al. [16].
3 Problem Formulation
This paper formulates a cross-domain recommendation approach between the domains of books (novels) and movies. Movies are recommended based on the ratings and reviews of both domains, where books form the source domain and movies the target domain (Fig. 1). This section discusses the datasets used, the experimental setup and the proposed approach.
3.1 Dataset
Two real-world datasets are used. The first is the Book-Crossing dataset (novels) [17], collected by Cai-Nicolas Ziegler from the BookCrossing community. It contains 278,858 users who provided 1,149,780 ratings, both implicit and explicit, about 271,379 books. The second is the MovieLens 100K dataset (movies) [18]. Because the Book-Crossing dataset is highly sparse owing to missing ratings, we preprocessed it. The reviews of books and movies were crawled from imdb.com and amazon.com.
Fig. 1 Problem formulation (source domain: books (novels); target domain: movies)
3.2 Data Preprocessing
The MovieLens dataset contains 943 users and 1581 movies. It is already preprocessed, i.e., it contains only frequently reviewed movies. For the Book-Crossing dataset, ratings are normalized to the range 1 to 10, and only frequently reviewed books are retained. A book is considered frequently reviewed if it has been rated by at least 20 users; this threshold can be changed as needed.
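The frequently-reviewed filter can be sketched as follows. This is only an illustration under assumptions: the (user, book, rating) tuple layout, the function name `frequently_reviewed` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def frequently_reviewed(ratings, min_users=20):
    """Return the set of book ids rated by at least `min_users` distinct users.

    `ratings` is an iterable of (user_id, book_id, rating) tuples (assumed layout).
    """
    users_per_book = defaultdict(set)
    for user_id, book_id, _rating in ratings:
        users_per_book[book_id].add(user_id)
    return {book for book, users in users_per_book.items() if len(users) >= min_users}

# Toy data: book "A" has three raters, book "B" has one.
toy = [(1, "A", 8), (2, "A", 6), (3, "A", 9), (1, "B", 7)]
print(frequently_reviewed(toy, min_users=2))
```

With the paper's threshold the call would use `min_users=20`; the toy call lowers it only so that the small example has something to keep.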
3.3 Experimental Setup
The experimental setup describes the development environment. Anaconda Spyder is used for the implementation, and the entire implementation is performed on Windows.
3.4 Proposed Approach
This section discusses the detailed steps used to implement our approach. It comprises six sequential steps: data preprocessing, topic modeling, ontology-based genre generation, user-novel and user-movie genre preference generation for both domains, cross-domain recommendation, and performance evaluation. The flow of the approach is shown in Fig. 2.
3.5 Data Preprocessing
Dataset preprocessing has already been discussed in Sect. 3.2. This step covers the preprocessing of the reviews of both domains.
3.5.1 Tokenization
Tokenization splits the reviews into atomic elements (tokens). It can be performed in various ways; we use the NLTK library, which provides “tokenize.regexp.”
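A regular-expression tokenizer of this kind can be sketched with the standard library alone. The pattern `\w+` and the lower-casing step are assumptions made for illustration; the paper does not state which pattern was passed to NLTK.

```python
import re

# Assumed pattern: a token is a run of word characters.
TOKEN_PATTERN = re.compile(r"\w+")

def tokenize(review):
    """Lower-case a review and split it into word tokens."""
    return TOKEN_PATTERN.findall(review.lower())

print(tokenize("A funny, enjoyable movie!"))
```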
Fig. 2 Proposed approach (flow: data preprocessing of the source domain, book (novel), and the target domain, movies; topic modelling on reviews with Latent Dirichlet Allocation; ontology for genre generation; user-novel and user-movie genre preference; cross-domain recommendation; performance evaluation)
3.5.2 Stop Words Removal
Words such as “for,” “and,” “the” and similar function words are of no use to the LDA model because they do not describe anything specific, so we remove them.
3.5.3 Frequent Words
The LDA model requires the most frequent words. In our case, a word that appears at least three times is considered a frequent word. This threshold can be changed as needed.
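Stop-word removal and the frequency threshold can be combined in one pass over the tokenized reviews. In this sketch the stop-word set is a tiny illustrative list, not NLTK's, and counting occurrences over the whole corpus (rather than per review) is an assumption.

```python
from collections import Counter

# Illustrative stop-word list only; a real run would use a fuller list.
STOP_WORDS = {"for", "and", "the", "a", "of", "is"}

def filter_tokens(docs, min_count=3):
    """Drop stop words, then keep only words seen at least `min_count` times in the corpus."""
    no_stop = [[tok for tok in doc if tok not in STOP_WORDS] for doc in docs]
    counts = Counter(tok for doc in no_stop for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in no_stop]

docs = [["the", "joke", "funny"], ["funny", "joke", "and", "plot"], ["joke", "funny", "for"]]
print(filter_tokens(docs))
```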
3.5.4 Stemming
Stemming is the process of reducing a word to its root. For example, “fishing” and “fished” have similar meanings, so both can be reduced to “fish.”
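The idea can be illustrated with a deliberately naive suffix stripper. This is not the stemmer used in the paper (which is not named) and not the Porter algorithm; the suffix list and the minimum stem length are assumptions chosen only to reproduce the “fish” example.

```python
def naive_stem(word):
    """Strip one common suffix if at least three characters remain (toy rule set)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ("fishing", "fished", "fish")])
```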
3.5.5 Document Term Matrix
To generate our LDA model, we need the frequency of each term (word) in each document (review). For this, we build a dictionary in which every word is assigned an “id,” and we record the word count (the number of times a word occurs in a document). The resulting collection is called a bag of words (bow).
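A minimal sketch of the dictionary and bag-of-words construction is given below, using only the standard library. The function names and the (id, count) pair layout are assumptions for illustration.

```python
from collections import Counter

def build_dictionary(docs):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab

def doc_to_bow(doc, vocab):
    """Return the sorted (word_id, count) pairs for one document."""
    counts = Counter(vocab[tok] for tok in doc if tok in vocab)
    return sorted(counts.items())

docs = [["fish", "joke", "fish"], ["joke"]]
vocab = build_dictionary(docs)
print(vocab, [doc_to_bow(d, vocab) for d in docs])
```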
3.5.6 TF-IDF Model
We now have our processed corpus. To decide which words should be given more importance, we calculate the term frequency (TF) and the inverse document frequency (IDF) with the following formulae:
$$\mathrm{TF} = \frac{\text{word occurrences in a document}}{\text{number of words in that document}} \tag{1}$$
$$\mathrm{IDF} = \log_e\left(\frac{\text{number of documents}}{\text{number of documents containing that word}}\right) \tag{2}$$
The TF-IDF model is then TF-IDF = TF × IDF.
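Equations (1) and (2) translate directly into code. The sketch below assumes each document is a list of tokens and that the queried word occurs in at least one document (otherwise the IDF denominator would be zero).

```python
import math

def tf(word, doc):
    """Eq. (1): occurrences of `word` in `doc` divided by the length of `doc`."""
    return doc.count(word) / len(doc)

def idf(word, docs):
    """Eq. (2): natural log of (number of documents / documents containing `word`)."""
    containing = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / containing)

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

docs = [["joke", "funny", "joke"], ["agent", "joke"], ["agent", "suspect"]]
print(tf_idf("funny", docs[0], docs), tf_idf("joke", docs[0], docs))
```

In the toy corpus, “funny” occurs in one document out of three and “joke” in two, so “funny” receives the larger IDF even though “joke” has the larger TF.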
3.6 Topic Modeling on Reviews
Topic modeling is a technique for extracting topics from documents (in our case, the “bow”). A topic is a group of words, i.e., a pattern that occurs frequently in the corpus. There are various topic modeling techniques, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA). In our approach, we use LDA to train a model that generates topics from the corpus, and the topics are then mapped to specific genres. To assign words to the right topic, LDA performs a number of iterations. In each iteration, the assignment of a word to a topic is governed by two probabilities, P1 and P2.
P1: the proportion of words in a document that are assigned to a topic,
$$P_1 = P(\text{Topic} \mid \text{Document}) \tag{3}$$
P2: the proportion of assignments to a particular topic, over the whole set of documents, that come from the word,
$$P_2 = P(\text{Word} \mid \text{Topic}) \tag{4}$$
The word-topic probability is then
$$P = P_1 \times P_2 \tag{5}$$
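The product P = P1 × P2 can be illustrated for a single word with simple count tables. This sketch is not a full LDA implementation: it omits the Dirichlet smoothing priors and the sampling loop, and the count-table layout and function name are assumptions.

```python
def topic_scores(word, doc_topic_counts, topic_word_counts):
    """Score each topic for `word` as P1 * P2 from raw counts (no smoothing).

    doc_topic_counts[k]  : words of the current document assigned to topic k
    topic_word_counts[k] : dict mapping word -> corpus-wide count in topic k
    """
    doc_total = sum(doc_topic_counts)
    scores = []
    for k, n_dk in enumerate(doc_topic_counts):
        p1 = n_dk / doc_total                                   # P(topic | document)
        topic_total = sum(topic_word_counts[k].values())
        p2 = topic_word_counts[k].get(word, 0) / topic_total    # P(word | topic)
        scores.append(p1 * p2)
    return scores

scores = topic_scores("joke", [3, 1], [{"joke": 2, "funny": 2}, {"agent": 3, "joke": 1}])
print(scores)
```

In an actual LDA run the word would be reassigned to a topic drawn in proportion to these scores, and the counts would be updated before moving to the next word.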
3.7 Ontology for Genre Generation
The word patterns, i.e., topics, generated by LDA need to be mapped to a specific word (genre). For this mapping, we created a dictionary of semantically related words. For example, a topic with the words “laughter, funny, enjoy, joke” clearly indicates the “comedy” genre. Similarly, a topic with the words “agent, suspects, security, witness” indicates “crime.” In the same way, we generate multiple topics and map each of them to a specific genre.
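One simple way to realize such a mapping is to pick the genre whose dictionary shares the most words with the topic. The overlap-count rule below is an assumption for illustration; the paper does not specify the matching rule, and the dictionary holds only the two example genres from the text.

```python
# Genre dictionary built from the two examples in the text.
GENRE_WORDS = {
    "comedy": {"laughter", "funny", "enjoy", "joke"},
    "crime": {"agent", "suspects", "security", "witness"},
}

def map_topic_to_genre(topic_words, genre_words=GENRE_WORDS):
    """Return the genre with the largest word overlap, or None if nothing matches."""
    overlaps = {genre: len(words & set(topic_words)) for genre, words in genre_words.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None

print(map_topic_to_genre(["funny", "joke", "plot"]))
```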
3.8 User-Novel/Movie Generation
By passing the reviews of a user to our model and applying the ontology mapping, we obtain the user genre preference matrix.
3.9 Cross-Domain Recommendation
After calculating the genre preferences for both domains, we calculate the genre preferences of each user. The genre entries of a single user are summed, and any genre with a value greater than 3 is considered a genre preference of that user. After calculating the user's genre preferences in the book (novel) domain, top-rated movies (i.e., those with a rating greater than 5) are fetched and recommended to that user of the novel domain. After recommendation, cross-verification checks whether the recommended movies match the novel user's preferences.
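The two thresholds in this step can be sketched as follows. The data layout (one genre-count dict per review, and (title, genre, rating) tuples for movies) and the toy values are hypothetical; only the thresholds 3 and 5 come from the text.

```python
def preferred_genres(genre_rows, threshold=3):
    """Sum a user's per-review genre entries and keep genres whose total exceeds `threshold`."""
    totals = {}
    for row in genre_rows:
        for genre, value in row.items():
            totals[genre] = totals.get(genre, 0) + value
    return {genre for genre, total in totals.items() if total > threshold}

def recommend(preferred, movies, min_rating=5):
    """Return titles of movies in a preferred genre with rating greater than `min_rating`."""
    return [title for title, genre, rating in movies if genre in preferred and rating > min_rating]

# Toy book-domain genre entries for one user (one dict per reviewed novel).
user_rows = [{"comedy": 2, "crime": 1}, {"comedy": 2}, {"crime": 1}]
movies = [("M1", "comedy", 8), ("M2", "comedy", 4), ("M3", "crime", 9)]
prefs = preferred_genres(user_rows)
print(prefs, recommend(prefs, movies))
```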
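4 Result Evaluation
The performance of our approach is evaluated with the following measures:
4.1 Precision
$$\text{Precision} = \frac{N_{rs}}{N_{s}} \tag{6}$$
4.2 Recall
$$\text{Recall} = \frac{N_{rs}}{N_{r}} \tag{7}$$
4.3 F-Measure
$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$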
where Nrs is the number of recommended items that the user prefers, Ns is the number of recommended items, and Nr is the number of items that the user prefers. Table 1 shows the evaluated results for the top 10 users cutoff.
Table 1 Result (precision and recall summary)
User    Precision         Recall
1       0.399183457052    0.0101106044538
2       0.382375397667    0.0001060445387
3       0.294906327324    0.0001106044538
4       0.283520678685    0.0001060445387
5       0.274326617179    0.0021004453878
6       0.227995758218    0.0001060404453
7       0.211786093016    0.0001060445387
8       0.200159066808    0.0000020890774
9       0.190880169671    0.0000020890774
10      0.183881230117    0.0000120890774
Mean precision: top 5: 0.32686249; top 10: 0.26490147. Mean F-measure: top 5: 0.003958; top 10: 0.0025.
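The three measures of Eqs. (6)–(8) can be computed from the set of recommended items and the set of items the user prefers. The function name, the zero-division conventions and the toy sets below are assumptions for illustration; they are not the paper's evaluation code.

```python
def precision_recall_f(recommended, preferred):
    """Return (precision, recall, F-measure) for one user."""
    recommended, preferred = set(recommended), set(preferred)
    n_rs = len(recommended & preferred)          # recommended items the user prefers
    precision = n_rs / len(recommended) if recommended else 0.0
    recall = n_rs / len(preferred) if preferred else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

p, r, f = precision_recall_f({"a", "b", "c", "d"}, {"a", "b", "e"})
print(p, r, f)
```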
5 Conclusion
We have studied various approaches to cross-domain recommendation and conclude that the reviews given by users can be used to generate preferences, which can in turn be used for recommendation. To generate user preferences from reviews, we studied topic modeling approaches, in particular Latent Dirichlet Allocation, in detail and applied it in our research work. The performance evaluation indicates that recommendation with LDA performs better than plain collaborative and content-based filtering, which rely on user-item overlap, and also gives better results than semantic clustering and cosine similarity. The proposed approach can be applied to other domains. Other types of user-generated data, such as images and videos, can be combined with the topic modeling approach for recommendation across different domains.
References
1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
2. Cantador, I., Fernández-Tobías, I., Berkovsky, S., Cremonesi, P.: Cross-domain recommender systems. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook. Springer, Boston, MA (2015). https://doi.org/10.1007/978-1-4899-7637-6_27
3. Shi, Y., Larson, M., Hanjalic, A.: Collaborative filtering beyond the user-item matrix. In: ACM Computing Surveys (CSUR) (2014)
4. Shi, Y., Larson, M., Hanjalic, A.: Generalized tag-induced cross domain collaborative filtering. arXiv preprint arXiv:1302.4888 (2013)
5. Fernández-Tobías, I., Cantador, I., Kaminskas, M., Ricci, F.: A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (2011)
6. Li, B., Yang, Q., Xue, X.: Transfer learning for collaborative filtering via rating-matrix generative model. In: Proceedings of 26th Annual International Conference on Machine Learning (2009)
7. Shi, Y., Larson, M., Hanjalic, A.: Collaborative filtering beyond the user-item matrix, A survey of state of art and future challenge. ACM Comput. Surv. (2014)
8. Tang, J., Wu, S., Sun, J., Su, H.: Cross-domain collaboration recommendation. In: Proceedings of 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012)
9. Tan, S., Bu, J., Qin, X., Chen, C., Cai, D.: Cross domain recommendation based on multi-type media fusion. Neurocomputing (2014)
10. Low, Y., Agarwal, D., Smola, A.J.: Multiple domain user personalization. In: 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2011)
11. Kumar, V., Mohan, K., Shrivastva, P.D., Singh, S.: Cross domain recommendation using semantic similarity and tensor decomposition. In: International Conference on Computational Modeling and Security (CMS) (2016)
12. Kumar, A., Kumar, A., Hussain, M., Chaudhury, S., Agarwal, S.: Semantic clustered-based cross domain recommendation. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (2014)
13. Zhang, W., Wang, J.: Integrating topic and latent factors for scalable personalized review-based rating prediction. IEEE Trans. Knowl. Data Eng. 28(11), 3013–3027 (2016). https://doi.org/10.1109/TKDE.2016.2598740
14. Wang, H., Lu, Y., Zhai, C.: Latent aspect rating analysis without aspect keyword supervision. In: Proceedings of 17th ACM SIGKDD International Conference on Knowledge Discovery Data Mining (KDD), pp. 618–626 (2011). https://doi.org/10.1145/2020408.2020505
15. Zhang, Q., Hao, P., Lu, J., Zhang, G.: Cross-domain recommendation with semantic correlation in tagging systems. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Budapest, Hungary (2019). https://doi.org/10.1109/IJCNN.2019.8852049
16. Jin, Y., Dong, S., Cai, Y., Hu, J.: RACRec: review aware cross-domain recommendation for fully-cold-start user. IEEE Access 8, 55032–55041 (2020). https://doi.org/10.1109/ACCESS.2020.2982037
17. https://www.kaggle.com/somnambwl/bookcrossing-dataset. Last accessed 12 Sept 2020
18. https://grouplens.org/datasets/movielens/. Last accessed 12 Sept 2020
The Study of Linear and Nonlinear Fractional ODEs by Homotopy Analysis H. Gandhi, A. Tomar, and D. Singh
Abstract In this work, we extend the application of the homotopy analysis method (HAM) to linear and nonlinear time-fractional ordinary differential equations (FODEs) with given initial conditions. In this methodology, the approximate solution of an initial value problem is expressed as a Taylor series, which converges rapidly to the exact solution of the linear or nonlinear system. The iterative scheme involves a nonzero parameter ‘ℏ’, known as the convergence-control parameter, and a fractional parameter ‘α’, so that the convergence of the solutions of fractional systems arising in modern science, engineering and technology can be controlled with the help of these parameters.
Keywords Homotopy analysis method · Fractional ordinary differential equations · Linear and nonlinear FODEs
H. Gandhi (B) · D. Singh, Amity School of Applied Science, Amity University Haryana, Gurgaon, India
D. Singh, e-mail: [email protected]
A. Tomar, Amity Institute of Applied Science, Amity University, Noida, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_33
1 Introduction
Fractional differential equations [1–3], a generalization of integer-order differential equations, have received considerable attention in recent decades and have been used to model and solve many physical problems in distinct areas of science and technology [4, 5]. Over the last 30 years, fractional calculus has become a very active area of research, and it has been observed that diverse interdisciplinary applications can be modeled and studied with the use of fractional derivatives. Many mathematicians [6–8] have contributed classical and fractional biological models, and researchers [9–17] have contributed to the solution of problems by Lie symmetry reduction with power series analysis, reducing systems of FPDEs to FODEs with the use of Erdélyi–Kober operators. Overall, the classical derivative is the limiting case of the fractional derivative, so a classical model can be generated from the corresponding fractional model. In our work, we focus on HAM, as it is a powerful tool for solving time-fractional linear and nonlinear ODEs. Liao [18, 19] proposed this method at the end of the twentieth century to obtain analytic approximations for nonlinear differential models. It provides a convergent series solution and great freedom to choose suitable auxiliary operators, functions and parameters, which increase the rate and region of convergence. Researchers from distinct areas have successfully applied HAM to a large number of nonlinear problems. Arora et al. [20] provided a numerical simulation of the ITO coupled system and concluded that HAM ensures convergence for nonlinear problems when a suitable value of the control parameter ℏ is chosen. Bataineh et al. [21, 22] solved systems of ODEs and explained the direct solution of nth-order IVPs by HAM. Many authors [23–28] have found that the differential transform technique, the Adomian decomposition method, the homotopy perturbation technique and the variational iteration method are special cases of HAM when the control parameter ℏ = −1. Authors [29, 30] presented the use and validity of the solution of systems of FPDEs and concluded that this technique does not require any kind of discretization. Researchers [31–35] provided soft computing applications concerned with real and daily-life phenomena. In this paper, we focus on finding the solution of nonlinear fractional ODEs of the form
$$D^{\alpha_i} x_i + F_i(t, x_1, x_2, \ldots, x_m) = f_i(t), \quad x_i(t_0) = c_i, \quad 0 < \alpha_i \le 1,\ i \in \mathbb{N} \tag{1}$$
where $f_i(t)$ are real-valued linear or nonlinear functions with $t \in \mathbb{R}$.
Bataineh et al. [21, 22] proposed HAM solutions of many classical problems based on (1), such as stiff systems and Genesio- and Riccati-type systems. The rest of the paper is organized as follows: preliminaries are given in Sect. 2 and an explanation of the methodology in Sect. 3, followed by HAM series solutions and a computational graphical analysis of some fractional ODEs with given initial conditions in Sect. 4, and concluding remarks in Sect. 5.
2 Preliminaries
Some key definitions and results from the theory of fractional derivatives and integrals are provided here.
Definition: The Riemann–Liouville integral operator ‘$I^{\mu}$’ of fractional order ‘μ’ is defined as
$$I^{\mu} g(t) = \frac{1}{\Gamma(\mu)} \int_0^t (t-\tau)^{\mu-1} g(\tau)\, d\tau; \quad \mu > 0 \tag{2}$$
with the following properties:
$$\text{(i)}\quad I^{0} g(t) = g(t) \tag{3}$$
$$\text{(ii)}\quad I^{\delta} I^{\eta} = I^{\delta+\eta} = I^{\eta} I^{\delta} \tag{4}$$
$$\text{(iii)}\quad I^{\delta}(t-a)^{\alpha} = \frac{\Gamma(\alpha+1)}{\Gamma(\delta+\alpha+1)} (t-a)^{\delta+\alpha}; \quad \delta \ge 0 \text{ and } \alpha > -1 \tag{5}$$
Definition: The Caputo fractional derivative is defined as
$$D^{\delta} q(t) = I^{i-\delta} D^{i} q(t) = \frac{1}{\Gamma(i-\delta)} \int_0^t (t-\tau)^{i-\delta-1} q^{(i)}(\tau)\, d\tau \tag{6}$$
where $i-1 < \delta \le i$, $i \in \mathbb{N}$ and $t > 0$, with the following results:
$$D^{\delta} I^{\delta} q(t) = q(t) \tag{7}$$
$$I^{\delta} D^{\delta} q(t) = q(t) - \sum_{k=0}^{i-1} q^{(k)}(0^{+}) \frac{(t-a)^{k}}{\Gamma(k+1)} \tag{8}$$
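Property (iii) is the working rule used throughout the HAM examples, so a small numerical sketch may help. The function below returns the coefficient and exponent of $I^{\delta} t^{\alpha}$ for a = 0; the function name and the sample values are assumptions for illustration.

```python
import math

def rl_integral_of_power(alpha, delta):
    """Return (coeff, exponent) such that I^delta t^alpha = coeff * t^exponent (a = 0)."""
    coeff = math.gamma(alpha + 1) / math.gamma(alpha + delta + 1)
    return coeff, alpha + delta

# Integer check: integrating t^2 once gives t^3 / 3.
print(rl_integral_of_power(2, 1))

# Semigroup check (property (ii)): I^0.3 followed by I^0.4 equals I^0.7 on t^0.5.
c1, e1 = rl_integral_of_power(0.5, 0.3)
c2, e2 = rl_integral_of_power(e1, 0.4)
c_direct, e_direct = rl_integral_of_power(0.5, 0.7)
print(c1 * c2, c_direct)
```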
3 Introduction to Fractional HAM
Consider the fractional differential equation
$$\mathcal{N}[v(t)] = 0 \tag{9}$$
Here ‘$\mathcal{N}$’ is a nonlinear fractional differential operator, ‘t’ denotes the independent time variable, and v(t) is the unknown function. The zeroth-order deformation equation is
$$(1-q)\, L[\psi(t;q) - v_0(t)] = q\, \hbar\, H(t)\, \mathcal{N}[\psi(t;q)] \tag{10}$$
Here q ∈ [0, 1] is the embedding parameter; H(t) ≠ 0 and ℏ ≠ 0 are the auxiliary function and the convergence-control parameter, respectively; L is an auxiliary linear fractional operator; ψ(t; q) is the function to be determined; and $v_0(t)$ is an initial approximation of v(t). An important feature of this technique is the great freedom to choose appropriate auxiliary functions and operators. As ‘q’ varies from zero to one, the solution ψ(t; q) varies from the initial approximation $v_0(t)$ to the final solution v(t):
$$\psi(t;0) = v_0(t), \qquad \psi(t;1) = v(t) \tag{11}$$
Expanding ψ(t; q) in a Taylor series about q = 0, we obtain
$$\psi(t;q) = v_0(t) + \sum_{i=1}^{\infty} v_i(t)\, q^{i} \tag{12}$$
where
$$v_i(t) = \frac{1}{i!} \left. \frac{\partial^{i} \psi(t;q)}{\partial q^{i}} \right|_{q=0} \tag{13}$$
If the auxiliary parameters and operators are properly chosen, the above series converges at q = 1, and we obtain one of the required solutions of the original nonlinear ODE:
$$\psi(t;1) = v_0(t) + \sum_{i=1}^{\infty} v_i(t) \tag{14}$$
Differentiating (10) i times with respect to q, setting q = 0 and dividing by i!, we obtain the ith-order deformation equation:
$$L[v_i(t) - \chi_i v_{i-1}(t)] = \hbar\, H(t)\, R_i(v_{i-1}(t)) \tag{15}$$
where
$$R_i(v_{i-1}(t)) = \frac{1}{\Gamma(i)} \left. \frac{\partial^{i-1} \mathcal{N}[\psi(t;q)]}{\partial q^{i-1}} \right|_{q=0} \tag{16}$$
and
$$\chi_i = \begin{cases} 0, & i \le 1 \\ 1, & i > 1 \end{cases} \tag{17}$$
Finally, for computation, the approximate semi-analytical HAM solution is represented by the truncated series
$$\Phi_i(t) = \sum_{k=0}^{i-1} v_k(t) \tag{18}$$
As ‘i’ approaches ∞ in (18), we obtain the exact solution.
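4 Applications of Fractional HAM
Example 1. The linear fractional ODE with initial conditions
$$D^{\alpha} v(t) + v(t) = 0; \quad 0 < \alpha \le 1; \quad v(0) = 1,\ v'(0) = 0 \tag{19}$$
Applying HAM, the zeroth-order deformation equation can be written as
$$(1-q)\, L[v(t;q) - v_0(t)] = q\, \hbar\, H(t)\, [D^{\alpha} v(t;q) + v(t;q)] \tag{20}$$
We select the initial approximation of v(t) as
$$v_0(t) = 1 \tag{21}$$
and the auxiliary linear operator
$$L(v(t;q)) = D^{\alpha}(v(t;q)); \quad L(C) = 0 \tag{22}$$
where C is a constant. Taking the auxiliary function H(t) as unity, the mth-order deformation equation is
$$D^{\alpha}[v_m(t) - \chi_m v_{m-1}(t)] = \hbar\, [D^{\alpha} v_{m-1}(t) + v_{m-1}(t)] \tag{23}$$
Applying the integral operator $I^{\alpha}$ to both sides of (23),
$$v_m(t) = \chi_m v_{m-1}(t) + \hbar\, I^{\alpha}[D^{\alpha} v_{m-1}(t) + v_{m-1}(t)]; \quad m \ge 1 \tag{24}$$
We obtain the first few terms of the series solution as
$$v_1(t) = \frac{\hbar\, t^{\alpha}}{\Gamma(\alpha+1)} \tag{25}$$
$$v_2(t) = \frac{\hbar(1+\hbar)\, t^{\alpha}}{\Gamma(\alpha+1)} + \frac{\hbar^{2}\, t^{2\alpha}}{\Gamma(2\alpha+1)} \tag{26}$$
$$v_3(t) = \frac{\hbar(1+\hbar)^{2}\, t^{\alpha}}{\Gamma(\alpha+1)} + \frac{2\hbar^{2}(1+\hbar)\, t^{2\alpha}}{\Gamma(2\alpha+1)} + \frac{\hbar^{3}\, t^{3\alpha}}{\Gamma(3\alpha+1)} \tag{27}$$
$$v_4(t) = \frac{\hbar(1+\hbar)^{3}\, t^{\alpha}}{\Gamma(\alpha+1)} + \frac{3\hbar^{2}(1+\hbar)^{2}\, t^{2\alpha}}{\Gamma(2\alpha+1)} + \frac{3\hbar^{3}(1+\hbar)\, t^{3\alpha}}{\Gamma(3\alpha+1)} + \frac{\hbar^{4}\, t^{4\alpha}}{\Gamma(4\alpha+1)} \tag{28}$$
$$v_5(t) = \frac{\hbar(1+\hbar)^{4}\, t^{\alpha}}{\Gamma(\alpha+1)} + \frac{4\hbar^{2}(1+\hbar)^{3}\, t^{2\alpha}}{\Gamma(2\alpha+1)} + \frac{6\hbar^{3}(1+\hbar)^{2}\, t^{3\alpha}}{\Gamma(3\alpha+1)} + \frac{4\hbar^{4}(1+\hbar)\, t^{4\alpha}}{\Gamma(4\alpha+1)} + \frac{\hbar^{5}\, t^{5\alpha}}{\Gamma(5\alpha+1)} \tag{29}$$
Setting ℏ = −1 in Eqs. (25)–(29), we obtain the six-term truncated HAM solution (Fig. 1)
$$\varphi(t) = 1 + \sum_{m=1}^{5} \frac{(-1)^{m}\, t^{m\alpha}}{\Gamma(m\alpha+1)} \tag{30}$$
Fig. 1 HAM solution to IVP at ℏ = −1 and α = 0.75, 0.90, 0.99, 1
The exact solution is represented by
$$\Phi(t) = 1 + \sum_{m=1}^{\infty} \frac{(-1)^{m}\, t^{m\alpha}}{\Gamma(m\alpha+1)} \tag{31}$$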
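The recurrence (24) can be checked numerically. Since every $v_m$ with m ≥ 1 vanishes at t = 0, result (8) gives $I^{\alpha} D^{\alpha} v_{m-1} = v_{m-1}$, so (24) reduces to $v_m = (1+\hbar) v_{m-1} + \hbar I^{\alpha} v_{m-1}$ for m ≥ 2. The sketch below stores each term as coefficients in the basis $t^{k\alpha}/\Gamma(k\alpha+1)$, in which $I^{\alpha}$ simply shifts k by one; the function names and data layout are assumptions for illustration.

```python
import math

def ham_terms(hbar, order):
    """Return [v_0, ..., v_order] (order >= 1) for Example 1.

    Each term is a dict {k: c}, meaning c * t^(k*alpha) / Gamma(k*alpha + 1).
    """
    terms = [{0: 1.0}, {1: hbar}]      # v_0 = 1, v_1 = hbar * t^alpha / Gamma(alpha + 1)
    prev = terms[1]
    for _ in range(2, order + 1):
        nxt = {}
        for k, c in prev.items():
            nxt[k] = nxt.get(k, 0.0) + (1 + hbar) * c    # (1 + hbar) * v_{m-1}
            nxt[k + 1] = nxt.get(k + 1, 0.0) + hbar * c  # hbar * I^alpha v_{m-1}
        terms.append(nxt)
        prev = nxt
    return terms

def ham_solution(t, alpha, hbar=-1.0, order=5):
    """Evaluate the truncated HAM series v_0 + ... + v_order at time t."""
    total = 0.0
    for term in ham_terms(hbar, order):
        for k, c in term.items():
            total += c * t ** (k * alpha) / math.gamma(k * alpha + 1)
    return total

# For alpha = 1 the exact solution of v' + v = 0, v(0) = 1 is exp(-t).
print(ham_solution(0.5, alpha=1.0), math.exp(-0.5))
```

For ℏ = −1 the factor (1 + ℏ) vanishes and each $v_m$ collapses to the single term in (30); other values of ℏ redistribute the same powers of $t^{\alpha}$ across the terms.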
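The exact solution (31) coincides with the result in [36].
Example 2. The nonlinear fractional ODE with 0 < t < 1 and m − 1 < α ≤ m, m ∈ N, defined as
$$D^{\alpha} v(t) - v^{2}(t) - 1 = 0, \quad v^{(p)}(0) = 0 \ \text{for}\ p = 0, 1, 2, 3, \ldots, (m-1) \tag{32}$$
In this problem, we are able to vary the fractional parameter ‘α’ as well as the auxiliary parameter ‘ℏ’. We select $v_0(t) = \frac{t^{\alpha}}{\Gamma(\alpha+1)}$ as the initial approximation of v(t), and the auxiliary operator is taken to be
$$L(v(t;q)) = D^{\alpha}(v(t;q)) \tag{33}$$
For simplicity, we take the auxiliary function H(t) as unity; the zeroth-order deformation equation is then
$$(1-q)\, L[v(t;q) - v_0(t)] = q\, \hbar\, [D^{\alpha} v(t;q) - v^{2}(t;q) - 1] \tag{34}$$
Differentiating (34) m times with respect to ‘q’, setting ‘q = 0’, dividing by m!, and applying the fractional integral operator ‘$I^{\alpha}$’ to both sides of the resulting mth-order deformation equation, we obtain
$$v_m(t) = \chi_m v_{m-1}(t) + \hbar\, I^{\alpha}\Big[D^{\alpha} v_{m-1} - \sum_{j=0}^{m-1} v_j v_{m-1-j} - (1-\chi_m)\Big]; \quad m \ge 1 \tag{35}$$
We obtain the first five terms of the series as
$$v_0(t) = \frac{t^{\alpha}}{\Gamma(\alpha+1)} \tag{36}$$
$$v_1(t) = -\hbar\, \frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)} \cdot \frac{t^{3\alpha}}{[\Gamma(\alpha+1)]^{2}} \tag{37}$$
$$v_2(t) = -\hbar(1+\hbar)\, \frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)} \cdot \frac{t^{3\alpha}}{[\Gamma(\alpha+1)]^{2}} + 2\hbar^{2}\, \frac{\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \cdot \frac{t^{5\alpha}}{[\Gamma(\alpha+1)]^{3}} \tag{38}$$
$$v_3(t) = -\hbar(1+\hbar)^{2}\, \frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)} \cdot \frac{t^{3\alpha}}{[\Gamma(\alpha+1)]^{2}} + 4\hbar^{2}(1+\hbar)\, \frac{\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \cdot \frac{t^{5\alpha}}{[\Gamma(\alpha+1)]^{3}} - \hbar^{3}\, \frac{\Gamma(6\alpha+1)}{\Gamma(7\alpha+1)} \cdot \frac{t^{7\alpha}}{[\Gamma(\alpha+1)]^{4}} \left[ \left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} + \frac{4\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \right] \tag{39}$$
$$v_4(t) = -\hbar(1+\hbar)^{3}\, \frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)} \cdot \frac{t^{3\alpha}}{[\Gamma(\alpha+1)]^{2}} + 6\hbar^{2}(1+\hbar)^{2}\, \frac{\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \cdot \frac{t^{5\alpha}}{[\Gamma(\alpha+1)]^{3}} - 3\hbar^{3}(1+\hbar)\, \frac{\Gamma(6\alpha+1)}{\Gamma(7\alpha+1)} \cdot \frac{t^{7\alpha}}{[\Gamma(\alpha+1)]^{4}} \left[ \left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} + \frac{4\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \right] + \hbar^{4}\, \frac{\Gamma(8\alpha+1)}{\Gamma(9\alpha+1)} \cdot \frac{t^{9\alpha}}{[\Gamma(\alpha+1)]^{5}} \left\{ 4\left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} \frac{\Gamma(4\alpha+1)}{\Gamma(5\alpha+1)} + 2\,\frac{\Gamma(6\alpha+1)}{\Gamma(7\alpha+1)} \left[ \left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} + \frac{4\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \right] \right\} \tag{40}$$
As suggested by Liao [18], the expressions (36)–(40) contain the auxiliary control parameter ‘ℏ’, which determines the convergence region and the rate of approximation for this methodology. Taking ℏ = −1, we obtain
$$v(t) = \frac{t^{\alpha}}{\Gamma(\alpha+1)} + \frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)} \cdot \frac{t^{3\alpha}}{[\Gamma(\alpha+1)]^{2}} + \frac{2\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \cdot \frac{t^{5\alpha}}{[\Gamma(\alpha+1)]^{3}} + \frac{\Gamma(6\alpha+1)}{\Gamma(7\alpha+1)} \cdot \frac{t^{7\alpha}}{[\Gamma(\alpha+1)]^{4}} \left[ \left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} + \frac{4\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \right] + \frac{\Gamma(8\alpha+1)}{\Gamma(9\alpha+1)} \cdot \frac{t^{9\alpha}}{[\Gamma(\alpha+1)]^{5}} \left\{ 4\left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} \frac{\Gamma(4\alpha+1)}{\Gamma(5\alpha+1)} + 2\,\frac{\Gamma(6\alpha+1)}{\Gamma(7\alpha+1)} \left[ \left(\frac{\Gamma(2\alpha+1)}{\Gamma(3\alpha+1)}\right)^{2} + \frac{4\Gamma(4\alpha+1)\Gamma(2\alpha+1)}{\Gamma(5\alpha+1)\Gamma(3\alpha+1)} \right] \right\} \tag{41}$$
The exact solution of the ODE as ‘α → 1’ in (41) is v = tan t, and with ℏ = −1 we obtain the same result as that obtained by ADM [37] (Figs. 2 and 3).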
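The recurrence (35) can be run numerically to confirm the limit α → 1. The sketch below represents each term as a dict {k: c} meaning $c\,t^{k\alpha}$, uses $I^{\alpha} D^{\alpha} v_{m-1} = v_{m-1}$ (valid because every term vanishes at t = 0), and applies property (5) termwise. The helper names are assumptions for illustration; for α = 1 and ℏ = −1 the summed coefficients should match the Maclaurin series of tan t.

```python
import math

def frac_integral(series, alpha):
    """Apply I^alpha termwise to {k: c}, i.e. to sum of c * t^(k*alpha), using (5)."""
    return {k + 1: c * math.gamma(k * alpha + 1) / math.gamma((k + 1) * alpha + 1)
            for k, c in series.items()}

def multiply(a, b):
    out = {}
    for i, ca in a.items():
        for j, cb in b.items():
            out[i + j] = out.get(i + j, 0.0) + ca * cb
    return out

def add(a, b, scale=1.0):
    out = dict(a)
    for k, c in b.items():
        out[k] = out.get(k, 0.0) + scale * c
    return out

def ham_riccati_terms(alpha, hbar=-1.0, order=4):
    """Return [v_0, ..., v_order] of Example 2 from recurrence (35)."""
    v = [{1: 1.0 / math.gamma(alpha + 1)}]               # v_0 = t^alpha / Gamma(alpha + 1)
    for m in range(1, order + 1):
        conv = {}
        for j in range(m):                                # sum of v_j * v_{m-1-j}
            conv = add(conv, multiply(v[j], v[m - 1 - j]))
        # I^alpha D^alpha v_{m-1} = v_{m-1}, since v_{m-1}(0) = 0
        bracket = add(v[m - 1], frac_integral(conv, alpha), scale=-1.0)
        if m == 1:                                        # the (1 - chi_m) term: I^alpha[1]
            bracket = add(bracket, {1: 1.0 / math.gamma(alpha + 1)}, scale=-1.0)
        new = {k: hbar * c for k, c in bracket.items()}
        if m > 1:                                         # chi_m * v_{m-1}
            new = add(new, v[m - 1])
        v.append(new)
    return v

def total_series(terms):
    total = {}
    for term in terms:
        total = add(total, term)
    return total

coeffs = total_series(ham_riccati_terms(alpha=1.0, hbar=-1.0, order=4))
print({k: round(c, 6) for k, c in sorted(coeffs.items()) if abs(c) > 1e-12})
```

The same routine can be called with other values of `alpha` and `hbar` to explore how the two parameters change the partial sums.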
Fig. 2 HAM series solution at ℏ = −1, α = 0.5, 0.75, 0.95, 1, 1.25, 1.5, 2
Fig. 3 HAM series solution at α = 1, ℏ = −0.5, −1, −1.4
5 Concluding Remarks
Our study carried out an iterative process based on the HAM algorithm to obtain Taylor series solutions for time-fractional systems of ODEs. The effect of the noninteger order of the Caputo derivative on the solutions is studied through graphical representations for distinct cases. This article shows that HAM yields accurate approximate solutions of fractional ODEs without restrictive assumptions. The technique has a clear advantage over other numerical techniques in the sense that it provides approximate series solutions of the problems through the choice of the initial approximation and auxiliary functions, the control parameter ‘ℏ’ and the fractional parameter ‘α’. The graphical computations were carried out with Mathematica software. The technique delivers rapid convergence and reduces the difficulties that arise in calculating more accurate solutions. As future scope, the authors remark that combining fractional derivative operators with the technique described here can play an important role in the study of linear and nonlinear models.
References 1. Podlubny, I.: Fractional Differential Equations: An Introduction to Fractional Derivatives. Academic, San Diego, CA (1999) 2. Oldham, K.B., Spanial, J.: The Fractional Calculus. Academic, New York (1974) 3. Debnath, L.: Recent applications of fractional calculus to science and enginnering. Int. J. Math. Math. Sci. 54, 3413–3442 (2003) 4. Iyiola, O.S., Olyinka, O.G.: Analytical solutions of time fractional models for homogeneous Gardner equation and non homogeneous differential equations. Ain Shams Eng. J. (2014). 2090–4479
5. Biazer, J., Ghanbari, B.: HAM solution of some initial value problems arising in heat radiation equations. J. King Saud Univ.-Sci. 24, 161–165 (2012) 6. Cruywagen, G.V., Diana, E., Woodward, P., Tracqui, G.T., Murray, J.D.: The modeling of diffusive tumors. J. Biol. Syst. 3(4), 937–945 (1995) 7. Lonescu, C., Lopes, A., Copot, D., Machad, J.A.T., Bates J.H.T.: The role of fractional calculus in modeling biological phenomena. Commun. Nonlin. Sci. Numer. Stimul. 141–159 (2017) 8. Gandhi, H., Tomar, A., Singh, D.: A predicted mathematical cancer tumor growth model of brain and its analytical solution by reduced differential transform method. In: Proceedings of International Conference on Trends in Computational and Cognitive Engineering 2019. Advanced in Intelligent Systems and Computing, vol. 1169, pp. 203–213. Springer (2021) 9. Biswas, A., Song, M., Triki, H., Kara, A.H., Ahmad, B.S., Strong, A., Hama, A.: Solitons, shock waves, conservation laws and bifurcation analysis of boussinesq equation with power law non linearity and dual dispersion. Appl. Math. Inf. Sci. 3, 949–957 (2014) 10. Biswas, A., Khalique, C.M.: Optical Quasi-solitons by lie symmetry analysis. J. King Saud Univ.-Sci. 24, 271–276 (2012) 11. Bansal, A., Kara, A.H., Biswas, A., Moshokoa, S.P., Belic, M.: Optical soliton perturbation, group invariants and conservation laws of perturbed Fokes-Lenells equation. Chaos, Solitons Fractals 114, 275–280 (2018) 12. Wang, G.W., Hashemi, M.S.: Lie symmetry analysis and soliton solutions of time fractional K(m, n) equation. Pramana-J. Phys. 88(7), (2017) 13. Gandhi, H., Singh, D., Tomar, A.: Explicit solution to general fourth order time fractional KdV equation by Lie symmetry analysis. In: AIP Conference Proceedings 2253 (2020). Article Id 020012 14. Chauhan, A., Arora, R., Tomar, A.: Lie symmetry analysis and traveling wave solutions of equal width wave equation. Proyecciones J. Math. 39(1), 173–192 (2020) 15. 
Chauhan, A., Arora, R.: Time fractional Kupershmidt equation: symmetry analysis and explicit series solution with convergence analysis. Commun. Math. 27, 171–185 (2019) 16. Shi, D., Zhang, Y., Liu, W., Liu, J.: Some exact solutions and conservation laws of the coupled time fractional Boussinesq-Burgers system. Symmetry 11, 77 (2019) 17. Gandhi, H., Tomar, A., Singh, D.: Lie symmetry analysis to general fifth order time fractional Korteweg-de-Vries equation and its explicit solution. In: Proceedings of International Conference on Trends in Computational and Cognitive Engineering 2019. Advanced in Intelligent Systems and Computing, vol. 1169, pp. 189–201. Springer (2021) 18. Liao S.J.: The Proposed Homotopy Analysis Technique for the Solution of Nonlinear Problems. Ph.D. Thesis, Shangai Jiao Tong University, China (1992) 19. Liao, S.J.: Comparison between the homotopy analysis method and homotopy perturbation method’. Appl. Math. Comput. 169, 1186–1194 (2005) 20. Arora, R., Tomar, A., Singh, V.P.: Numerical simulation of ITO coupled system by homotopy analysis method. Adv. Sci. Eng. Med. 4, 522–529 (2012) 21. Bataineh, S.A., Noorani, M.S.M., Hashim, I.: Solving system of ODEs by homotopy analysis method. Commun. Nonlin. Sci. Numer. Simul. 13, 2060–2070 (2008) 22. Bataineh, S.A., Noorani, M.S.M., Hashim, I.: Direct solution of nth-order IVPs by HAM. Differ. Eqns. Nonlin. Mech. (2009). Article ID 842094 23. Gokdogan, A., Merden, M., Yildirim, A.: The modified algorithm for the differential transform method to solution of Genesio systems’. Commun. Nonlin. Sci. Numer. Simul. 17, 45–51 (2012) 24. Songxin, L., Jeffrey, D.J.: Comparison of homotopy analysis and homotopy perturbation method through an evolution equation. Commun. Nonlin. Sci. Numer. Simul. 14, 4057–4064 (2009) 25. Marasi, H.R., Narayan, V., Daneshbastam, M.: A constructive approach for solving system of fractional differential equations. Wavelets Fractal Anal. 3, 40–47 (2017) 26. 
Yang, X.J., Baleanu, D.: A local fractional variational iteration method for Laplace equation with fractional operators. Abst. Appl. Anal. (2013). Article ID 202650
27. Kumar, D., Singh, J., Mehmet, H.B.: An effective computational approach to local fractional telegraph equations. Nonlin. Sci Lett. A 8, 200–206 (2017) 28. Jafri, H., Tajadodi, H., Johnston, S.J.: A decomposition method for solving diffusion equation via local fractional time derivative. Thermal Sci. 19, S123–S129 (2015) 29. Maitama, S., Zhao, W.: Local fractional homotopy analysis method for solving non differentiable problems on cantor sets. Adv. Differ. Eqns. 2019, 127 (2019) 30. Jafari, H., Seifi, S.: Solving a system of nonlinear fractional partial differential equations using homotopy analysis method. Commun. Nonlin. Sci. Numer. Simul. 14, 1962–1969 (2009) 31. Singh, R., Seth, D., Rawat, S., Ray, K.: Performance investigations of multi-resonance microstrip patch anteena for wearable applications. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, 742. Springer (2019) 32. Jain, L., Singh, R., Rawat, S., Ray, K.: Miniaturized, meandered and stacked MSA using accelerated design strategy for biomedical applications. In: International Conference of Soft Computing for Problem Solving, pp. 725–732 (2016) 33. Singh, R., Rawat, S., Ray, K., Jain, L.: Performance of wideband falcate implantable patch anteena for biomedical telemetry. In: International Conference of Soft Computing for Problem Solving, pp. 757–765 (2016) 34. Vaishali, Sharma, T.K., Abraham, A., Rajpurohit, J.: Trigonometric probability tuning in asynchronous differential evolution. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 584. Springer (2018) 35. Sharma, T.K., Rajpurohit, J., Prakash, D.: Enhanced local search in shuffled frog algorithm. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053. Springer (2020) 36. Diethelm, K., Ford, N.J., Freed, A.D.: A predictor-corrector approach for numerical solution of fractional differential equation. Nonlin. Dyn. 
29, 3–22 (2002) 37. Shawagfeh, N.T.: Analytical approximate solutions for nonlinear fractional differential equations. Appl. Math. Comput. 131, 517–529 (2002)
The Comparative Study of Time Fractional Linear and Nonlinear Newell–Whitehead–Segel Equation H. Gandhi, A. Tomar, and D. Singh
Abstract In this study, the homotopy analysis method (HAM) and the fractional reduced differential transform method (FRDTM) are applied to solve the reaction–diffusion-type time fractional Newell–Whitehead–Segel (NWS) partial differential equation. Experimental evidence is given to demonstrate the efficiency of these methods. Fractional derivatives are taken in the Caputo sense. The outcomes of the two methods are compared with each other, and it is found that HAM has greater potential than FRDTM for solving linear and nonlinear fractional partial differential equations (FPDEs). Keywords Time fractional Newell–Whitehead–Segel equation · Fractional reduced differential transform method · Homotopy analysis method
H. Gandhi (B) · D. Singh
Amity School of Applied Science, Amity University Haryana, Gurugram, India
D. Singh e-mail: [email protected]
A. Tomar
Amity Institute of Applied Science, Amity University, Noida, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_34

1 Introduction
Time fractional partial differential equations (FPDEs) are used for dynamical modeling of real-world natural problems, such as ripple patterns in sand and stripes on seashells in different kinds of spatial systems. Such problems can be studied through a set of PDEs known as amplitude equations, and one such reaction–diffusion equation is the Newell–Whitehead–Segel equation, expressed as:

$$u_t = k u_{xx} + a u - b u^p \tag{1}$$

where p is a positive integer, a, b and k > 0 are real constants, and u(x, t) represents the distribution function of particles varying along the space variable 'x' and time 't' in a liquid or solid medium, such as very thin rods or the flow of fluids in thin pipes. '$u_t$' gives the partial variation of 'u' with time 't', '$u_{xx}$' shows the change of 'u' with the variable 'x' at a particular time, and $(au - bu^p)$ is a source term. Reaction–diffusion equations of NWS type can be used for the study of patterns, epidemic spread and biological system analysis [1, 2]. In this paper, we consider the time fractional model of Eq. (1) in the form:

$$\partial_t^{\alpha} u = k u_{xx} + a u - b u^p, \quad 0 < \alpha \le 1 \tag{2}$$
where α is a fractional parameter that accounts for the historic physical states of fluid flow problems. Fractional order derivatives and differential equations [3–5] have received major attention in recent decades and have been used to solve many physical modeling problems arising in distinct areas of science and technology. The most salient benefit of fractional partial differential models is that they describe the present, past and future activities of dynamical systems. Hence, at present, fractional modeling is more fruitful and realistic in fluid dynamics, control theory, signal processing, diffusion, electromagnetics and many more areas [3, 4, 6–11]. As every concept has some limitation, in the case of fractional derivatives the Caputo definition is more accurate than the Riemann–Liouville (RL) definition. In the RL sense, the fractional derivative of a constant need not be zero, whereas in the Caputo sense the fractional derivative of a constant function is zero. This is a major point of conflict in fractional mathematical modeling, so researchers must know how fractional order derivatives are used and applied. Various methods have been implemented to solve linear and nonlinear partial differential equations, such as the homotopy perturbation method [12], variational iteration method [13], reduced differential transform method [14–18], Lie symmetry analysis [19–26] and homotopy analysis method [5, 13, 27–30]. Authors [31–35] provided some applications of soft computing in daily life. In this article, we carry out a comparative study of the FRDTM and HAM techniques and describe the advantage of fractional HAM over FRDTM, namely that it derives the approximate analytical solution with a choice of the convergence parameter 'ħ', together with soft computing graphical analysis using Mathematica. Following this introduction, Sect. 2 gives the basic definitions required to solve the problems, and Sect. 3 explains the methodologies being compared. Sect. 4 discusses applications to the NWS equations above, and finally, concluding remarks of this study are given.
2 Basic Definitions Some important definitions are provided as under, for further discussion of methodologies and solution of linear and nonlinear fractional NWS-PDEs.
Definition: The Riemann–Liouville integral operator '$I^{\mu}$' of fractional order 'μ' is defined as:

$$I^{\mu} g(t) = \frac{1}{\Gamma(\mu)} \int_0^t (t-\tau)^{\mu-1} g(\tau)\, d\tau, \quad \mu > 0 \tag{3}$$

Also

$$\text{(i)}\quad I^{0} g(t) = g(t) \tag{4}$$

$$\text{(ii)}\quad I^{\delta} I^{\eta} = I^{\delta+\eta} = I^{\eta} I^{\delta} \tag{5}$$

$$\text{(iii)}\quad I^{\delta} (t-a)^{\alpha} = \frac{\Gamma(\alpha+1)}{\Gamma(\delta+\alpha+1)} (t-a)^{\delta+\alpha}, \quad \delta \ge 0 \text{ and } \alpha > -1 \tag{6}$$
Definition: The Caputo definition of the fractional derivative is described as:

$$D^{\delta} q(t) = I^{i-\delta} D^{i} q(t) = \frac{1}{\Gamma(i-\delta)} \int_0^t (t-\tau)^{i-\delta-1} q^{(i)}(\tau)\, d\tau \tag{7}$$

where i − 1 < δ ≤ i, i ∈ N and t > 0. The fractional differential and integral operators do not commute, as the following relations show; these are most important for obtaining solutions of problems in fractional modeling.

$$D^{\delta} \left( I^{\delta} q(t) \right) = q(t) \tag{8}$$

$$I^{\delta} \left( D^{\delta} q(t) \right) = q(t) - \sum_{k=0}^{i-1} q^{(k)}(0^{+}) \frac{t^{k}}{\Gamma(k+1)} \tag{9}$$
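The power rule (6) and the Caputo rule for monomials are what the later series computations rely on, so a small numerical check may help. The following is a minimal sketch, assuming a = 0, monomials represented as a hypothetical (coefficient, power) pair, and 0 < δ ≤ 1 for the Caputo derivative; the function names are illustrative and not taken from the paper.

```python
from math import gamma, isclose

def rl_integral_monomial(delta, coef, power):
    """Apply I^delta to coef * t**power using rule (6); returns (coef, power)."""
    if delta < 0 or power <= -1:
        raise ValueError("rule (6) needs delta >= 0 and power > -1")
    return coef * gamma(power + 1) / gamma(delta + power + 1), power + delta

def caputo_derivative_monomial(delta, coef, power):
    """Caputo D^delta of coef * t**power, assuming 0 < delta <= 1 and power >= 0."""
    if power == 0:
        return 0.0, 0.0  # the Caputo derivative of a constant is zero
    return coef * gamma(power + 1) / gamma(power - delta + 1), power - delta

# Semigroup property (5): I^0.4 (I^0.3 t^1.5) equals I^0.7 t^1.5
two_steps = rl_integral_monomial(0.4, *rl_integral_monomial(0.3, 1.0, 1.5))
one_step = rl_integral_monomial(0.7, 1.0, 1.5)
print(two_steps, one_step)

# Property (8): D^delta (I^delta t^2) returns t^2
print(caputo_derivative_monomial(0.5, *rl_integral_monomial(0.5, 1.0, 2.0)))
```

Working on (coefficient, power) pairs avoids numerical quadrature of the weakly singular kernel in (3), at the cost of covering only power functions.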
3 Methodologies
3.1 Fractional Reduced Differential Transform Method
Let $U_m(x)$ be the transformed function generated from the original function u(x, t), which is analytic and m-times continuously differentiable in the domain of interest of the space variable 'x' and time 't':

$$U_m(x) = \frac{1}{\Gamma(m\alpha+1)} \left[ D_t^{m} u(x,t) \right]_{t=0} \tag{10}$$

where 'α' is the fractional parameter, which describes the order of the time fractional derivative. The differential inverse transform of $U_m(x)$ is defined as:

$$u(x,t) = \sum_{m=0}^{\infty} U_m(x)\,(t-t_0)^{m\alpha} \tag{11}$$

Then, combining the above equations, we have:

$$u(x,t) = \sum_{m=0}^{\infty} \frac{1}{\Gamma(m\alpha+1)} \left[ D_t^{m} u(x,t) \right]_{t=0} (t-t_0)^{m\alpha} \tag{12}$$

It can be seen that the concept of the FRDTM is derived from the Taylor series expansion, and the initial approximation is given by the initial condition $U_0(x) = u(x, 0)$. Applying the above transformation to the equation to be solved, we obtain an iterative scheme for $U_m(x)$. Applying the inverse transformation to the obtained set of values $U_m(x)$, m = 0, 1, 2, ..., n, gives the approximate solution, and the exact series solution u(x, t) is given by (12). Some important transforms that are useful for the solution of FPDEs by FRDTM are listed in Table 1.

Table 1
Function w(x, t) | Fractional reduced differential transform
w(x, t) | $W_m(x) = \frac{1}{\Gamma(m\alpha+1)} \left[ D_t^{m} w(x,t) \right]_{t=0}$
w(x, t) = p(x, t) ± q(x, t) | $W_m(x) = P_m(x) \pm Q_m(x)$
w(x, t) = α p(x, t) | $W_m(x) = \alpha P_m(x)$
w(x, t) = $\partial_x (p(x,t))$ | $W_m(x) = \partial_x (P_m(x))$
w(x, t) = p(x, t) · q(x, t) | $W_m(x) = \sum_{r=0}^{m} P_r(x)\, Q_{m-r}(x)$
w(x, t) = $D_t^{n\alpha} (p(x,t))$ | $W_m(x) = \frac{\Gamma(1+m\alpha+n\alpha)}{\Gamma(1+m\alpha)} P_{m+n}(x)$
w(x, t) = sin(pt + α) or cos(pt + α) | $W_m(x) = \frac{p^m}{m!} \sin\left(\frac{\pi m}{2} + \alpha\right)$ or $\frac{p^m}{m!} \cos\left(\frac{\pi m}{2} + \alpha\right)$
3.2 Homotopy Analysis Method
Liao [6, 12] proposed this method in recent decades to obtain analytic approximations for nonlinear differential models. It gives control over the convergence of the series solution and great freedom to choose suitable auxiliary operators, functions and parameters, which increase the rate and region of convergence of solutions obtained for linear and nonlinear, classical or fractional ODEs. Let us assume the fractional differential equation:

$$\mathcal{N}[f(x,t)] = 0 \tag{13}$$

subject to the initial condition $f_0(x,t) = g(x)$. Here '$\mathcal{N}$' is a nonlinear fractional differential operator, and f(x, t) is the unknown function of the space and time variables 'x' and 't,' respectively. The zeroth-order deformation equation is described by

$$(1-q)\, L[\varsigma(x,t;q) - f_0(x,t)] = q\, \hbar\, H(t)\, \mathcal{N}[\varsigma(x,t;q)] \tag{14}$$

Here, q ∈ [0, 1] is the embedding parameter; H ≠ 0 and ħ ≠ 0 are the auxiliary function and the controlling auxiliary parameter, respectively; L is an auxiliary linear fractional operator; ς(x, t; q) is the function to be evaluated and $f_0$ is the initial approximation of f(x, t). A major feature of this technique is the great freedom to choose an appropriate auxiliary function and operator. As q varies from zero to one, the solution ς(x, t; q) varies from the initial approximation $f_0$ to the final solution f(x, t):

$$\varsigma(x,t;0) = f_0(x,t); \quad \varsigma(x,t;1) = f(x,t) \tag{15}$$
Expanding ς(x, t; q) in a Taylor series about q = 0, we obtain:

$$\varsigma(x,t;q) = f_0(x,t) + \sum_{i=1}^{\infty} f_i(x,t)\, q^i \tag{16}$$

where

$$f_i(x,t) = \frac{1}{\Gamma(i+1)} \left. \frac{\partial^i \varsigma(x,t;q)}{\partial q^i} \right|_{q=0} \tag{17}$$

If the auxiliary parameters and operators above are properly chosen, the series converges at q = 1, and we obtain one of the required solutions of the original nonlinear equation:

$$\varsigma(x,t;1) = f_0(x,t) + \sum_{i=1}^{\infty} f_i(x,t) \tag{18}$$

Differentiating (14) i times with respect to 'q', substituting q = 0 and dividing by $i!$, we obtain the i-th order deformation equation:

$$L[f_i(x,t) - \chi_i f_{i-1}(x,t)] = \hbar\, H(t)\, R_i(f_{i-1}(x,t)) \tag{19}$$
where

$$R_i(f_{i-1}(x,t)) = \frac{1}{\Gamma(i)} \left. \frac{\partial^{i-1} \mathcal{N}[\varsigma(x,t;q)]}{\partial q^{i-1}} \right|_{q=0} \tag{20}$$

and

$$\chi_i = \begin{cases} 1, & i > 1 \\ 0, & i \le 1 \end{cases} \tag{21}$$

Finally, for the computation, the approximate HAM solution is represented by the partial sum of the series:

$$\varsigma_i(x,t) = \sum_{k=0}^{i-1} f_k(x,t) \tag{22}$$

As 'i' approaches ∞ in (22), we obtain the exact solution.
4 Applications of Methodologies
Example 4.1 Consider the linear NWS equation with fractional order 'α' under the initial condition (Figs. 1, 2, 3, 4, 5 and 6):

$$\partial_t^{\alpha} u = u_{xx} - 2u, \quad 0 < \alpha \le 1, \quad \text{and } u_0(x,t) = e^x \tag{23}$$

Applying the fractional reduced transforms to (23), we obtain

$$\frac{\Gamma(k\alpha+\alpha+1)}{\Gamma(k\alpha+1)}\, U_{k+1}(x) = \frac{\partial^2 U_k(x)}{\partial x^2} - 2U_k(x) \tag{24}$$

Substituting k = 0, 1, 2, 3, ..., the first few terms are:

$$U_1 = \frac{-e^x}{\Gamma(\alpha+1)}; \quad U_2 = \frac{e^x}{\Gamma(2\alpha+1)}; \quad U_3 = -\frac{e^x}{\Gamma(3\alpha+1)} \tag{25}$$

So, the obtained solution is given as:

$$u(x,t) = \sum_{m=0}^{\infty} U_m(x)\, t^{m\alpha} = e^x + e^x \sum_{m=1}^{\infty} \frac{(-1)^m t^{m\alpha}}{\Gamma(m\alpha+1)} \tag{26}$$

As α → 1, the exact solution is given by:

$$u(x,t) = e^{x-t} \tag{27}$$

Fig. 1 Exact solution u(x, t) with x and t of (23)
Fig. 2 Approximate HAM solution u(x, t) at ħ = −1 and α = 1 of (23)
Fig. 3 Approximate HAM solution u(x, t) at ħ = −0.75 and α = 1 with x and t of (23)
Fig. 4 Approximate HAM solution u(x, t) at ħ = −1 and α = 0.5 with x and t of (23)
Fig. 5 Approximate HAM solution u(x, t) at ħ = −1 and α = 0.75 with x and t of (23)
Fig. 6 Approximate HAM solution u(x, t) at ħ = −1 and α = 0.95 with x and t of (23)
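The FRDTM recursion for this example is short enough to check numerically. The sketch below is an illustration, not the authors' Mathematica code: it assumes the reaction term enters as −2u, which is the sign that reproduces the coefficients in (25) and the exact solution (27), and it uses the fact that every $U_k(x)$ is a multiple of $e^x$. The function name and the number of terms are hypothetical choices.

```python
from math import gamma, exp

def frdtm_linear_nws(x, t, alpha, terms=20):
    """Partial sum of u = sum_m U_m(x) t^(m*alpha) for D_t^alpha u = u_xx - 2u, u(x, 0) = e^x."""
    c = 1.0      # U_0(x) = c * e^x with c = 1
    total = 0.0
    for k in range(terms):
        total += c * t ** (k * alpha)
        # For U_k = c e^x we have U_k'' - 2 U_k = -c e^x, so the recursion only rescales c
        c = -c * gamma(k * alpha + 1) / gamma(k * alpha + alpha + 1)
    return exp(x) * total

# Classical case alpha = 1: the partial sum should approach e^(x - t)
print(frdtm_linear_nws(0.5, 0.3, 1.0), exp(0.5 - 0.3))
# Fractional case, for comparison
print(frdtm_linear_nws(0.5, 0.3, 0.75))
```

For α = 1 the coefficients reduce to $(-1)^k / k!$, so the loop is simply the Taylor series of $e^{-t}$ multiplied by $e^x$.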
4.1 Application of HAM
Applying HAM, the zeroth-order deformation equation can be written as:

$$(1-q)\, L[\varsigma(x,t;q) - u_0(x,t)] = q\, \hbar\, H(t) \left[ D^{\alpha} \varsigma(x,t;q) - \varsigma_{xx} + 2\varsigma \right] \tag{28}$$

We select the auxiliary function H(t) as unity and the auxiliary linear operator as

$$L(\varsigma(x,t;q)) = D^{\alpha} (\varsigma(x,t;q)); \quad L(C) = 0 \tag{29}$$

The m-th order deformation equation is then:

$$D^{\alpha} [u_m(x,t) - \chi_m u_{m-1}(x,t)] = \hbar \left[ D^{\alpha} u_{m-1}(x,t) - (u_{m-1})_{xx} + 2u_{m-1} \right] \tag{30}$$

Applying the integral operator $I^{\alpha}$ to (30):

$$u_m(x,t) = \chi_m u_{m-1}(x,t) + \hbar\, I^{\alpha} \left[ D^{\alpha} u_{m-1}(x,t) - (u_{m-1})_{xx} + 2u_{m-1} \right]; \quad m \ge 1 \tag{31}$$

We obtain the first few terms of the series solution:

$$u_1(t) = \hbar\, e^x \frac{t^{\alpha}}{\Gamma(\alpha+1)} \tag{32}$$

$$u_2(t) = \left[ \frac{\hbar(1+\hbar)}{\Gamma(\alpha+1)}\, t^{\alpha} + \frac{\hbar^2}{\Gamma(2\alpha+1)}\, t^{2\alpha} \right] e^x \tag{33}$$

$$u_3(t) = \left[ \frac{\hbar(1+\hbar)^2}{\Gamma(\alpha+1)}\, t^{\alpha} + \frac{2\hbar^2(1+\hbar)}{\Gamma(2\alpha+1)}\, t^{2\alpha} + \frac{\hbar^3}{\Gamma(3\alpha+1)}\, t^{3\alpha} \right] e^x \tag{34}$$

When we set ħ = −1, the HAM solution coincides with the solution obtained by FRDTM in Eq. (26), and as α → 1 we get the classical case, which converges to the exact solution (27).
Example 4.2 Consider the nonlinear time fractional NWS equation with the initial condition below, where 'ρ' is an arbitrary constant (Figs. 7 and 8).

$$\partial_t^{\alpha} u = u_{xx} + 2u - 3u^2, \quad 0 < \alpha \le 1, \quad \text{and } u_0(x,t) = \rho \tag{35}$$
Applying the fractional reduced transforms, we obtain

$$\frac{\Gamma(k\alpha+\alpha+1)}{\Gamma(k\alpha+1)}\, U_{k+1}(x) = \frac{\partial^2 U_k(x)}{\partial x^2} + 2U_k(x) - 3 \sum_{r=0}^{k} U_r(x)\, U_{k-r}(x) \tag{36}$$

Substituting k = 0, 1, 2, 3, ..., the first few associated terms are

$$U_1 = \frac{(2\rho - 3\rho^2)}{\Gamma(\alpha+1)}; \quad U_2 = \frac{2(1-3\rho)(2\rho-3\rho^2)}{\Gamma(2\alpha+1)}; \ldots \tag{37}$$

So, the obtained series solution is given by:

$$u(x,t) = \rho + \frac{(2\rho-3\rho^2)}{\Gamma(\alpha+1)}\, t^{\alpha} + \frac{2(1-3\rho)(2\rho-3\rho^2)}{\Gamma(2\alpha+1)}\, t^{2\alpha} + \cdots \tag{38}$$

As α → 1, the approximate solution is given by

$$u(x,t) = \rho + \frac{(2\rho-3\rho^2)}{\Gamma(2)}\, t + \frac{2(1-3\rho)(2\rho-3\rho^2)}{\Gamma(3)}\, t^{2} + \cdots \tag{39}$$

Fig. 7 Approximate and exact solutions u(x, t) at ħ = −1, −0.5 and α = 1 of (35), plotted against t
Fig. 8 Approximate solution u(x, t) at ħ = −1 and α = 0.75, 0.95, 1 of (35), plotted against t
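Recursion (36) can be evaluated directly because, for the constant initial state ρ, every $U_k$ is constant in x and the second-derivative term vanishes. The sketch below is a hedged illustration with hypothetical function names. The closed-form logistic solution used for comparison at α = 1 is a standard result for u' = 2u − 3u² and is not quoted from the paper.

```python
from math import gamma, exp

def frdtm_nonlinear_nws(rho, alpha, terms):
    """Coefficients U_0 .. U_{terms-1} from recursion (36) with constant initial state rho."""
    U = [rho]
    for k in range(terms - 1):
        conv = sum(U[r] * U[k - r] for r in range(k + 1))  # transform of u^2
        rhs = 2.0 * U[k] - 3.0 * conv                      # U_k'' = 0 because U_k is constant in x
        U.append(rhs * gamma(k * alpha + 1) / gamma(k * alpha + alpha + 1))
    return U

def series_value(U, t, alpha):
    """Inverse transform (11) with t_0 = 0, truncated to len(U) terms."""
    return sum(c * t ** (m * alpha) for m, c in enumerate(U))

def logistic_exact(rho, t):
    """Closed form of u' = 2u - 3u^2, u(0) = rho (classical case alpha = 1)."""
    e = exp(2.0 * t)
    return 2.0 * rho * e / (2.0 + 3.0 * rho * (e - 1.0))

U = frdtm_nonlinear_nws(0.5, 1.0, 30)
print(U[1], U[2])  # compare with (37) evaluated at rho = 0.5, alpha = 1
print(series_value(U, 0.2, 1.0), logistic_exact(0.5, 0.2))
```

The truncated series is only trustworthy for small t; the values of ρ and t above were chosen to sit well inside the region where the power series converges.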
Application of HAM
Applying HAM, the zeroth-order deformation equation can be written as:

$$(1-q)\, L[\varsigma(x,t;q) - u_0(x,t)] = q\, \hbar\, H(t) \left[ D^{\alpha} \varsigma(x,t;q) - \varsigma_{xx} - 2\varsigma + 3\varsigma^2 \right] \tag{40}$$

We select the auxiliary linear operator as

$$L(\varsigma(x,t;q)) = D^{\alpha} (\varsigma(x,t;q)); \quad L(C) = 0 \tag{41}$$

and the auxiliary function H(t) = 1. The m-th order deformation equation can then be written as:

$$D^{\alpha} [u_m(x,t) - \chi_m u_{m-1}(x,t)] = \hbar \left[ D^{\alpha} u_{m-1}(x,t) - (u_{m-1})_{xx} - 2u_{m-1} + 3 \sum_{r=0}^{m-1} u_r u_{m-1-r} \right] \tag{42}$$

Applying the operator $I^{\alpha}$ to (42):

$$u_m(x,t) = \chi_m u_{m-1}(x,t) + \hbar\, I^{\alpha} \left[ D^{\alpha} u_{m-1}(x,t) - (u_{m-1})_{xx} - 2u_{m-1} + 3 \sum_{r=0}^{m-1} u_r u_{m-1-r} \right]; \quad m \ge 1 \tag{43}$$

We obtain the first few terms of the series solution:

$$u_1(t) = \frac{\hbar(-2\rho+3\rho^2)\, t^{\alpha}}{\Gamma(\alpha+1)} \tag{44}$$

$$u_2(t) = \frac{\hbar(1+\hbar)(-2\rho+3\rho^2)\, t^{\alpha}}{\Gamma(\alpha+1)} + \frac{2\hbar^2(1-3\rho)(2\rho-3\rho^2)\, t^{2\alpha}}{\Gamma(2\alpha+1)}; \cdots \tag{45}$$

$$u(x,t) = \rho + \frac{\hbar(-2\rho+3\rho^2)\, t^{\alpha}}{\Gamma(\alpha+1)} + \left[ \frac{\hbar(1+\hbar)(-2\rho+3\rho^2)\, t^{\alpha}}{\Gamma(\alpha+1)} + \frac{2\hbar^2(1-3\rho)(2\rho-3\rho^2)\, t^{2\alpha}}{\Gamma(2\alpha+1)} \right] + \cdots \tag{46}$$

As in the previous problem, for ħ = −1 we obtain the same solution as given by FRDTM in Eq. (38), and when α → 1, it converges to the exact solution (39).
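The HAM recursion for this example can also be carried out mechanically when each $u_m$ is stored as a list of coefficients of $t^{j\alpha}$. The sketch below is one possible implementation under stated assumptions: H(t) = 1, L = $D^{\alpha}$, a constant initial guess ρ (so all x-derivatives vanish), the nonlinear term taken as $3 \sum_r u_r u_{m-1-r}$, and property (9) used to evaluate $I^{\alpha} D^{\alpha} u_{m-1}$. The function name and data layout are hypothetical.

```python
from math import gamma

def ham_nonlinear_nws(rho, alpha, hbar, order):
    """Return [u_0, ..., u_order]; u_m is a list c with u_m(t) = sum_j c[j] * t**(j*alpha)."""
    size = order + 1

    def frac_int(c):
        # I^alpha term by term: t^(j a) -> Gamma(j a + 1) / Gamma((j + 1) a + 1) * t^((j + 1) a)
        out = [0.0] * size
        for j in range(size - 1):
            out[j + 1] = c[j] * gamma(j * alpha + 1) / gamma((j + 1) * alpha + 1)
        return out

    u = [[rho] + [0.0] * order]
    for m in range(1, order + 1):
        prev = u[m - 1]
        # I^alpha D^alpha u_{m-1} = u_{m-1} - u_{m-1}(0), by property (9)
        id_part = [0.0] + prev[1:]
        # Remaining part of the operator: -2 u_{m-1} + 3 * sum_r u_r u_{m-1-r}
        rest = [-2.0 * cj for cj in prev]
        for r in range(m):
            a, b = u[r], u[m - 1 - r]
            for i in range(size):
                for j in range(size - i):
                    rest[i + j] += 3.0 * a[i] * b[j]
        integ = frac_int(rest)
        chi = 1.0 if m > 1 else 0.0
        u.append([chi * prev[j] + hbar * (id_part[j] + integ[j]) for j in range(size)])
    return u

# With hbar = -1 each u_m collapses to a single power t^(m*alpha)
for term in ham_nonlinear_nws(0.5, 1.0, -1.0, 2):
    print(term)
```

Choosing a value such as hbar = -0.5 instead leaves lower powers of $t^{\alpha}$ in every term, which is where the extra freedom of the convergence parameter shows up.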
5 Conclusions
In this paper, we have generated approximate semianalytical and exact solutions of NWS equations of fractional order by applying FRDTM and HAM. The approximate solutions obtained by these methodologies were compared with each other. We found that the proposed comparative analysis is accurate and efficient for solving time fractional linear and nonlinear models. It was revealed that HAM gives precise solutions without any restrictive assumptions on the FPDEs. We found that as the convergence parameter 'ħ' approaches −1, the HAM solutions of the NWS equations coincide with the FRDTM solutions. HAM has a clear advantage over FRDTM in the sense that it provides approximate series solutions of the problems through the contextual selection of the initial guess and parametric functions, the controlling parameter 'ħ' and the fractional parameter 'α.' Graphical computations were carried out with Mathematica. HAM delivers rapid convergence and also reduces the difficulties that arise in evaluating certain intricate terms.
References 1. Murray, J.D., Stanley, E.A., Brown, D.L.: On the spatial spread of rabies among foxes. Proc. R. Soc. Lond. B. Biol. Sci. 229(1255), 111–150 (1986) 2. Holmes, E.E., Lewis, M.A., Banks, J.E., Veit, R.R.: Partial differential equations in ecology, spatial interactions and population dynamic. Wiley 75(1), 17–29 (1994) 3. Podlubny, I.: Fractional Differential Equations: An Introduction to Fractional Derivatives, Fractional Differential Equations. Academic, San Diego, CA (1999) 4. Oldham, K.B., Spanial, J.: The Fractional Calculus. Academic, New York (1974) 5. Arora, R., Tomar, A., Singh, V.P.: Numerical simulation of ITO coupled system by homotopy analysis method. Adv. Sci. Eng. Med. 4, 522–529 (2012) 6. Liao S.J.: The Proposed Homotopy Analysis Technique for the Solution of Nonlinear Problems. Ph.D. Thesis, Shangai Jiao Tong University, China (1992) 7. Gokdogan, A., Merden, M., Yildirim, A.: The modified algorithm for the differential transform method to solution of Genesio systems. Commun. Nonlinear Sci. Numer. Simula. 17, 45–51 (2012) 8. Kumar, D., Singh, J., Mehmet, H.B.: An effective computational approach to local fractional telegraph equations. Nonlinear Sci. Lett. A. 8, 200–206 (2017) 9. Jafri, H., Tajadodi, H., Johnston, S.J.: A decomposition method for solving diffusion equation via local fractional time derivative’. Therm. Sci. 19, S123–S129 (2015) 10. Biswas, A., Song, M., Triki, H., Kara, A.H., Ahmad, B.S., Strong, A., Hama, A.: Solitons, Shock waves, Conservation laws and bifurcation analysis of boussinesq equation with power law non linearity and dual dispersion. Appl. Math. Inf. Sci. 3, 949–957 (2014) 11. Wang, G.W., Hashemi, M.S.: Lie symmetry analysis and soliton solutions of time fractional K(m, n) equation. Pramana-J. Phys. 88, 7 (2017) 12. Liao, S.J.: Comparison between the homotopy analysis method and homotopy perturbation method’. Appl. Math Comput. 169, 1186–1194 (2005) 13. 
Yang X.J., Baleanu D.: A local fractional variational iteration method for Laplace equation with fractional operators. Abst. Appl. Anal. (2013). Article ID 202650 14. Tomar, A., Arora, R.: Numerical simulation of coupled MKDV equation by reduced differential transform method. J. Comput. Methods Sci. Eng. 14, 269–275 (2014) 15. Srivastva, V.K., Avasthi, M.: Solution of Caputo time fractional order hyperbolic telegraph equation. AIP Adv. 3, 32–142 (2013) 16. Srivastva, V.K., Avasthi, M.: Solution of two and three dimensional second order time hyperbolic telegraph equations. J. King Saud Univ. 29, 166–171 (2017) 17. Srivastva, V.K., Avasthi, M., Kumar, S.: Solution of two dimensional time fractional biological population model. Egypt. J. 1, 71–76 (2014) 18. Gandhi, H., Tomar, A., Singh, D.: A predicted mathematical cancer tumor growth model of brain and its analytical solution by reduced differential transform method. In: Proceedings of International Conference on Trends in Computational and Cognitive Engineering 2019. Advanced in Intelligent Systems and Computing, vol. 1169, pp. 203–213. Springer (2021) 19. Biswas, A., Khalique, C.M.: Optical Quasi-solitons by lie symmetry analysis. J. King Saud Univ.-Sci. 24, 271–276 (2012) 20. Bansal, A., Kara, A.H., Biswas, A., Moshokoa, S.P., Belic, M.: Optical soliton perturbation, group invariants and conservation laws of perturbed Fokes-Lenells equation. Chaos, Solitons Fractals 114, 275–280 (2018) 21. Gandhi, H., Singh, D., Tomar, A.: Explicit solution to general fourth order time fractional KdV equation by Lie symmetry analysis. AIP Conf. Proc. 2253, 020012 (2020)
22. Chauhan, A., Arora, R., Tomar, A.: Lie symmetry analysis and traveling wave solutions of equal width wave equation. Proyecciones J. Math. 39(1), 173–192 (2020) 23. Chauhan, A., Arora, R.: Time fractional Kupershmidt equation: symmetry analysis and explicit series solution with convergence analysis. Commun. Math. 27, 171–185 (2019) 24. Shi, D., Zhang, Y., Liu, W., Liu, J.: Some exact solutions and conservation laws of the coupled time fractional Boussinesq-Burgers system. Symmetry 11, 77 (2019) 25. Maitama, S., Zhao, W.: Local fractional homotopy analysis method for solving non differentiable problems on cantor sets. Adv. Differ. Eqns. 2019, 127 (2019) 26. Gandhi, H., Tomar, A., Singh, D.: Lie symmetry analysis to general fifth order time fractional Korteweg-de-Vries equation and its explicit solution. In: Proceedings of International Conference on Trends in Computational and Cognitive Engineering 2019. Advanced in Intelligent Systems and Computing, vol. 1169, pp. 189–201. Springer (2021) 27. Bataineh, S.A., Noorani, M.S.M., Hashim, I.: Solving system of ODEs by homotopy analysis method. Commun. Nonlinear Sci. Numer. Simul. 13, 2060–2070 (2008) 28. Bataineh, S.A., Noorani, M.S.M., Hashim, I.: Direct solution of nth-order IVPs by HAM. In: Differential Equations and Nonlinear Mechanics (2009). Article ID 842094 29. Jafari, H., Seifi, S.: Solving a system of nonlinear fractional partial differential equations using homotopy analysis method. Commun. Nonlin. Sci. Numer. Simul. 14, 1962–1969 (2009) 30. Songxin, L., Jeffrey, D.J.: Comparison of homotopy analysis and homotopy perturbation method through an evolution equation. Commun. Nonlin. Sci. Numer. Simul. 14, 4057–4064 (2009) 31. Singh, R., Seth, D., Rawat, S., Ray, K.: Performance investigations of multi-resonance microstrip patch anteena for wearable applications. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 742. Springer (2019) 32. 
Jain, L., Singh, R., Rawat, S., Ray, K.: Miniaturized, meandered and stacked MSA using accelerated design strategy for biomedical applications. In: International Conference of Soft Computing for Problem Solving, pp. 725–732 (2016) 33. Singh, R., Rawat, S., Ray, K., Jain, L.: Performance of wideband falcate implantable patch anteena for biomedical telemetry. In: International Conference of Soft Computing for Problem Solving, pp. 757–765 (2016) 34. Vaishali, Sharma, T.K., Abraham, A., Rajpurohit, J.: Trigonometric probability tuning in asynchronous differential evolution. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 584. Springer (2018) 35. Sharma T.K., Rajpurohit, J., Prakash D.: Enhanced local search in shuffled frog algorithm. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053. Springer (2020)
Parallel and Distributed Computing Approaches for Evolutionary Algorithms—A Review S. Raghul and G. Jeyakumar
Abstract The evolutionary computing (EC) domain provides a variety of optimization algorithms to the computer science community. The increase in large-scale real-world optimization problems, which consist of thousands of decision variables, has given rise to different challenges for EC algorithms (popularly known as evolutionary algorithms (EAs)). These challenges are addressed by the research community of the EC domain by integrating the parallel and distributed computing paradigms to design novel EA frameworks for solving complex large-scale optimization problems. Parallel and distributed EA frameworks have received global attention over the past two decades. This paper summarizes various commonly followed algorithm models for parallel and distributed EAs. It also comprehensively reviews the research attempts in the literature relating these algorithmic models to different EAs and different optimization problems. Keywords Evolutionary algorithm · Master–slave model · Parallel computing · Distributed computing · Divide-and-conquer
S. Raghul · G. Jeyakumar (B)
Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: [email protected]
S. Raghul e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_35

1 Introduction
Evolutionary algorithms (EAs) with stochastic and metaheuristic characteristics have given effective solutions to different and difficult optimization problems. EAs work on the principle of the theory of evolution in nature. In EAs, for the given optimization problem, the solution space is generated in a randomized manner and the optimal solution is searched for in an iterative fashion. Unfortunately, this search-based principle is also a drawback of EAs, making them inefficient on large-scale (high-dimensional) optimization problems for the two reasons listed below:
(1) The decision variables and the search space of the targeted problem increase exponentially, which prevents EAs from exploring it within a reasonable number of search iterations.
(2) The computational cost of each iteration becomes expensive when a huge number of candidate solutions are provided by the search operators.
For the past two decades, researchers have been consistently trying to overcome these two drawbacks. As a result, when a traditional sequential evolutionary algorithm fails to provide satisfactory results in a reasonable amount of time, we can depend on parallel and distributed approaches, which increase the applicability of EAs to high-dimensional optimization problems. It is widely accepted by researchers that implementing an EA using parallel and distributed approaches reduces the number of search iterations considerably and increases the effectiveness of the algorithm. This paper reviews different ways of implementing EAs using parallel and distributed approaches. The remaining part of the paper is organized as follows. Section 2 gives a brief introduction to EAs, Sect. 3 introduces parallel and distributed EAs, Sect. 4 elaborates the implementation details of different parallel and distributed approaches in the literature, Sect. 5 summarizes the content of Sect. 4, and finally, Sect. 6 concludes the paper.
2 Evolutionary Algorithms
The term EA originates from the evolutionary computation (EC) domain. An EA is an optimization algorithm with metaheuristic characteristics. EAs are inspired by the principles of biological evolution, such as reproduction, recombination, mutation, selection, fitness evaluation, and survival of the fittest. The fitness evaluation function plays a vital role in determining the quality of a candidate solution. The types of evolutionary algorithms are genetic algorithm (GA), genetic programming (GP), evolutionary programming (EP), evolutionary strategy (ES), and differential evolution (DE). Depending on the nature of the given optimization problem, the algorithmic behavior of these algorithms differs from one to another. But there is a common implementation technique in all of them, which can vary depending on the parametric behavior of each algorithm. Generally, EAs follow the basic structure shown in Fig. 1, which includes the following algorithmic steps.
1. Randomly generate the initial population of individuals.
2. Iterate the following steps until the termination conditions are met.
(i) Fitness is evaluated for every individual present in the population.
(ii) The fittest individuals are selected for reproduction.
(iii) New individuals are produced by the crossover process.
(iv) Offspring are produced by the mutation process.
(v) Replacement takes place: the new individuals replace the least-fit individuals of the given population.
Fig. 1 General scheme of EA
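The general scheme above can be sketched in a few lines. This is a minimal illustration rather than any specific EA from the literature: the truncation selection, one-point crossover, Gaussian mutation width, search bounds and the `sphere` test function are all assumptions chosen for brevity.

```python
import random

def sphere(x):
    """Toy objective to minimize (an assumption for illustration)."""
    return sum(g * g for g in x)

def evolve(fitness, dim, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):                       # Step 2: iterate
        scored = sorted(pop, key=fitness)              # (i) evaluate fitness, lower is better
        parents = scored[: pop_size // 2]              # (ii) select the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = a[:cut] + b[cut:]                  # (iii) one-point crossover
            child = [g + rng.gauss(0, 0.1) for g in child]  # (iv) mutation
            children.append(child)
        pop = parents + children                       # (v) children replace the least-fit half
    return min(pop, key=fitness)

best = evolve(sphere, dim=3, generations=300)
print(best, sphere(best))
```

Because the fittest half always survives, the best fitness never gets worse from one generation to the next; a real GA, ES or DE would replace the selection and variation operators with its own.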
3 Parallel and Distributed EAs There is a rapid increase in the complexity of real-world problems, due to the rapid development of information sector. To overcome the difficulties arising due to the increased complexity of the problems, the parallel and distributed approaches are adapted to the computing methodologies. Significantly, complex high-dimensional problems can be solved using the distributed approach which follows the mechanism of divide-and-conquer strategy. Inducing the parallelism concept to the distributed architecture increases the cost-effectiveness and reduces the time complexity of the problem. To solve such complex and high-dimensional problems, various popular algorithmic models are added to the domain of EC. The effectiveness of these models varies depending on parameters, viz. migration size, frequency, topology and replacement policy. The island model is one such parallel model for EAs. The details of different algorithmic structure/model available in the literature of parallel and distributed EAs are detailed in the next section.
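To make the island model and its parameters (migration size, frequency, topology and replacement policy) concrete, here is a small sketch. It simulates the islands one after another in a single process, so it only illustrates the logic and not real parallel execution. The ring topology, the "best replace worst" policy, the per-island evolution step and all parameter values are assumptions for illustration.

```python
import random

def sphere(x):
    """Toy objective to minimize (an assumption for illustration)."""
    return sum(g * g for g in x)

def step(island, rng, fitness):
    """One elitist generation on a single island: keep the best half, mutate copies of it."""
    island.sort(key=fitness)
    half = len(island) // 2
    for i in range(half, len(island)):
        parent = island[rng.randrange(half)]
        island[i] = [g + rng.gauss(0, 0.1) for g in parent]

def island_model(fitness, dim, n_islands=4, island_size=10, generations=60,
                 migration_interval=10, migration_size=2, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(island_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for isl in islands:
            step(isl, rng, fitness)
        if gen % migration_interval == 0:              # migration frequency
            migrants = [sorted(isl, key=fitness)[:migration_size] for isl in islands]
            for k, isl in enumerate(islands):
                isl.sort(key=fitness)
                incoming = migrants[(k - 1) % n_islands]   # ring topology
                # replacement policy: incoming migrants overwrite the worst individuals
                isl[-migration_size:] = [list(ind) for ind in incoming]
    return min((ind for isl in islands for ind in isl), key=fitness)

best = island_model(sphere, dim=3, generations=200)
print(best, sphere(best))
```

In a distributed deployment each island would run on its own node and only the migrants would cross the network, which is why migration size and frequency dominate the communication cost.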
4 Parallel and Distributed Approaches for EAs
There have been several research works in the field of parallel and distributed EAs. Investigation in this domain dates back to 1999. Since then, researchers have brought different approaches/models to this domain. These approaches are discussed in this section. Generally, high computational speed and load distribution can be achieved by distributing and parallelizing the task among different computing elements. The highly popular parallel and distributed evolutionary models for EAs are [1]: master–slave model, island model, cellular model, hierarchical model, pool model, coevolution model, and multi-agent model. Another classification (as in [2]) includes the global one-population master–slave model, one-population fine-grained model, multi-population coarse-grained model, and hierarchical model.
Global One-Population Master–Slave Model: Remote memory is used to store the complete population. Reading and writing the data of an individual is implemented by a slave node, which is controlled by the master node. The alternative is to store the population in the master node, which sends the data to the slave nodes; the processing is done there and the result is sent back to the master node. As the number of nodes increases, the communication required between master and slaves increases.
One-Population Fine-Grained Model: The original population is decomposed into sub-problems by the master node, and the distribution is done in such a way that each slave node holds one or two individuals from the population. Each node is interconnected with its neighboring nodes, and the nodes differ only in the individuals they hold.
Multi-Population Coarse-Grained Model: This model works with multiple populations, where the evolutionary process is carried out asynchronously. The populations are isolated from one another, so no interconnection dependencies arise. A separate EA runs on each node, so each node has its own sub-population. Because of this relative isolation, individuals are exchanged only through migration; if migration does not take place, the populations are totally isolated.
Hierarchical Model: A combination of two or more models. A hierarchical model generally consists of a two-layered architecture, where the higher layer consists of a multi-population model.
Using the above-mentioned parallel and distributed models, researchers have made various attempts to apply them to their choice of EAs. Island-based distributed models were experimented with DE on various benchmarking problems, and the insights are reported in [3]. The extended version of DE called dynamic DE (DDE) was also used to experiment with the island-based distributed model on high-dimensional benchmarking problems in [4]. A comprehensive study on mixing different variants of DE algorithms in the distributed framework is presented in [5]. Various studies elaborating the design parameters and theoretical behavior of distributed frameworks were also added to the literature. A study analyzing the topologies for connecting the nodes in the distributed framework is presented in [6]. The diversity changes in the global and sub-populations were analyzed in [7]. Interestingly, a mathematical model to demonstrate and analyze the migration process between the sub-populations in the distributed nodes was designed and is presented in [8]. Continuing this trend of theoretical analysis, analysis of design parameters and testing of novel parallel and distributed models of EAs, numerous research articles are reported in the literature. A selection of novel research attempts is discussed in the following section.
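The second master–slave variant described above (the master holds the population and farms out the evaluations) can be sketched with Python's standard library. This is a simplified illustration under assumptions: worker threads stand in for slave nodes, and `sphere` is a placeholder fitness function. In CPython, threads do not speed up CPU-bound fitness functions; a process pool or separate machines would be needed for real gains.

```python
from concurrent.futures import ThreadPoolExecutor

def sphere(x):
    """Placeholder fitness function (an assumption for illustration)."""
    return sum(g * g for g in x)

def evaluate_population(population, fitness, workers=4):
    """Master side: send each individual to a worker and collect fitness values in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

population = [[1, 2], [0, 0], [3, 4]]
print(evaluate_population(population, sphere))
```

Selection, crossover and mutation would still run on the master; only the fitness evaluations are distributed, which is why communication grows with the number of slave nodes.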
4.1 Agent-Based Approach
In the domain of distributed artificial intelligence, agent-based computation has been studied recently. It is combined with EAs to increase performance by enhancing the capability to decompose the problem. The authors in [9] discussed how this approach decomposes a real-world problem into a set of sub-problems, which are solved concurrently by the slave agents; the results are then synthesized by the master agent. In the first (top) level of the framework, the agents (master and slaves) are divided into teams, which are asynchronous in nature. The team of sub-agents consists of:
1. Scheduler: to receive and maintain specific information.
2. Constructors: to create the initial solutions.
3. Improvers: to replace existing solutions with better solutions.
4. Repairers: to repair infeasible solutions under certain constraints.
5. Destroyers: to remove redundant and low-quality results from the population.
6. Decomposer: to decompose complete vectors into sub-vectors.
7. Synthesizer: to synthesize sub-vectors into a complete vector.
This distributed approach was claimed to perform more effectively where normal DE fails to provide results within the constrained time.
4.2 Hybridization-Based Approach
An approach that combines GA with estimation of distribution algorithms (EDA) was presented in [10]. In the master–slave topology followed in [10], the master selects the search space and the slaves perform the other related functions. All the work is processed independently and in parallel. The work of the master is divided into four phases that progressively narrow down the areas explored by the slaves, using parallel dynamic K-means clustering to determine the attractive regions of the solution space. Hence, solution quality was improved and the computation time was reduced. At each iteration, the slaves run GA independently and the master controls the slaves using EDA. After clustering the solutions, the master calculates the probabilistic estimation vector p(x) and tells the slaves where to concentrate the search in the upcoming iterations. The master uses the univariate marginal distribution algorithm to find the probability vector of each cluster. With this control mechanism, the search space was explored effectively. In order to attain parallelism, an approach based on collective communication was proposed in [11]. This approach used MPJ Express, a Java message-passing library, for communication between the processors. The sum of all available processors was used as the number of generations for evolution. Each processor performs
the functional steps of the EAs. The authors used the ant colony optimization (ACO) algorithm to generate new individuals. The population is sorted so that its weakest members are replaced by the (best) ACO individuals, and the processing continues until the termination condition is reached. Aiming to improve the result, a hybrid approach combining GA and ACO was tested. Here, every process does one third of the total work. If a process finds repeated candidates during computation, it applies ACO to bring new individuals into the population. At the final step, each process sends its results to the master process, which calculates the optimal solution.
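The probability vector p(x) that the master estimates with the univariate marginal distribution algorithm can be illustrated as follows. This is only a sketch under assumptions: it uses a binary encoding (the work in [10] targets continuous functions), and the names `estimate_probability_vector` and `sample_population` are hypothetical helpers, not functions from [10].

```python
import random

def estimate_probability_vector(selected):
    """UMDA model: marginal frequency of a 1 at each bit position."""
    n = len(selected)
    length = len(selected[0])
    return [sum(ind[k] for ind in selected) / n for k in range(length)]

def sample_population(p, size, rng):
    """Draw new bit strings independently, position by position, from p."""
    return [[1 if rng.random() < pk else 0 for pk in p] for _ in range(size)]

# Solutions assumed to belong to one cluster after the clustering step.
cluster = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
p = estimate_probability_vector(cluster)   # [1.0, 0.75, 0.25, 0.75]
new_points = sample_population(p, 6, random.Random(0))
```

A vector such as `p` is compact enough to broadcast to every slave, which is one reason a univariate model suits a master–slave layout.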
4.3 Asynchronous Master–Slave Approach
AMS-DEMO [12] (asynchronous master–slave implementation of DE for multi-objective optimization) was designed for solving time-intensive problems. The architecture provides efficient results on both homogeneous and heterogeneous parallel computer architectures. The work presented in [12] investigated a test case that had not been solved earlier. The architecture is a pre-defined model to which input parameters are passed, with settings configured for the targeted problem. Much research had been done on the asynchronous master–slave parallelization of EAs. It produced good results on heterogeneous computer architectures, but no analysis had been done on homogeneous architectures. Hence, an experiment was conducted with DEMO in which both homogeneous and heterogeneous systems were used.
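The asynchronous master–slave idea can be sketched with Python's standard `concurrent.futures` module. This is not AMS-DEMO itself: the sketch is single-objective, uses a simplified DE mutation without crossover, and runs the "slaves" as threads; the function `async_master` and its parameters are assumptions made for illustration. The point it shows is that the master handles each result as soon as it arrives and immediately issues new work, with no generation barrier.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def fitness(x):
    """Stand-in for an expensive evaluation (assumed: sum of squares)."""
    return sum(v * v for v in x)

def async_master(evaluate, dim=3, pop_size=8, budget=200, workers=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [evaluate(ind) for ind in pop]

    def make_trial():
        # Simplified DE mutation (no crossover) aimed at a random target slot.
        i = rng.randrange(pop_size)
        a, b, c = rng.sample(pop, 3)
        return i, [a[k] + 0.5 * (b[k] - c[k]) for k in range(dim)]

    pending = {}      # future -> (target index, trial vector)
    submitted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or submitted < budget:
            # Keep every slave busy: top up to `workers` outstanding evaluations.
            while len(pending) < workers and submitted < budget:
                i, trial = make_trial()
                pending[pool.submit(evaluate, trial)] = (i, trial)
                submitted += 1
            # Process whichever evaluations finish first; no generation barrier.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                i, trial = pending.pop(fut)
                f_trial = fut.result()
                if f_trial <= fit[i]:
                    pop[i], fit[i] = trial, f_trial
    return min(fit)

best_fitness = async_master(fitness)
```

Because a slow worker never blocks the others, this layout tolerates nodes of different speeds, which matches the heterogeneous setting discussed above.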
4.4 Multi-core and Cluster Approach
To attain parallelism more effectively, the Unified Parallel C (UPC) language can be used. It provides a partitioned global address space (PGAS) parallel programming model. Here, the computers are connected as a cluster in a virtual distributed form, which is attained by connecting the distributed computational nodes through a single logical shared memory [13]. PGAS programs follow the single program multiple data (SPMD) principle, in which every process participates in the distributed computation and has access to the shared global address space and to information about physical locality (partitioning). In this way, efficient memory access patterns are enabled. Data is shared among all threads, and the locality information helps to overcome the main drawback of the model: because of its simplicity, communication patterns can easily lead threads to access non-local parts of the shared memory unnecessarily. PGAS is one of the popular paradigms used in high-performance computing (HPC).
4.5 Single Program Multiple Data Approach
In the single program multiple data (SPMD) approach, a single program is duplicated and sent to multiple nodes for parallel operation. For the given part of the program, each node creates its own computation environment, and the duplicated program is executed by all nodes (slaves and master) in the model. The nodes process the multiple data parts in parallel, the solutions produced by the parallel nodes are collected, and the best solution is declared the optimal solution. In [14], the authors propose a parallel GA in which each node computes the initial population, which is then allocated in parallel, and multiple work areas are created. After allocation of the initial population, the evolutionary process is started. Parallel GA (PGA) plays a vital role in solving nonlinear optimization problems [15]. Generally, PGA is based on the open distributed computing paradigm, where a network of processors is involved in finding the optimized solutions. The large complex problem is divided into smaller, less complex problems, which are termed islands. After every iteration of GA, a certain percentage of the locally generated population in each processor is swapped with other islands. The execution time can be reduced progressively by having many processors collaborate. Focusing and structuring the search in the sub-populations enhances the optimization results.
4.6 Multi-objective Problem Approaches
The strategies suggested for algorithms that solve multi-objective problems are:
(1) Using shared storage space with message passing: each processor has its own main memory, and the processors communicate with one another using message passing.
(2) Using centralized load balancing: the loads are distributed dynamically.
In [16], the authors proposed a parallel multi-objective EA which consists of the following steps:
1. Create the initial population and evaluate the candidates.
2. Perform mutation.
3. Divide the population into n clusters, based on the fitness value.
4. Each cluster performs:
   a. Select the individual from each node.
   b. Individuals are converted into gray code.
   c. Parents are mutated.
   d. Offspring are converted into binary values.
   e. Fitness values are calculated.
5. Combine the clusters of individuals.
6. Perform migration.
7. Go to Step 4 (if termination condition is not satisfied).
8. Select the best individual.
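Steps 4b and 4d above convert individuals between binary and gray code. A minimal sketch of that conversion for non-negative integers is shown below; it uses the standard reflected binary Gray code, and the function names are hypothetical rather than taken from [16].

```python
def binary_to_gray(n: int) -> int:
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

code = binary_to_gray(5)        # 0b101 -> 0b111, i.e. 7
value = gray_to_binary(code)    # back to 5
```

Neighboring integers differ in exactly one bit of their Gray codes, so a single-bit mutation of an encoded parent can move to an adjacent value, which plain binary encoding does not guarantee.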
4.7 Heterogeneous EA Approach
Existing EAs often face two challenges. The first is that the performance of an ad-hoc configuration of parameters and operators depends significantly on the optimization problem. The second is that real-world problems require a long runtime to evaluate the fitness of a population. To overcome these issues, a method was proposed in [17]: a heterogeneous DE with a double-layered approach, designed for a distributed cloud computing environment. In the first layer, populations with different operators and parameters are run concurrently. In the second layer, cloud virtual machines are run in parallel to evaluate the fitness of the targeted population. The computational cost was reduced because the computation was handled by the cloud. In distributed cloud environments, the fitness function evaluations can be performed by the cloud infrastructure to achieve higher computational speed.
4.8 GPU-Based Parallel Approach
CUDA is a programming platform that follows the single instruction, multiple thread approach. It allows the programmer to create blocks of threads that are independent. For EAs, these threads can be used to represent both the population and the individuals. To execute EAs on the CUDA architecture, the following steps are to be followed [18]: allocate memory on the GPU device, copy the data from host to GPU memory, invoke the kernel function from the host, execute the code on the GPU, and copy the results back from GPU to host memory. To illustrate the parallelization of the code [18], the sequential blocks were reorganized in terms of the individuals. The functional blocks consist of initialization, fitness evaluation, comparison, and update. In the sequential implementation, the functional blocks are executed on the host processor. In contrast, in the parallel implementation, only the initialization module remains on the host processor. The kernel calls make the GPU run the whole optimization process, including fitness evaluation, comparison, and update, with a multiple-thread architecture (one by one) until the termination conditions are satisfied.
4.9 Co-evolutionary Approach
The distributed parallel cooperative co-evolutionary multi-objective evolutionary algorithm (DPCCMOEA), proposed in [19], is an MPI-based model for large-scale optimization problems that provides more efficient results than a normal EA. Through a series of decomposition steps, DPCCMOEA decomposes a complex problem into simpler, low-dimensional sub-problems. Additionally, a two-layer MPI parallel architecture was used to evolve the sub-problems. This method significantly reduced the computational time, which is an important drawback of traditional EAs. The model is based on the decision variable analysis (DVA) strategy, which decomposes a large number of variables into several groups. Each group is optimized as a sub-population, and the groups cooperate for global optimization.
4.10 Convergence and Diversity-Based Approach
For large-scale multi-objective optimization problems, parallel EAs provide efficient results. During each iteration, traditional MOEAs (multi-objective evolutionary algorithms) generally go through two steps: environmental selection and population evaluation. The second step focuses on balancing convergence and diversity, and it is the more time-consuming process. Hence, traditional EAs struggle to overcome premature convergence and stagnation. The selection often needs to compare the good solutions with the combined population, so dividing the evolutionary process into a series of sub-processes creates dependencies that severely affect parallelization. Reference [20] proposed a novel framework that separates the evolutionary process from the selection process, removing the dependency between the sub-processes. The framework consists of a series of sub-populations along with local and global archives. When the first batch of sub-populations converges, the best solutions from each sub-population are sent to the global archive. When all the nodes become idle, a new sub-population is generated based on the solutions in the global archive. Thereafter, when a sub-population converges, its solutions, along with their fitness, are sent to the local archive. When the number of solutions in the local archive reaches a threshold, they are sent to the global archive. Based on environmental selection, the best solution is obtained from the global archive. This framework separates convergence and divergence and allows them to proceed in parallel: the evolution of each sub-population pursues convergence, whereas the global archive focuses on divergence.
4.11 Competition-Based Approach
The competition-based strategy [21] periodically evaluates the performance of each sub-population. First, the sub-populations are run randomly in turns. At each iteration, the best individuals of the sub-populations are compared, and the fittest individuals are chosen. In the second step, the sub-population containing the individuals with the worst fitness is recorded. In the third step, every sub-population is checked for stagnation in each generation. If a sub-population stagnates, it is replaced with the invaded sub-population. To achieve greater computation speed, comprehensive interaction between the invading and invaded sub-populations is carried out, for which operators named opposition-invasion and cross-invasion were proposed [21]. This is an alternative to the DE algorithm for cases where DE fails to solve large, complex real-life problems.
4.12 Divide-and-Conquer Approach
The divide-and-conquer approach has attracted the attention of researchers across the globe because it has been shown to provide better-quality results for many real-world problems. Constructing objective functions for the sub-problems lets divide-and-conquer-based EAs attain parallelism [22]. Reducing the number of search iterations makes EAs more effective. There are two major ways to do so: enhancing the search ability by scheduling a local search operator, and reducing the dimensionality of the problem using divide-and-conquer. The divide-and-conquer strategy is outlined as follows.
1. The targeted high-dimensional problem is decomposed into sub-problems with lower dimensions.
2. Each sub-problem is solved by an EA.
3. The partial solutions of the (low-dimensional) sub-problems are combined into a complete high-dimensional solution, which is the end result for the targeted high-dimensional problem.
The sub-problems can be solved separately and simultaneously if they do not depend on each other. ‘Separately’ means that each EA solves a small-scale sub-problem, so the search operations are reduced considerably. ‘Simultaneously’ means that a distributed or cloud computing paradigm is used to solve the sub-problems on multiple processors. The interdependencies among the sub-problems are one of the major problems. To overcome this, the technique of random grouping is suggested, in which the decision variables are decomposed into equal-sized groups in a random manner. At regular intervals, these groups are discarded and formed again. Later, it was found that the performance of random grouping is affected by the group size and the grouping frequency.
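Random grouping and the synthesis step of the outline above can be sketched as follows. The helper names `random_grouping` and `synthesize` are hypothetical, and the sketch assumes that the number of decision variables is a multiple of the group size.

```python
import random

def random_grouping(n_vars, group_size, rng):
    """Randomly split variable indices 0..n_vars-1 into equal-sized groups."""
    if n_vars % group_size != 0:
        raise ValueError("n_vars must be a multiple of group_size")
    idx = list(range(n_vars))
    rng.shuffle(idx)
    return [idx[i:i + group_size] for i in range(0, n_vars, group_size)]

def synthesize(partials, groups, n_vars):
    """Combine the partial solutions of the groups into one complete vector."""
    full = [0.0] * n_vars
    for group, part in zip(groups, partials):
        for k, value in zip(group, part):
            full[k] = value
    return full

rng = random.Random(0)
groups = random_grouping(12, 3, rng)     # four groups of three variable indices
# Calling random_grouping again later re-draws the groups (regrouping).
regrouped = random_grouping(12, 3, rng)
```

Each group defines one low-dimensional sub-problem that an EA can optimize on its own processor; `synthesize` then writes the partial solutions back into their original positions.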
5 Summary of Survey
At present, we are surrounded by many complex large-scale optimization problems. Though EAs are regarded as a potential tool for solving such optimization problems, their performance can still be improved by utilizing the technological advancements in the parallel and distributed computing paradigm [23]. High-dimensional problems with complex constraints and multiple objectives can be handled effectively by the agent-based approach. Quality solutions to such multi-objective optimization problems can be obtained by fine-tuning this approach. Focusing on the theoretical grounds of the parallel and distributed frameworks of EAs would help researchers to understand the behavior of the algorithm and to tune its performance [24]. In such scenarios, the algorithmic steps that balance the convergence and divergence of the population are of greater importance. Assigning a few nodes in the distributed framework to the convergence process and the remaining nodes to the divergence process is one such option. Instead of using a single algorithm, hybridizing different algorithms in a parallel and distributed environment opens avenues for utilizing the strengths of the participating algorithms to solve complex problems faster [25]. For solving multi-objective optimization problems, there exist multi-objective versions of the EAs. They are to be considered instead of their classical versions to achieve higher performance through the parallel and distributed frameworks. In recent works, cloud infrastructure and GPU computation frameworks are also being used as options for the distributed framework.
6 Conclusions
Generally, EAs are heuristic-based approaches for solving problems that cannot be easily solved in polynomial time. Still, no universal method has been developed that solves all problems. In such scenarios, an EA can be used by modifying its structure based on the nature of the problem. Parallel and distributed versions of EAs are one such approach, which is widely used for large-scale optimization problems. They can attain quality solutions with reduced computational time. This paper presented a comprehensive survey of the state-of-the-art parallel and distributed approaches for EAs.
References
1. Yue-Jiao, W.-N., Zhan, Z.-H., Zhang, J., Li, Y., Zhang, Q.: Distributed evolutionary algorithms and their models: a survey of the state-of-the-art. Appl. Soft Comput. 36, 286–300 (2015)
2. Skorpil, V., Oujezsky, V., Cika, P., Tuleja, M.: Parallel processing of genetic algorithms in python language. In: Proceedings of Photonics and Electromagnetics Research Symposium, pp. 3727–3731 (2019)
3. Jeyakumar, G., Velayutham, C.S.: Distributed mixed variant differential evolution algorithms for unconstrained global optimization. Memetic Comput. 5(4), 275–293 (2013)
4. Jeyakumar, G., Velayutham, C.S.: Distributed heterogeneous mixing of differential and dynamic differential evolution variants for unconstrained global optimization. Soft Comput. 18(10), 1949–1965 (2014)
5. Jeyakumar, G., Velayutham, C.S.: Hybridizing differential evolution variants through heterogeneous mixing in a distributed framework. Hybrid Soft Comput. Approaches Stud. Comput. Intell. 611, 107–151 (2015)
6. Sanu, M., Jeyakumar, G.: Empirical performance analysis of distributed differential evolution varying migration topologies. Int. J. Appl. Eng. Res. 10(5), 11919–11932 (2015)
7. Raghu, R., Jeyakumar, G.: Empirical analysis on the population diversity of the sub-populations in distributed differential evolution algorithm. Int. J. Control Theory Appl. 8(5), 1809–1816 (2016)
8. Raghu, R., Jeyakumar, G.: Mathematical modelling of migration process to measure population diversity of distributed evolutionary algorithms. Indian J. Sci. Technol. 9(31), 1–10 (2016)
9. Zheng, Y., Xu, X., Chen, S., Wang, W.: Distributed agent based cooperative differential evolution: a master-slave model. In: Proceedings of IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, pp. 376–380 (2012)
10. Said, S.M., Nakamura, M.: Parallel enhanced hybrid evolutionary algorithm for continuous function optimization. In: Proceedings of Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, pp. 125–131 (2012)
11. Abdoun, O., Moumen, Y., Abdoun, F.: Parallel evolutionary computation to solve combinatorial optimization problem. In: Proceedings of International Conference on Electrical and Information Technologies (ICEIT), pp. 1–6 (2017)
12. Depolli, M., Trobec, R., Filipič, B.: Asynchronous master-slave parallelization of differential evolution for multi-objective optimization. Evol. Comput. 21(2), 261–291 (2013)
13. Kromer, P., Platos, J., Snasel, V.: Scalable differential evolution for many-core and clusters in unified parallel C. In: Proceedings of IEEE International Conference on Cybernetics (CYBCO), pp. 180–185 (2013)
14. Lin, C., Liu, J., Yao, H., Chu, C., Yang, C.: Performance evaluation of parallel genetic algorithm using single program multiple data technique. In: Proceedings of 2015 Second International Conference on Trustworthy Systems and Their Applications, pp. 135–140 (2015)
15. Al-Oqaily, A.T., Shakah, G.: Solving non-linear optimization problems using parallel genetic algorithm. In: Proceedings of 2018 8th International Conference on Computer Science and Information Technology (CSIT), pp. 103–106 (2018)
16. Ibrahim, K.: Parallel and distributed genetic algorithm with multiple-objectives to improve and develop of evolutionary algorithm. Int. J. Adv. Comput. Sci. Appl. 7(5) (2016)
17. Zhan, Z.-H., Liu, X.-F., Zhang, H., Yu, Z., Weng, J., Li, Y., Gu, T., Zhang, J.: Cloudde: A heterogeneous differential evolution algorithm and its distributed cloud version. IEEE Trans. Parallel Distrib. Syst. 28(3), 704–716 (2017)
18. Laguna-Sánchez, G.A., Olguín-Carbajal, M., Crut-Cortés, N., Barron Fernández, R., Cadena Martínez, R.: A differential evolution algorithm parallel implementation in a GPU. J. Theor. Appl. Inf. Technol. 86(2) (2016)
19. Cao, B., Zhao, J., Lv, Z., Liu, X.: A distributed parallel cooperative coevolutionary multiobjective evolutionary algorithm for large-scale optimization. IEEE Trans. Ind. Inf. 13(4), 2030–2038 (2017)
20. Chen, H., Zhu, X., Pedrycz, W., Yin, S., Wu, G., Yan, H.: PEA: parallel evolutionary algorithm by separating convergence and diversity for large-scale multi-objective optimization. In: Proceedings of 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 223–232. Vienna (2018)
21. Ge, Y., Yu, W., Zhan, Z., Zhang, J.: Competition-based distributed differential evolution. In: Proceedings of 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8 (2018)
22. Yang, P., Tang, K., Yao, X.: A parallel divide-and-conquer-based evolutionary algorithm for large-scale optimization. IEEE Access 7, 163105–163118 (2019)
23. Harad, T., Alba, E.: Parallel genetic algorithm: a useful survey. ACM Comput. Surv. (2020)
24. Khaparde, A.R.: Analysis of new distributed differential evolution algorithm with best determination method and species evolution. Procedia Comput. Sci. 167, 263–272 (2020)
25. Shahab, A., Grot, B.: Population-based evolutionary distributed SGD. In: Proceedings of the 2020 Genetic and Evolutionary Computing Conference, pp. 153–154 (2020)
Motion/Force Control for the Constrained Electrically Driven Mobile Manipulators Based on Hybrid Backstepping Control Approach
Naveen Kumar and Manju Rani
Abstract This paper designs a hybrid backstepping control approach for constrained electrically driven mobile manipulators by merging the advantages of the model-based approach, the neural network-based model-free approach, and the conventional backstepping control scheme. The backstepping approach provides strong robustness against uncertainties. Additionally, an adaptive compensator term is adopted to diminish the effects of uncertainties such as reconstruction error, bounded external disturbances, and friction terms. Next, the stability analysis is carried out using the adaptation laws and Lyapunov stability theory. The complete system is guaranteed to be asymptotically stable. Simulation tests on a two-link electrically driven manipulator demonstrate the efficiency and robustness of the presented control scheme.
Keywords Electrically driven · Radial basis function · Backstepping · Asymptotical stable · Constrained manipulators
N. Kumar (B) Department of Mathematics, National Institute of Technology, Kurukshetra 136119, Haryana, India
M. Rani Department of Applied Sciences and Humanities, Panipat Institute of Engineering and Technology, Samalkha, Panipat 132102, Haryana, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_36

1 Introduction
Mobile manipulators are robotic arms mounted on a movable base, which gives the manipulators a larger workspace and thus better positioning capability [1–3]. In [4], the authors proposed that mobile manipulators are generally formed by a holonomic and nonholonomic constrained mobile platform and a modular manipulator fixed on the platform. There are many examples of practical applications of mobile manipulators, such as space operating tasks, explosive tasks, and hazardous place exploring [5, 6]. For the RLED constrained mobile manipulators, robust and adaptive control schemes were developed in [7–9], and neural network-based control schemes were designed in [10–17]. By making use of the model-based and the model-free approaches, novel hybrid control schemes were successfully developed for single mobile robots [18, 19]. By making use of the backstepping technique, a control scheme was designed in [20] for single mobile robots. In the last few decades, much work has been done on backstepping-based control schemes, which have been widely applied in many application areas [21–27]. The key point of the backstepping technique is to recursively choose suitable functions of the state variables as fictitious control inputs for the subsystems of the overall system [21]. The force and motion control problem was successfully resolved for a 3-dof manipulator system by a fuzzy neural network adaptive backstepping technique [22]. For multiple manipulators, an adaptive fuzzy backstepping control method was applied in [25]. An adaptive sliding mode control algorithm based on the backstepping technique was successfully applied to the tracking control of the wheeled mobile manipulator. By adopting this scheme, improved global ultimate asymptotic stability and invariance to uncertainties were obtained [27]. From the literature survey, we notice that very little effort has been made towards a technique that takes the benefits of both the model-based and the intelligent technique-based approaches for the control problem of holonomic/non-holonomic mobile manipulators. The main benefit of adopting the backstepping approach is robustness against uncertainties. From this point of view, this paper proposes a hybrid backstepping control scheme for the position and force control of electrically driven mobile robots. For the proposed control law, the benefits of the computed torque controller, the neural network-based model-free approach, and the conventional backstepping control method have been considered. Meanwhile, to diminish the effects of uncertainties, an adaptive compensator is also adopted. The stability of the overall controlled system is established by the Lyapunov theorem. A simulation study is conducted and discussed to show the robustness and efficiency of the proposed control method.
2 System Description
The dynamics of the h-dof manipulator system are described as

\[ M(q)\ddot{q} + C(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + T_d = B(q)\tau + J^T\lambda \tag{1} \]

where $q = [q_\upsilon\ q_b]^T \in \mathbb{R}^h$, $\dot{q}$, and $\ddot{q}$ are the generalized coordinates, velocity, and acceleration vectors, respectively, such that $q_b \in \mathbb{R}^{h_b}$ represents the mobile base and $q_\upsilon \in \mathbb{R}^{h_\upsilon}$ the mobile arm. $M(q), C(q,\dot{q}) \in \mathbb{R}^{h\times h}$ and $F(\dot{q}), G(q), T_d \in \mathbb{R}^{h\times 1}$ denote the inertia matrix, the centripetal-Coriolis matrix, the friction term, the gravity vector, and the unknown bounded disturbances, respectively. $\tau \in \mathbb{R}^p$, $B(q) \in \mathbb{R}^{h\times p}$, and $A(q_b) \in \mathbb{R}^{k_1\times h_b}$ denote the torque input vector, a full-rank input matrix, and a full-rank kinematic constraint matrix, respectively. The m non-holonomic constraints on the mobile base are applied as [7]

\[ A(q_b)\dot{q}_b = 0 \tag{2} \]

\[ C^T(q_b)A^T(q_b) = 0 \tag{3} \]

where $C(q_b) \in \mathbb{R}^{h_b\times(h_b-p)}$ is a full-rank matrix. Equations (2) and (3) provide vectors $\vartheta$ and $\dot{\vartheta}$ such that

\[ \dot{q}_b = C(q_b)\dot{\vartheta}; \qquad \ddot{q}_b = C(q_b)\ddot{\vartheta} + \dot{C}(q_b)\dot{\vartheta} \tag{4} \]
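where $\vartheta = [\upsilon\ q_\upsilon]$. Next, $p_1$ holonomic constraints are applied to the series-chain multilink manipulator, given by $\Phi(\vartheta) = 0$ with $\Phi(\vartheta) \in \mathbb{R}^{p_1}$. Define $J(\vartheta) = \partial\Phi/\partial\vartheta$. In view of the $p_1$ holonomic constraints, the vector $q_\upsilon \in \mathbb{R}^{h_\upsilon}$ can be rearranged as $q_\upsilon = [q_{\upsilon 1}\ q_{\upsilon 2}]$. By an appropriate partition of $q_\upsilon$, $q_{\upsilon 2}$ is resolved by the vector $\alpha = [\upsilon^T, q_{\upsilon 1}^T]^T$. Finally, $\vartheta = \Omega(\alpha)$, i.e., $\vartheta$ is a function of $\alpha$, so that $\dot{\vartheta} = \mathcal{L}(\alpha)\dot{\alpha}$, where $\mathcal{L}(\alpha) = \partial\Omega/\partial\alpha$. Next, we get $\ddot{\vartheta} = \mathcal{L}(\alpha)\ddot{\alpha} + \dot{\mathcal{L}}(\alpha)\dot{\alpha}$. Let us denote $J_1(\alpha) = J(\Omega(\alpha))$. The matrices $\mathcal{L}(\alpha)$ and $J_1(\alpha)$ satisfy the relation $\mathcal{L}^T(\alpha)J_1^T(\alpha) = 0$. The reduced dynamic model is given by

\[ M_f\ddot{\alpha} + C_f\dot{\alpha} + F_f + G_f + T_{df} = \mathcal{L}^T B_1\tau \tag{5} \]

such that $M_f = \mathcal{L}^T M_1$, $C_f = \mathcal{L}^T C_1$, $F_f = \mathcal{L}^T F_1$, $G_f = \mathcal{L}^T G_1$, and $T_{df} = \mathcal{L}^T T_{d1}$, where the matrices $M_1$, $C_1$, $F_1$, $G_1$, and $T_{d1}$ are the same as in [19]. Let us consider that direct current motors are selected to provide the control torque inputs, whose equation is

\[ L_\upsilon\dot{I} + R_\upsilon I + K_g N_r\mathcal{L}^T\dot{\alpha} = U \tag{6} \]

where $L_\upsilon$, $R_\upsilon$, and $K_g$ are the armature inductance matrix, the resistance matrix, and the back-emf constant matrix, respectively. $U \in \mathbb{R}^m$, $I \in \mathbb{R}^m$, $N_r$, and $\mathcal{L}$ denote the voltage control input, the armature current vector, the gear ratio for the m joints, and the transformation matrix, respectively.

2.1 Properties and Assumptions of the Equation of Motion
Property 1 $M_f$ is a symmetric, positive definite matrix.
Property 2 $\dot{M}_f - 2C_f$ is a skew-symmetric matrix.
Property 3 $J^T(\alpha)$ and $\mathcal{L}(\alpha)$ are uniformly bounded and continuous.
Assumption 1 $\|F_f(\dot{q})\| \le d_1 + d_2\|\dot{q}\|$ and $\|T_{df}\| \le d_3$; $d_j > 0$ $(1 \le j \le 3)$.
Assumption 2 It is assumed that $\alpha_d$ and its derivatives up to the second order, and the desired Lagrangian multiplier, are bounded and uniformly continuous.
The force multiplier $\lambda_\upsilon$ is given by

\[ \lambda_\upsilon = \pounds\,(C_1\dot{\alpha} + F_1 + G_1 + T_{d1} - B_1\tau) \tag{7} \]

\[ \pounds = (J_1 M_1^{-1} J_1^T)^{-1} J_1 M_1^{-1} \tag{8} \]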
3 Controller Design

The conventional position-backstepping controller depends on complete knowledge of the system dynamics, which is not available in real-life applications. Therefore, whatever information is available about the system dynamics should be utilized in the controller design. A hybrid backstepping controller is designed for the position/force control of the manipulator system by combining a model-based approach, a model-free approach, and the conventional backstepping approach. Here, the class of electrically driven mechanical systems consists of the manipulator dynamics and the actuator dynamics. At the dynamic level, the manipulator subsystem dynamics are used to design a desired current signal $I_d$ that ensures the tracking errors converge to zero. At the actuator level, the actuator subsystem dynamics are used to design the voltage control input that forces $I$ to track its desired value $I_d$, so that $\alpha$ converges to the desired reference trajectory $\alpha_d$ and the actual force converges to the desired force multiplier $\lambda_{\upsilon d}$. We define the signals

\[ \dot{\alpha}_{r1} = \dot{\alpha}_d - K_\alpha e_\alpha; \quad e_\alpha = \alpha - \alpha_d; \quad e_\lambda = \lambda_\upsilon - \lambda_{\upsilon d}; \quad e_I = I - I_d, \]

where $\alpha_d$ is the desired position, $\dot{\alpha}_d$ is the desired velocity, and $K_\alpha$ is a positive definite matrix. Further, the filtered tracking error is $r_1 = \dot{e}_\alpha + K_\alpha e_\alpha$. Let us design the virtual control input $I$ and controller $u$ as follows:

\[ I = K_I^{-1} N_r^{-1} B_1^{-1}\tau; \quad u = {}^{+T} u_\upsilon - J_1^T u_b, \]

where $u_\upsilon = B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} I_\upsilon$, $u_b = B_{1b} N_{rb} K_{lb} I_b$, and $B_1 = \mathrm{diag}[B_{1\upsilon}, B_{1b}]$, such that $u_\upsilon, I_\upsilon \in R^{h-p_1-k_1}$ and $u_b, I_b \in R^{k_1}$. Next, we have

\[ M_f\ddot{\alpha} + C_f\dot{\alpha} + F_f + G_f + T_{df} = B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} I_\upsilon \tag{9} \]
\[ \lambda_\upsilon = \pounds\,(C_1\dot{\alpha} + F_1 + G_1 + T_{d1} - {}^{+T} B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} I_\upsilon) + B_{1b} N_{rb} K_{lb} I_b \tag{10} \]
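Using the filtered tracking error, Eq. (9) can be rewritten as

\[ M_f\dot{r}_1 + C_f r_1 = B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} e_{I\upsilon} + B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} I_{\upsilon d} - \chi(x) - F_f - T_{df} \tag{11} \]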
The function $\chi(x)$ is divided into a known part and an unknown part, $\hat{\chi}(x) = \hat{M}_f\ddot{\alpha}_{r1} + \hat{C}_f\dot{\alpha}_{r1} + \hat{G}_f$ and $\tilde{\chi}(x) = \tilde{M}_f\ddot{\alpha}_{r1} + \tilde{C}_f\dot{\alpha}_{r1} + \tilde{G}_f$. Here, we utilize an RBF neural network to compensate for the unknown dynamic part:

\[ \tilde{\chi}(x) = W^T\zeta(x) + \epsilon(x) \tag{12} \]

where $\zeta(x)$ is the Gaussian-type activation function and $\epsilon(x)$ is the NN reconstruction error. Substituting $\tilde{\chi}(x)$ from (12) into (11) gives

\[ M_f\dot{r}_1 = B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} e_{I\upsilon} + B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} I_{\upsilon d} - W^T\zeta(x) - \epsilon(x) - \hat{\chi}(x) - F_f - T_{df} - C_f r_1 \tag{13} \]
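The RBF approximation $W^T\zeta(x)$ used above can be sketched in a few lines. This is only an illustrative sketch: the Gaussian form $\exp(-\|x - c_k\|^2/\sigma^2)$, the centers, the width, and the weight values are assumptions chosen for the example and are not taken from the paper.

```python
import math

def gaussian_rbf(x, centers, width):
    """Return zeta(x): one Gaussian activation per hidden node (assumed form)."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / width ** 2)
        for c in centers
    ]

def rbf_output(weights, x, centers, width):
    """Compute W^T zeta(x); weights[k][m] links hidden node k to output m."""
    zeta = gaussian_rbf(x, centers, width)
    n_out = len(weights[0])
    return [
        sum(weights[k][m] * zeta[k] for k in range(len(zeta)))
        for m in range(n_out)
    ]

# Hypothetical two-node network with two outputs.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 2.0]]
y = rbf_output(weights, [0.0, 0.0], centers, width=1.0)
```

In the control scheme the weight matrix would be the online estimate updated by the adaptive law, rather than fixed numbers as here.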
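Next, we have

\[ \mu = [1 \;\; \|\dot{q}\| \;\; 1 \;\; 1]\,[d_1 \;\; d_2 \;\; d_3 \;\; \epsilon_x]^T = E^T\varphi \tag{14} \]

where $E$ is a known vector function and $\varphi$ is a parameter vector. Next, consider the adaptive bound part

\[ \Upsilon = -\frac{\hat{\mu}^2 r_1}{\hat{\mu}\|r_1\| + \delta}. \]

The backstepping-based control law is selected as

\[ B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} I_{\upsilon d} = \hat{\chi}(x) - K_d r_1 + \hat{W}^T\zeta(x) - e_\alpha + \Upsilon \tag{15} \]
\[ B_{1b} N_{rb} K_{lb} I_{bd} = \lambda_{\upsilon d} - K_f e_\lambda \tag{16} \]

Substituting (15) into (13), with $\tilde{W} = W - \hat{W}$, yields

\[ M_f\dot{r}_1 = -(K_d + C_f) r_1 - e_\alpha - \tilde{W}^T\zeta(x) - F_f - T_{df} - \epsilon(x) + \Upsilon + B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} e_{I\upsilon} \tag{17} \]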
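The adaptive bound part can be evaluated numerically as in the following minimal sketch. It assumes the denominator uses the Euclidean norm of $r_1$ and that $\delta(t)$ is the closed-form solution of $\dot{\delta} = -\gamma\delta$; the function names and the sample numbers are hypothetical.

```python
import math

def adaptive_bound(mu_hat, r1, delta):
    """Upsilon = -mu_hat^2 * r1 / (mu_hat * ||r1|| + delta)."""
    norm_r1 = math.sqrt(sum(v * v for v in r1))
    scale = -(mu_hat ** 2) / (mu_hat * norm_r1 + delta)
    return [scale * v for v in r1]

def delta_at(t, delta0, gamma):
    """Solution of delta_dot = -gamma * delta with delta(0) = delta0 > 0."""
    return delta0 * math.exp(-gamma * t)

# Hypothetical values for illustration only.
r1 = [3.0, 4.0]
mu_hat = 2.0
delta = delta_at(0.0, delta0=0.5, gamma=1.0)
upsilon = adaptive_bound(mu_hat, r1, delta)

# r1^T Upsilon + mu_hat * ||r1|| stays below delta, which is the
# inequality the stability proof relies on.
residual = sum(a * b for a, b in zip(r1, upsilon)) + mu_hat * 5.0
```

Because $\delta(t)$ decays exponentially, the residual left over by the bound term shrinks with time.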
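where $\delta(0) > 0$ is a design constant, $\gamma > 0$, and $\dot{\delta} = -\gamma\delta$. Next, we design the controller at the actuator level. For this, putting $e_I = I - I_d$ into (10),

\[ L_\upsilon\dot{e}_I = -L_\upsilon\dot{I}_d - R_\upsilon I - K_g N_r T\dot{\alpha} + U \tag{18} \]
\[ L_\upsilon\dot{e}_I = -\chi_1(x_1) + U \tag{19} \]

where $\chi_1(x_1) = L_\upsilon\dot{I}_d + R_\upsilon I + K_g N_r T\dot{\alpha}$. The terms are partitioned as $I = [I_\upsilon, I_b]$; $U = [U_\upsilon, U_b]$; $e_I = [e_{I\upsilon}, e_{Ib}]^T$; $L_\upsilon = \mathrm{diag}[L_{\upsilon\upsilon}, L_{\upsilon b}]$; $R_\upsilon = \mathrm{diag}[R_{\upsilon\upsilon}, R_{\upsilon b}]$ and $K_l = \mathrm{diag}[K_{l\upsilon}, K_{lb}]$. Next, the controller $U$ is designed in such a way that $I$ converges to $I_d$. We apply an RBFNN to the nonlinear function such that

\[ L_{\upsilon\upsilon}\dot{e}_{I\upsilon} = -W_1^T\zeta_1(x_1) - \epsilon_1(x_1) + U_\upsilon \tag{20} \]

Here, the NN reconstruction error is bounded as $\|\epsilon_1(x_1)\| < \epsilon_{x1}$ for some $\epsilon_{x1} > 0$. Now, we design $U_\upsilon$ as follows and, substituting it into (20), we get

\[ U_\upsilon = \hat{W}_1^T\zeta_1(x_1) - K_{d1\upsilon} e_{I\upsilon} - B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} r_1 \tag{21} \]
\[ L_{\upsilon\upsilon}\dot{e}_{I\upsilon} = -\tilde{W}_1^T\zeta_1(x_1) - \epsilon_1(x_1) - K_{d1\upsilon} e_{I\upsilon} - B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} r_1 \tag{22} \]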
Fig. 1 Block diagram of the proposed control scheme
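4 Stability Analysis

Under the designed control laws and the adaptive laws

\[ \dot{\hat{W}} = -\Gamma_W\zeta(x) r_1^T; \quad \dot{\hat{W}}_1 = -\Gamma_{W_1}\zeta_1(x_1) e_{I\upsilon}^T; \quad \dot{\hat{\varphi}} = \Gamma_\varphi E\|r_1\|, \]

where $\Gamma_W = \Gamma_W^T \in R^{z_1\times z_1}$, $\Gamma_{W_1} = \Gamma_{W_1}^T \in R^{z_2\times z_2}$ and $\Gamma_\varphi = \Gamma_\varphi^T \in R^{h_1\times h_1}$ are positive definite matrices, we will show that $e_\alpha$, $\dot{e}_\alpha$, $e_\lambda$ and $e_I$ converge to zero as $t \to \infty$.

Proof Let us consider the Lyapunov function $X = X_1 + X_2$, where

\[ X_1 = \frac{1}{2} r_1^T M_f r_1 + \frac{1}{2}\mathrm{tr}(\tilde{W}^T\Gamma_W^{-1}\tilde{W}) + \frac{1}{2}\mathrm{tr}(\tilde{\varphi}^T\Gamma_\varphi^{-1}\tilde{\varphi}) + \frac{\delta}{\gamma} \tag{23} \]
\[ X_2 = \frac{1}{2} e_{I\upsilon}^T L_{\upsilon\upsilon} e_{I\upsilon} + \frac{1}{2}\mathrm{tr}(\tilde{W}_1^T\Gamma_{W_1}^{-1}\tilde{W}_1) \tag{24} \]
\[ \dot{X}_1 = \frac{1}{2} r_1^T\dot{M}_f r_1 + r_1^T M_f\dot{r}_1 + \mathrm{tr}(\tilde{W}^T\Gamma_W^{-1}\dot{\tilde{W}}) + \mathrm{tr}(\tilde{\varphi}^T\Gamma_\varphi^{-1}\dot{\tilde{\varphi}}) + \frac{\dot{\delta}}{\gamma} \tag{25} \]

By using (17) and $\dot{\tilde{W}} = -\dot{\hat{W}}$, $\dot{\tilde{\varphi}} = -\dot{\hat{\varphi}}$, (25) can be rewritten as

\[ \dot{X}_1 = -r_1^T K_d r_1 - r_1^T e_\alpha - r_1^T\tilde{W}^T\zeta(x) + r_1^T B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} e_{I\upsilon} - r_1^T(F_f + T_{df} + \epsilon(x)) - \frac{\hat{\mu}^2 r_1^T r_1}{\hat{\mu}\|r_1\| + \delta} - \mathrm{tr}(\tilde{W}^T\Gamma_W^{-1}\dot{\hat{W}}) - \mathrm{tr}(\tilde{\varphi}^T\Gamma_\varphi^{-1}\dot{\hat{\varphi}}) - \delta \tag{26} \]

By using the adaptive laws along with the adaptive bound part, (26) becomes

\[ \dot{X}_1 \le -r_1^T K_d r_1 - \frac{\delta^2}{(E^T\hat{\varphi})\|r_1\| + \delta} + r_1^T B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} e_{I\upsilon} \tag{27} \]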
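In view of (22) and $\dot{\tilde{W}}_1 = -\dot{\hat{W}}_1$, the time derivative of (24) can be written as

\[ \dot{X}_2 = e_{I\upsilon}^T[-\tilde{W}_1^T\zeta_1(x_1) - \epsilon_1(x_1) - K_{d1\upsilon} e_{I\upsilon} - B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} r_1] - \mathrm{tr}(\tilde{W}_1^T\Gamma_{W_1}^{-1}\dot{\hat{W}}_1) \tag{28} \]

Now by using $\|\epsilon_1(x_1)\| < \epsilon_{x1}$ and the weight update law, (28) can be rewritten as

\[ \dot{X}_2 \le e_{I\upsilon}^T\epsilon_{x1} - e_{I\upsilon}^T B_{1\upsilon} N_{r\upsilon} K_{l\upsilon} r_1 - e_{I\upsilon}^T K_{d1\upsilon} e_{I\upsilon} \tag{29} \]
\[ \dot{X} = \dot{X}_1 + \dot{X}_2 \le -r_1^T K_d r_1 - \frac{\delta^2}{(E^T\hat{\varphi})\|r_1\| + \delta} - e_{I\upsilon}^T\epsilon_{x1} - e_{I\upsilon}^T K_{d1\upsilon} e_{I\upsilon} \tag{30} \]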
Since $X > 0$ and $\dot{X} \le 0$, the system is stable in the sense of Lyapunov. Thus, all the states of the system, $r_1(t)$, $\tilde{W}$, $\tilde{W}_1$, and hence $\hat{W}$ and $\hat{W}_1$, are bounded. Therefore, $\alpha$, $\dot{\alpha}$, $\dot{\alpha}_{r1}$, and $\ddot{\alpha}_{r1}$ are also bounded, and hence $\dot{r}_1$ and $\ddot{\alpha}$ are bounded. Finally, $\dot{X}$ goes to zero as $t \to \infty$ by Barbalat's lemma, and thus $r_1(t)$, $e_\alpha$ and $\dot{e}_\alpha \to 0$ as $t \to \infty$. Putting the values from (15) and (16) into (10), we have (I + Kf )eλ = T −£(β) + Mf α¨ + B1b Nrb Klb eIb ; where + = ( T )−1 T . Next, in the force space, $X_1 = 0$ and $I_b \in R^{p_1}$ for the remaining joints. Therefore, we have $L_{\upsilon b}\dot{I}_b + R_{\upsilon b} I_b = U_b$. Since $r_1 = 0$ and $e_\alpha = 0$, (29) can be rewritten as $\dot{X}_2 \le -e_{Ib}^T K_{d1b} e_{Ib} - e_{Ib}^T\epsilon_{x1}$. Thus, the uniform continuity of $\dot{X}_2$ is obtained, and $\dot{X}_2$ goes to zero as $t \to \infty$ by Barbalat's lemma. Thus, $e_{Ib} \to 0$ as $t \to \infty$. From this, finally $e_I \to 0$, i.e., $I \to I_d$.
5 Numerical Simulation Studies

In this section, we verify the validity and effectiveness of the designed control technique by simulation tests on a 2-dof holonomic constrained non-holonomic RLED mobile manipulator. The dynamical model at the dynamic level and at the actuator level, together with the parameter matrices, is taken from [19]. Let the desired trajectories be $Y_d = 1.5\sin(t)$, $\theta_d = 1.5\sin(t)$ and $\theta_{1d} = \pi/4\,(1 - \cos(t))$, and let the desired constraint force be $\lambda_{\upsilon d} = 10$ N. The trajectory tracking responses of the designed scheme are depicted in Fig. 2. The position and velocity errors are depicted in Figs. 3 and 4. The torques and the current tracking errors are shown in Figs. 5 and 6, and the constraint force response is depicted in Fig. 7. These figures indicate the effectiveness of the designed control technique.
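The reference signals of the simulation can be generated as in the following minimal sketch. It assumes that $\pi/4\,(1 - \cos(t))$ is read as $(\pi/4)(1 - \cos t)$, a 15 s horizon as suggested by the time axes of the figures, and a hypothetical sampling step of 0.01 s; the system model from [19] is not reproduced here.

```python
import math

LAMBDA_D = 10.0  # desired constraint force in newtons

def desired_trajectory(t):
    """Return (Y_d, theta_d, theta_1d) at time t in seconds."""
    y_d = 1.5 * math.sin(t)
    theta_d = 1.5 * math.sin(t)
    theta1_d = (math.pi / 4) * (1 - math.cos(t))
    return y_d, theta_d, theta1_d

# Sample the references on an assumed 0.01 s grid over 15 s.
STEP = 0.01
samples = [desired_trajectory(STEP * k) for k in range(1501)]
```

A tracking error such as $Y - Y_d$ would then be formed by subtracting these samples from the simulated outputs at the same time instants.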
Fig. 2 Trajectory tracking responses (blue: actual trajectory; red: desired trajectory). Axes: x coordinate [m], y coordinate [rad], z coordinate [rad]

Fig. 3 Position errors versus time (s). a In Y direction, Y − Yd, error (m); b Along angle θ, θ − θd, error (rad); c Along angle θ1, θ1 − θ1d, error (rad)

6 Conclusion

An effective hybrid backstepping control scheme has been proposed for constrained electrically driven mobile robots. The novelty of the paper is the combination of a conventional backstepping controller with the model-free computed torque controller and the RBF neural network controller. The combined scheme provides robustness against uncertainties and disturbances. The stability analysis has been carried out using Lyapunov stability theory together with the online adaptation laws. The simulation results show that the performance of the proposed control scheme is improved. Simulations and robustness verification have been conducted successfully to validate the performance of the designed scheme.
Fig. 4 Velocity errors versus time (s). a In Y direction, error (m/sec); b Along angle θ, error (rad/sec); c Along angle θ1, error (rad/sec)

Fig. 5 Torques (Nm) versus time (s) of a Left wheel motor, τl; b Right wheel motor, τr; c First joint motor, τ1

Fig. 6 Current tracking errors (A) versus time (s). a Left wheel motor, Ild − Il; b Right wheel motor, Ird − Ir; c First joint motor, I1d − I1; d Second joint motor, I2d − I2

Fig. 7 Responses of the internal forces: constraint force λv (N) versus time (s)
References

1. Xiao, L., Zhang, Y.: A new performance index for the repetitive motion of mobile manipulators. Int. J. Model. Identification Control 21(2), 193–201 (2014)
2. Asl, A.N., Menhaj, M.B., Sajedin, A.: Control of leader-follower formation and path planning of mobile robots using asexual reproduction optimization (ARO). Appl. Soft Comput. 14, 563–576 (2014)
3. Zhang, Y., Li, W., Liao, B., Guo, D., Peng, C.: Analysis and verification of repetitive motion planning and feedback control for omnidirectional mobile manipulator robotic systems. J. Intell. Rob. Syst. 75, 393–411 (2014)
4. Liu, Y., Li, Y.: Sliding mode adaptive neural-network control for nonholonomic mobile modular manipulators. J. Intell. Rob. Syst. 44(3), 203–224 (2005)
5. Lin, S., Goldenberg, A.A.: Neural-network control of mobile manipulators. IEEE Trans. Neural Netw. 12(5), 1121–1133 (2001)
6. Chen, Y., Liu, L., Zhang, M., Rong, H.: Study on coordinated control and hardware system of a mobile manipulator. In: World Congress on Intelligent Control and Automation (WCICA) (2006). https://doi.org/10.1109/WCICA.2006.1713748
7. Li, Z., Ge, S.S., Adams, M.: Adaptive robust output feedback motion/force control of electrically driven nonholonomic mobile manipulators. IEEE Trans. Control Syst. Technol. 16(6), 1308–1315 (2008)
8. Karry, A., Feki, M.: Adaptive tracking control of a mobile manipulator actuated by DC motors. Int. J. Model. Identification Control 21(2), 193–201 (2014)
9. Park, B.S., Park, J.B., Choi, Y.H.: Robust formation control of electrically driven nonholonomic mobile robots via sliding mode technique. Int. J. Control Autom. Syst. 9(5), 888–894 (2011)
10. Guangxin, H., Yanhui, Z.: Trajectory tracking control of nonholonomic wheeled mobile robots with actuator dynamic being considered. Adv. Mater. Res. 433–440, 2596–2601 (2012)
11. Zhu, Y., Fan, L.: On robust hybrid force/motion control strategies based on actuator dynamics for nonholonomic mobile manipulators. J. Appl. Math. 1–19 (2012)
12. Boukens, M., Boukabou, A., Chadli, M.: Robust adaptive neural network based trajectory tracking control approach for nonholonomic electrically driven mobile robots. Robot. Auton. Syst. 92, 30–40 (2017)
13. Sinaeefar, Z., Farrokhi, M.: Adaptive fuzzy model-based predictive control of nonholonomic wheeled mobile robots including actuator dynamics. Int. J. Sci. Eng. Res. 3(9), 1–7 (2012)
14. Cheng, L., Hou, Z.G., Tan, M.: Adaptive neural network tracking control for manipulators with uncertain kinematics, dynamics and actuator model. Automatica 45(10), 2312–2318 (2009)
15. Rajneesh, K.P., Santosh, K.R., Krishna, J.: Crop monitoring using IoT: a neural network approach. Soft Comput.: Theories Appl. 742, 123–132 (2018)
16. Vaishali, J., Rajendra, S.K.R., Ranjeet, S.T.: Named data network using trust function for securing vehicular Ad Hoc network. Soft Comput.: Theories Appl. 742, 463–471 (2018)
17. Suman, P., Malay, K.P.: A jitter-minimized stochastic real-time packet scheduler for intelligent routers. Soft Comput.: Theories Appl. 742, 547–556 (2018)
18. Rani, M., Kumar, N., Singh, H.P.: Efficient position/force control of constrained mobile manipulators. Int. J. Dyn. Control 6, 1629–1638 (2018)
19. Rani, M., Kumar, N., Singh, H.P.: Force/motion control of constrained mobile manipulators including actuator dynamics. Int. J. Dyn. Control 7, 940–954 (2019)
20. Rani, M., Dinanath, Kumar, N.: A new hybrid back stepping approach for the position/force control of mobile manipulators. In: International Conference on Next Generation Computing Technologies (NGCT), in Communications in Computer and Information Science (CCIS) Series, vol. 922, pp. 183–198 (2019)
21. Chiu, C.H., Peng, Y.F., Lin, Y.W.: Intelligent backstepping control for wheeled inverted pendulum. Expert Syst. Appl. 38(4), 3364–3371 (2011)
22. Mai, T.L., Wang, Y.: Adaptive-backstepping force/motion control for mobile-manipulator robot based on fuzzy CMAC neural networks. Control Theory Technol. 12(4), 368–382 (2014)
23. Chen, N., Song, F., Li, G., Sun, X., Ai, C.: An adaptive sliding mode backstepping control for the mobile manipulator with nonholonomic constraints. Commun. Nonlinear Sci. Numer. Simul. 18, 2885–2899 (2013)
24. Cheng, M.B., Tsai, C.C.: Robust backstepping tracking control using hybrid sliding-mode neural network for a nonholonomic mobile manipulator with dual arms. In: Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference, pp. 1964–1969 (2005)
25. Lin, F., Chang, C., Huang, P.: FPGA-based adaptive backstepping sliding-mode control for linear induction motor drive. IEEE Trans. Power Electron. 22(4), 1222–1231 (2007)
26. Chen, C.L., Peng, C.C., Yau, H.T.: High-order sliding mode controller with backstepping design for aeroelastic systems. Commun. Nonlinear Sci. Numer. Simul. 17(4), 1813–1823 (2012)
27. Chen, N., Song, F., Li, G., Sun, X., Ai, C.: An adaptive sliding mode backstepping control for the mobile manipulator with nonholonomic constraints. Commun. Nonlinear Sci. Numer. Simul. 18(10), 2885–2899 (2013)
Mathematical Interpretation of Fuzzy Information Model

Bazila Qayoom and M. A. K. Baig

Abstract One of the important attributes of human thinking and reasoning is fuzziness or vagueness, which mostly arises due to imprecise information. Fuzzy theory came into existence to tackle such situations. Considering instances of imprecise data and related situations, we have developed a new generalized two-parametric fuzzy entropy measure, which is presented in this paper. A detailed proof of the properties of the new fuzzy entropy model is also given. Further, a detailed mathematical evaluation of all the well-known axioms for fuzziness measures is carried out.

Keywords Fuzzy entropy · Shannon entropy · Sharpness · Resolution
B. Qayoom (corresponding author) · M. A. K. Baig, University of Kashmir, Srinagar 190006, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_37

1 Introduction

Information theory emerged as a new branch of mathematics in the 1940s. In due course, information theory was made mathematically rigorous and widely applicable in almost every field. Information theory deals with system problems such as information processing, information storage, information retrieval, and decision making. Early studies in this field were undertaken by Nyquist in 1924 [12] and 1928 [13] and by Hartley [6] in 1928, who identified the logarithmic nature of the measure of information. The research paper of C.E. Shannon [18] (1948) entitled 'A Mathematical Theory of Communication' was a landmark in the field of information theory. This paper discussed the properties of information sources and of the communication channels used for the transmission of the outputs of these sources. The past few decades have witnessed a remarkable growth in the literature of information theory and its applications beyond communication theory, and it is applied in the physical, biological, social, and chemical sciences. Uncertainty and fuzziness are basic elements of the human perspective and of many real-world objectives. Fuzzy set theory was proposed by Lotfi A. Zadeh [19] in 1965 and developed further in 1968 [20]; it has gained a lot of importance in many fields such as pattern recognition, image processing, feature selection, fuzzy aircraft control, and bioinformatics. Generalizations of entropy are not recent: Renyi [17] first came up with a generalized version of Shannon's [18] measure. In fuzzy entropy, De Luca and Termini [11] first gave the axiomatic structure of the measure of fuzzy entropy corresponding to Shannon's [18] entropy measure in 1972. The same path was later followed by Kaufmann [8], Ebanks [5], Kosko [10], Kapur [7], and Klir, Clair, and Boyuan [9]. In the recent past, generalized entropy measures with their noiseless coding theorems were studied by Bajaj and Hooda [3], Om Parkash and P.K. Sharma [15], Baig and Dar [2], H.D. Arora and Anjali Dhiman [1], Bhat and Baig [4], S. Peerzada, S. M. Sofi and R. Nisa [16], and Ohlan [14]. All of these authors discussed the various properties of their generalized fuzzy measures.
2 Fuzzy Generalized Information Measure

We define a new generalized fuzzy information measure as

\[ H_\alpha^\beta(A) = \frac{\beta}{1-\alpha}\sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} - 1\right]; \quad 0 \le \alpha < \beta \le 1,\ \beta > \alpha. \]
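The measure can be evaluated directly from its definition, as in the following minimal sketch. It assumes that the −1 term sits inside the summation, which is the reading under which a crisp set gives zero; the numerical values tabulated later in the paper appear to differ from this reading by a constant, so the two should not be expected to coincide. The function name is hypothetical.

```python
def fuzzy_entropy(mu, alpha, beta):
    """H_alpha^beta(A) for membership grades mu, with 0 <= alpha < beta <= 1.

    Assumes the -1 term is inside the sum over the elements x_i.
    """
    if not (0 <= alpha < beta <= 1):
        raise ValueError("parameters must satisfy 0 <= alpha < beta <= 1")
    a = alpha / beta
    return beta / (1 - alpha) * sum(m ** a + (1 - m) ** a - 1 for m in mu)

# Membership grades of a fuzzy set with four elements.
h_fuzzy = fuzzy_entropy([0.62, 0.74, 0.86, 0.92], alpha=0.4, beta=0.8)

# A crisp set (all grades 0 or 1) gives zero under this reading.
h_crisp = fuzzy_entropy([0, 1, 1, 0], alpha=0.4, beta=0.8)
```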
The new fuzzy measure satisfies the following basic properties.
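2.1 Sharpness

$H_\alpha^\beta(A)$ is minimum if and only if $A$ is a crisp set, i.e., $H_\alpha^\beta(A) = 0$ if and only if $\mu_A(x_i) = 0$ or $1$ for all $i = 1, 2, 3, \ldots, n$.

Proof Let $H_\alpha^\beta(A) = 0$. Then

\[ \frac{\beta}{1-\alpha}\sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} - 1\right] = 0 \]
\[ \Rightarrow \sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} - 1\right] = 0. \]

For $0 < \alpha/\beta < 1$, every term satisfies $\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} \ge 1$, with equality only when $\mu_A(x_i) = 0$ or $1$. Hence

\[ \mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} = 1 \ \text{for every } i, \quad\text{or}\quad \sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta}\right] = n, \]

so $A$ is a crisp set.

Conversely, suppose $A$ is crisp, so that $\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} = 1$ for every $i$, or

\[ \sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} - 1\right] = 0. \]

Multiplying both sides by $\frac{\beta}{1-\alpha}$,

\[ \frac{\beta}{1-\alpha}\sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} - 1\right] = 0 \ \Rightarrow\ H_\alpha^\beta(A) = 0. \]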
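2.2 Maximality

\[ H_\alpha^\beta(A) = \frac{\beta}{1-\alpha}\sum_{i=1}^{n}\left[\mu_A^{\alpha/\beta}(x_i) + (1-\mu_A(x_i))^{\alpha/\beta} - 1\right] \]

Differentiating the above equation with respect to $\mu_A(x_i)$,

\[ \frac{\partial H_\alpha^\beta(A)}{\partial\mu_A(x_i)} = \frac{\alpha}{1-\alpha}\left[\mu_A^{\alpha/\beta - 1}(x_i) - (1-\mu_A(x_i))^{\alpha/\beta - 1}\right]. \]

When $0 < \mu_A(x_i) < 0.5$, $\frac{\partial H_\alpha^\beta(A)}{\partial\mu_A(x_i)} > 0$ for all $0 < \alpha < 1$, $0 < \beta \le 1$, $\beta > \alpha$, so $H_\alpha^\beta(A)$ is an increasing function of $\mu_A(x_i)$ whenever $0 < \mu_A(x_i) < 0.5$.

When $0.5 < \mu_A(x_i) \le 1$, $\frac{\partial H_\alpha^\beta(A)}{\partial\mu_A(x_i)} < 0$ for all $0 < \alpha < 1$, $0 < \beta \le 1$, $\beta > \alpha$, so $H_\alpha^\beta(A)$ is a decreasing function of $\mu_A(x_i)$ whenever $0.5 < \mu_A(x_i) \le 1$.

For $\mu_A(x_i) = 0.5$, $\frac{\partial H_\alpha^\beta(A)}{\partial\mu_A(x_i)} = 0$ for all $0 < \alpha < 1$, $0 < \beta \le 1$, $\beta > \alpha$. Hence $H_\alpha^\beta(A)$ attains its maximum when $\mu_A(x_i) = 0.5$ for all $i$.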
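The sign pattern of the derivative can be checked numerically with the short sketch below. It assumes the per-element term $\frac{\beta}{1-\alpha}[\mu^{\alpha/\beta} + (1-\mu)^{\alpha/\beta} - 1]$ and compares the closed-form slope with a central finite difference; the function names and the parameter values are illustrative only.

```python
def entropy_term(m, alpha, beta):
    """Contribution of one element with membership grade m."""
    a = alpha / beta
    return beta / (1 - alpha) * (m ** a + (1 - m) ** a - 1)

def entropy_slope(m, alpha, beta):
    """Closed-form derivative of entropy_term with respect to m, 0 < m < 1."""
    a = alpha / beta
    return alpha / (1 - alpha) * (m ** (a - 1) - (1 - m) ** (a - 1))

def finite_difference(m, alpha, beta, h=1e-6):
    """Central difference approximation of the same derivative."""
    return (entropy_term(m + h, alpha, beta) - entropy_term(m - h, alpha, beta)) / (2 * h)

alpha, beta = 0.4, 0.8
slope_left = entropy_slope(0.3, alpha, beta)   # positive below 0.5
slope_mid = entropy_slope(0.5, alpha, beta)    # zero at 0.5
slope_right = entropy_slope(0.7, alpha, beta)  # negative above 0.5
```

The slope is unbounded at the end points 0 and 1, so the sketch is restricted to the open interval.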
2.3 Resolution

$H_\alpha^\beta(A) \ge H_\alpha^\beta(A^*)$, where $A^*$ is the sharpened version of $A$.

Proof As we know, $H_\alpha^\beta(A)$ is an increasing function of $\mu_A(x_i)$ whenever $0 < \mu_A(x_i) \le 0.5$ and a decreasing function of $\mu_A(x_i)$ whenever $0.5 < \mu_A(x_i) \le 1$, so we have

$\mu_{A^*}(x_i) \le \mu_A(x_i) \Rightarrow H_\alpha^\beta(A^*) \le H_\alpha^\beta(A)$ in $[0, 0.5)$,

and

$\mu_{A^*}(x_i) \ge \mu_A(x_i) \Rightarrow H_\alpha^\beta(A^*) \le H_\alpha^\beta(A)$ in $(0.5, 1]$.

The above relations conclude that $H_\alpha^\beta(A) \ge H_\alpha^\beta(A^*)$.
Table 1 Behavior of $H_\alpha^\beta(A)$ when $\mu_A(x_i) = 1$ and $\mu_A(x_i) = 0$ with respect to α and β

xi    α      β      μA(xi)    Hαβ(A)    μA(xi)    Hαβ(A)
1     0.2    0.4    0         0         1         0
2     0.3    0.2    0         0         1         0
3     0.4    0.4    0         0         1         0
4     0.5    0.7    0         0         1         0
2.4 Symmetry

$H_\alpha^\beta(A) = H_\alpha^\beta(A^c)$, where $A^c$ is the complement of $A$.

Proof Since $\mu_{A^c}(x_i) = 1 - \mu_A(x_i)$ for all $x_i \in X$, substituting into the definition of $H_\alpha^\beta(A)$ only interchanges the two terms of each summand; therefore, we conclude that $H_\alpha^\beta(A) = H_\alpha^\beta(A^c)$.

The new generalized fuzzy measure thus satisfies all the basic properties of fuzzy entropy. Hence, we can say that the new measure $H_\alpha^\beta(A)$ is a valid measure. Now, we will illustrate numerically that the proposed measure is a valid measure.
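The resolution and symmetry properties can also be checked with a few lines of code. The sketch below again assumes that the −1 term is inside the sum, so the computed values are not expected to reproduce the tabulated ones; the membership grades are those used for the (0.5, 1] case in the paper, and the helper names are hypothetical.

```python
def fuzzy_entropy(mu, alpha, beta):
    """H_alpha^beta(A), assuming the -1 term is inside the sum."""
    a = alpha / beta
    return beta / (1 - alpha) * sum(m ** a + (1 - m) ** a - 1 for m in mu)

def complement(mu):
    """Membership grades of the complement set A^c."""
    return [1 - m for m in mu]

def is_sharpened(mu_star, mu):
    """True if mu_star moves every grade away from 0.5 (or keeps it)."""
    return all(
        (s <= m) if m <= 0.5 else (s >= m)
        for s, m in zip(mu_star, mu)
    )

alpha, beta = 0.4, 0.8
A = [0.63, 0.79, 0.82, 0.97]
A_star = [0.69, 0.80, 0.89, 1.0]

h_A = fuzzy_entropy(A, alpha, beta)
h_A_star = fuzzy_entropy(A_star, alpha, beta)
h_A_c = fuzzy_entropy(complement(A), alpha, beta)
```

Here `h_A_star` does not exceed `h_A`, and `h_A_c` agrees with `h_A` up to floating-point rounding.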
3 Mathematical Illustration

3.1 Sharpness

See Table 1. The table shows that $H_\alpha^\beta(A)$ is minimum (i.e., $H_\alpha^\beta(A) = 0$) iff $A$ is a crisp set, i.e., when $\mu_A(x_i) = 0$ or $\mu_A(x_i) = 1$.

3.2 Maximality

When $0 < \mu_A(x_i) < 0.5$, the behavior of $H_\alpha^\beta(A)$ for different values of α and β is shown in Table 2. The table shows that $H_\alpha^\beta(A)$ is an increasing function of $\mu_A(x_i)$, i.e., $\frac{\partial H_\alpha^\beta(A)}{\partial\mu_A(x_i)} > 0$, whenever $0 < \mu_A(x_i) < 0.5$. Now, when $0.5 < \mu_A(x_i) \le 1$, the behavior for different values of α and β is shown in Table 3. The table shows that $H_\alpha^\beta(A)$ is a decreasing function of $\mu_A(x_i)$, i.e., $\frac{\partial H_\alpha^\beta(A)}{\partial\mu_A(x_i)} < 0$, whenever $0.5 < \mu_A(x_i) < 1$.
Table 2 Behavior of $H_\alpha^\beta(A)$ when $0 < \mu_A(x_i) < 0.5$

xi    α      β      μA(xi)    ∂Hαβ(A)/∂μA(xi)
1     0.3    1      0         Infinity
2                   0.16      1.478644
3                   0.34
4                   0.46

Table 3 Behavior of $H_\alpha^\beta(A)$ when $0.5 < \mu_A(x_i) < 1$

xi    α      β      μA(xi)    ∂Hαβ(A)/∂μA(xi)
1     0.3    0.4    0.67      −1.86626
2                   0.74
3                   0.82
4                   1.0       −Infinity
3.3 Resolution

The resolution property satisfied by the new generalized measure is illustrated in Table 4. The table shows that $H_\alpha^\beta(A^*) \le H_\alpha^\beta(A)$ whenever $\mu_{A^*}(x_i) \le \mu_A(x_i)$ in $[0, 0.5)$. Table 5 shows that $H_\alpha^\beta(A^*) \le H_\alpha^\beta(A)$ whenever $\mu_{A^*}(x_i) \ge \mu_A(x_i)$ in $(0.5, 1]$.

Table 4 At [0, 0.5) and with $\mu_{A^*}(x_i) \le \mu_A(x_i)$

xi    α      β      μA(xi)    Hαβ(A)      μA*(xi)    Hαβ(A*)
1     0.4    0.8    0.06      5.684532               5.017255
2                   0.14                  0.02
3                   0.23                  0.12
4                   0.32                  0.21

Table 5 At (0.5, 1] and with $\mu_{A^*}(x_i) \ge \mu_A(x_i)$

xi    α      β      μA(xi)    Hαβ(A)      μA*(xi)    Hαβ(A*)
1     0.4    0.8    0.63      5.649296    0.69       5.338854
2                   0.79                  0.80
3                   0.82                  0.89
4                   0.97                  1.0
Table 6

xi    α      β      μA(xi)    Hαβ(A)      1 − μA(xi)    Hαβ(Ac)
1     0.4    0.8    0.62      5.756684    0.38          5.756684
2                   0.74                  0.26
3                   0.86                  0.14
4                   0.92                  0.08
Tables 4 and 5 together show that $H_\alpha^\beta(A^*) \le H_\alpha^\beta(A)$, where $A^*$ is the sharpened version of $A$.

3.4 Symmetry

See Table 6. The table shows that $H_\alpha^\beta(A) = H_\alpha^\beta(A^c)$, where $A^c$ is the complement of $A$.
4 Conclusion In this paper, a new two-parametric generalized fuzzy measure of entropy is proposed. The newly developed fuzzy measure of entropy is a valid measure and satisfies all the important axioms. In this paper, a detailed proof of the properties is presented. The validity of the newly developed measure is checked on the various values given to the parameters of the measure. This paper includes a deep mathematical interpretation of the basic properties, i.e., sharpness, maximality, resolution, and symmetry.
5 Future Endeavors In terms of applicability, the fuzzy theory is a vast area of research. From complex human thinking to daily usable machines and appliances, fuzzy theory finds its usage everywhere. The results obtained from this part of the research encourage us to apply the new model in the upcoming research works. The possible areas where this generalized fuzzy entropy model is intended to be applied are multi-criterion decision making, multi-attribute decision making, etc., for evaluating real-life problems.
References

1. Arora, H.D., Dhiman, A.: Application of fuzzy information measure to coding theory. Int. J. Adv. Technol. Eng. Sci. 2, 678–687 (2014)
2. Baig, M.A.K., Dar, M.J.: Some coding theorems on fuzzy entropy function depending upon parameter R and V. IOSR J. Math. 9, 119–123
3. Bajaj, R.K., Hooda, D.S.: On some new generalized measures of fuzzy information. World Acad. Sci. Eng. Technol. 62, 747–753 (2010)
4. Bhat, A.H., Baig, M.A.K.: Some coding theorems on new generalized fuzzy entropy of order alpha and type beta. Appl. Math. Inf. Sci. Lett. 5, 63–69 (2017)
5. Ebanks, B.R.: On measures of fuzziness and their representations. J. Math. Anal. Appl. 94, 24–37 (1983)
6. Hartley, R.T.V.: Transmission of information. Bell Syst. Tech. J. 7, 535–563 (1928)
7. Kapur, J.N.: Generalized entropy of order α and type β. In: Maths Seminar, Delhi, vol. 4, pp. 78–94 (1967)
8. Kaufmann, A.: Fuzzy Subsets; Fundamental Theoretical Elements, 3rd edn. Academic, New Delhi (1980)
9. Klir, G., Boyuan, U.C.: Fuzzy set theory foundations and applications. Prentice Hall (1988)
10. Kosko, B.: Fuzzy entropy and conditioning. Inf. Sci. 40, 165–174 (1986)
11. De Luca, A., Termini, S.: A definition of non-probabilistic entropy in the setting of fuzzy set theory. Inf. Control 20, 301–312 (1972)
12. Nyquist, H.: Certain factors affecting telegraph speed. Bell Syst. Tech. J. 3, 324–346 (1924)
13. Nyquist, H.: Certain topics in telegraphy transmission theory. J. Am. Inst. Electr. Eng. 47, 617–619 (1928)
14. Ohlan, A., Ohlan, R.: Generalizations of Fuzzy Information Measures. Springer International Publishing, Berlin (2016)
15. Parkash, O., Sharma, P.K.: A new class of fuzzy coding theorems. Caribb. J. Math. Comput. Sci. 12, 1–10 (2002)
16. Peerzada, S., Sofi, S.M., Nisa, R.: A new generalized fuzzy information measure and its properties. Int. J. Adv. Res. Sci. Eng. 6, 1647–1654 (2017)
17. Renyi, A.: On measure of entropy and information. In: Proceedings of 4th Berkeley Symposium on Mathematics, Statistics and Probability, vol. 1, pp. 547–561 (1961)
18. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 423–467 (1948)
19. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
20. Zadeh, L.A.: Probability measures of fuzzy events. J. Math. Anal. Appl. 23, 421–427 (1968)
Methodological Development for Time-Dependent AHP Using Probability Distribution

Arpan Garg and Talari Ganesh

Abstract The analytical hierarchy process (AHP) is an effective tool to deal with multi-criteria decision-making problems. In order to resolve many real-time issues, AHP has seen a number of developments over a period of time. In this work, the authors propose a systematic procedure to generalize Saaty's notion of dynamic AHP using statistical distributions. The procedure leads to a time-dependent pairwise comparison matrix of finite but possibly large order. The authors find it helpful to plot a graphical representation of the priority change over a period of time to conduct sensitivity analysis and to measure the significant time for an initial decision.

Keywords Analytical hierarchy process (AHP) · Garg–Ganesh methodology · Decision theory · Uniform distribution
A. Garg (corresponding author) · T. Ganesh, Department of Mathematics & Scientific Computing, National Institute of Technology Hamirpur, Himachal Pradesh 177005, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_38

1 Introduction

From the very beginning of mankind, humans have had to make decisions for survival. Some of them are simple and can easily be settled by pairwise comparison (PC). A variety of optimization problems from various fields have been studied along with their solutions [1–3]. For complex decision-making problems, however, a more efficient procedure was required. In this connection, Saaty [4–7] introduced a multi-criteria decision-making (MCDM) method known as the analytic hierarchy process (AHP), which combines a hierarchy with PC to provide a set of preference values for a finite number of alternatives competing under pre-assumed criteria, applying the eigenvector (EV) method to the PC of alternatives. AHP is highly sensitive to the pairwise comparison matrix (PCM) at any level of the hierarchy, and the PCM is a quantified formulation of human responses based on their understanding, which may lead to an inherent variation in the estimated preference vector. To ensure the preciseness of the vector, many methods were subsequently developed. In this connection, the simple geometric mean method (SGMM or GMM) was derived by Crawford and Williams [8]. Choo and Wedly [9] studied 18 different methods of estimating the preference vector (PV) and found SGMM to be the most effective among them. We have observed that, in order to obtain a reliable outcome, many analysis studies [10, 11] have involved statistical tools. In the introduction of their paper, Crawford and Williams [8] recall Saaty's [4] arguments to conclude that a slight perturbation in the entries of the PCM results in a slightly different preference vector (PV). A ranking of alternatives requires a highly precise PV, and slight variations in the PV may disturb the ranking and the final decision. In practice, the varying nature of an alternative is one of several reasons that may account for such variations in the PV and the corresponding sudden change of decision. Saaty [7] was the first to highlight this case. This inspires the authors to develop a process that is capable of measuring the sensitivity and the relevance time of a decision before it changes. The relevance time of a decision refers to the study of the perturbation from the currently estimated PV to its future estimate, and the future estimation of the PV calls for introducing well-developed statistical distribution theory into the AHP procedure.
1.1 Research Gap and Focus of the Paper

To the best of the authors' knowledge and based on the literature reviewed, studies of real-time decision-making problems have paid more attention to the preciseness of the final preference vector than to the relevance time of the decision based on it, or to the sensitivity analysis of a decision in AHP. Such a study becomes essential when the nature of an alternative is updated backward or forward with time under the prefixed criteria, which changes the PC matrix, the preference vector (PV), and the decision over time; the existing AHP does not properly account for this possibility. To deal with this scenario, the authors propose the Garg–Ganesh (GG) process, which expresses the final preference vector in terms of the distribution function followed by the nature of the alternatives. The process can also be used as a sensitivity analysis tool for AHP and to study the changing pattern of the PV.
1.2 Article Organization

The preliminaries, including an introduction to pairwise comparison, the consistency check, and the geometric mean method (GMM), are given in Sect. 2. The proposed Garg–Ganesh (GG) process, involving the initial and perturbed PC matrices, the time-dependent functional form of the pairwise comparison (PC), preference variation, PC preferences, the combined PC matrix, and the normalized preference vector (PV) using statistical distributions, is explained in Sect. 3. The GG process is then applied to a numerical example, and the changing patterns of the PV components are shown graphically in Sect. 4. In Sect. 5, the authors suggest using this procedure for the sensitivity analysis of AHP and for calculating the effective and relevant time of a decision, and mention its scientific limitations.
2 Preliminaries Definition 1 Matrix P = (pij )l×l is said to be positive reciprocal if pij > 0, pii = 1 and pij = 1/pji for all i, j ∈ {1, 2, ..., l}.
2.1 The Pairwise Comparison

The PC is an efficient procedure that assigns preference values to the alternatives. In general, for a set of $l$ alternatives, say $\{P_1, P_2, P_3, \ldots, P_l\}$, each alternative is compared with all the others, and experts provide a positive real value $p_{ij}$ for the comparison between the $i$th and $j$th alternatives for all $i, j \in \{1, 2, \ldots, l\}$. A convenient way of representing all such comparisons is a matrix, known as a pairwise comparison (PC) matrix $P = (p_{ij})_{l\times l}$, where $p_{ij} \in R^+$ for all $i, j \in \{1, 2, \ldots, l\}$. A PC matrix is always a positive reciprocal matrix with each diagonal entry equal to 1. For instance, $p_{ij} = 3$ means that an expert finds the $i$th alternative to be three times more preferred than the $j$th one, which is the same as saying that the $j$th alternative is three times less preferred than the $i$th one, so the expert assigns the value 1/3 to $p_{ji}$. Moreover, every alternative is equally preferred when compared to itself, i.e., $p_{ii} = 1$ for all $i \in \{1, 2, \ldots, l\}$. In practice, an expert may feel the superiority of some alternative over others, which can lead to a lower or higher numeric value when the PC takes place. Hence, it is a tedious task to specify suitable bounds for $p_{ij}$; in this connection, the founder of AHP, Saaty [4], proposes a discrete scale where $p_{ij} \in \{1/9, 1/8, \ldots, 1/2, 1, 2, \ldots, 8, 9\}$. Many other scales [12–14] have also been suggested. For this article, we shall use Saaty's fundamental scale [4].
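The construction of a PC matrix from expert judgements can be sketched as follows. This is a minimal illustration: the function names and the dictionary-based input format are assumptions made for the example, and exact fractions are used so that the reciprocal property can be checked without rounding.

```python
from fractions import Fraction

def build_pc_matrix(l, judgments):
    """Build an l x l positive reciprocal matrix.

    judgments maps (i, j) with i < j (0-based) to the value p_ij on
    Saaty's scale; p_ji is filled in as 1 / p_ij and p_ii as 1.
    """
    P = [[Fraction(1) for _ in range(l)] for _ in range(l)]
    for (i, j), value in judgments.items():
        P[i][j] = Fraction(value)
        P[j][i] = 1 / Fraction(value)
    return P

def is_positive_reciprocal(P):
    """Check p_ij > 0, p_ii = 1 and p_ij = 1 / p_ji for all i, j."""
    l = len(P)
    return all(
        P[i][j] > 0 and P[i][j] * P[j][i] == 1
        for i in range(l) for j in range(l)
    )

# Three alternatives with hypothetical judgements.
P = build_pc_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): 2})
```

Only the $l(l-1)/2$ judgements above the diagonal need to be elicited; the rest of the matrix follows from reciprocity.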
2.2 Consistency

Irregularities in PC responses make it difficult to construct a reliable PC matrix for AHP. For example, suppose someone prefers L over M and M over N; to remain consistent, they should prefer L over N, whereas choosing N over L results in inconsistency. Mathematically, a PC matrix $P = (p_{ij})$ is said to be perfectly consistent if $p_{ij} \cdot p_{jk} = p_{ik}$ for all $i, j, k \in \{1, 2, \ldots, l\}$ with $i \neq j$, $j \neq k$, $i \neq k$.

AHP can measure the inconsistency of responses through the consistency ratio (CR). For $l$ alternatives, Saaty [4] has defined the consistency index (CI) and CR as follows:

$$\text{Consistency index (CI)} = \frac{\lambda_{\max} - l}{l - 1}, \qquad \text{Consistency ratio (CR)} = \frac{\mathrm{CI}}{\mathrm{RI}}$$

where $\lambda_{\max}$ is the greatest eigenvalue of the PCM; the consistency of responses is measured by comparing CI to RI. The random consistency index (RI), a value reflecting random judgement generated by Saaty [5] and depending on the number of alternatives, is shown in Table 1. According to Saaty, the CR value must be less than 0.10.

Table 1 Random index values [5]
l         1     2     3     4     5     6     7     8     9
RI value  0     0     0.58  0.9   1.12  1.24  1.32  1.41  1.45
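The CI and CR definitions above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes power iteration as the way to estimate $\lambda_{\max}$ (the paper does not say how the eigenvalue is obtained), and the function names and the 3 × 3 demonstration matrix are hypothetical.

```python
# Random index values copied from Table 1.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.9, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigenvalue(matrix, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    size = len(matrix)
    vector = [1.0 / size] * size  # entries sum to 1
    previous = 0.0
    estimate = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        # Because the vector sums to 1, the sum of P*v estimates lambda_max.
        estimate = sum(product)
        vector = [value / estimate for value in product]
        if abs(estimate - previous) < tol:
            break
        previous = estimate
    return estimate


def consistency(matrix):
    """Return (lambda_max, CI, CR) for a pairwise comparison matrix."""
    size = len(matrix)
    lam = principal_eigenvalue(matrix)
    if size < 3:  # RI is 0 for l = 1, 2, so CR is reported as 0 here
        return lam, 0.0, 0.0
    ci = (lam - size) / (size - 1)
    return lam, ci, ci / RANDOM_INDEX[size]


# Hypothetical perfectly consistent matrix built from weights 1 : 2 : 4.
weights = (1.0, 2.0, 4.0)
demo = [[a / b for b in weights] for a in weights]
print(consistency(demo))
```

For a perfectly consistent matrix the estimate of $\lambda_{\max}$ equals $l$, so CI and CR vanish up to rounding; any inconsistency in the judgements pushes $\lambda_{\max}$ above $l$.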
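2.3 Preference Vector (PV) Using Geometric Mean Method (GMM)

For a PC matrix $P = [p_{ij}]$, $i, j \in \{1, 2, \ldots, l\}$, Crawford [8] defined the PV as

$$\left[ \frac{\left(\prod_j p_{1j}(t)\right)^{1/l}}{\sum_i \left(\prod_j p_{ij}(t)\right)^{1/l}}, \ldots, \frac{\left(\prod_j p_{ij}(t)\right)^{1/l}}{\sum_i \left(\prod_j p_{ij}(t)\right)^{1/l}}, \ldots, \frac{\left(\prod_j p_{lj}(t)\right)^{1/l}}{\sum_i \left(\prod_j p_{ij}(t)\right)^{1/l}} \right]$$

where

$$p_i = \prod_{j=1}^{l} [p_{ij}(t)]^{1/l}, \quad i \in \{1, 2, \ldots, l\}$$

is the geometric mean value for the $i$th row.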
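The geometric mean method can be sketched in a few lines. This is an illustrative sketch only; the function name and the demonstration matrix (a perfectly consistent matrix built from the hypothetical weights 1 : 2 : 4) are assumptions, not taken from the paper.

```python
import math


def gmm_preference_vector(matrix):
    """Normalized row geometric means of a pairwise comparison matrix."""
    size = len(matrix)
    # Geometric mean of each row: (product of the row entries)^(1/l).
    row_means = [math.prod(row) ** (1.0 / size) for row in matrix]
    total = sum(row_means)
    # Normalize so that the preference values sum to 1.
    return [mean / total for mean in row_means]


# Hypothetical consistent matrix: entry (i, j) is w_i / w_j.
weights = (1.0, 2.0, 4.0)
demo = [[a / b for b in weights] for a in weights]
print(gmm_preference_vector(demo))
```

For a perfectly consistent matrix, the method recovers the underlying weights after normalization (here 1/7, 2/7, and 4/7).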
3 Methodology

To achieve their objective, Garg–Ganesh propose to use two PC matrices with tolerable inconsistency (CR below 0.10): one is an initial PC matrix, and the other represents the perturbation to the PC matrix caused by proposed changes in the nature of an alternative over a time lapse. Consider $P_1 = (x_{ij})_{l \times l}$ as the initial PC matrix and $P_2 = (y_{ij})_{l \times l}$ as the perturbed PC matrix for proposed changes in the $i_0$th alternative of $P_1$. Here, $|x_{i_0 j} - y_{i_0 j}|$ represents the final variation of the $(i_0, j)$th PC due to the change experienced by the $i_0$th alternative, for all $j = 1, 2, \ldots, l$. This variation is presented as a strictly increasing function of time with the help of a probability function (PDF or PMF), say $f$, followed by the changes in the nature of the alternative:

$$\Delta p_{i_0 j}(t) = \begin{cases} |x_{i_0 j} - y_{i_0 j}| \cdot \int_{-\infty}^{t} f(u)\,du & \text{for a PDF} \\ |x_{i_0 j} - y_{i_0 j}| \cdot \sum_{t_0}^{t} f(t) & \text{for a PMF} \end{cases} \tag{1}$$

for $j \in \{1, 2, \ldots, l\}$. Using expression 1, the corresponding time-variant PC value functions are defined as

$$p_{i_0 j}(t) = x_{i_0 j} + k_{i_0 j} \cdot \Delta p_{i_0 j}(t) \tag{2}$$

where

$$k_{i_0 j} = \begin{cases} 1 & \text{if } x_{i_0 j} \le y_{i_0 j} \\ -1 & \text{if } x_{i_0 j} > y_{i_0 j} \end{cases}$$

is a directional step function.

$P_2$ is developed in such a way that it may differ from $P_1$ only in the row and column corresponding to the $i_0$th alternative. The PC values that do not lie in the $i_0$th row or column remain invariant; i.e., $x_{ij} = y_{ij}$ for $i \neq i_0$ and $j \neq i_0$. Also, any scientific procedure of AHP requires a PC matrix that is reciprocal. Hence, the required time-variant PC matrix is formed using expressions 1 and 2 for the $i_0$th row, the invariant PC values $p_{ij} = x_{ij} = y_{ij}$ for $i \neq i_0$ and $j \neq i_0$, and the reciprocal property $p_{ij} \cdot p_{ji} = 1$ for $i, j \in \{1, 2, \ldots, l\}$. It is defined as

$$P(t) = \begin{bmatrix}
p_{11} & p_{12} & \cdots & 1/p_{i_0 1}(t) & \cdots & p_{1l} \\
p_{21} & p_{22} & \cdots & 1/p_{i_0 2}(t) & \cdots & p_{2l} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
p_{i_0 1}(t) & p_{i_0 2}(t) & \cdots & p_{i_0 i_0}(t) & \cdots & p_{i_0 l}(t) \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
p_{l1} & p_{l2} & \cdots & 1/p_{i_0 l}(t) & \cdots & p_{ll}
\end{bmatrix}$$

Now, from GMM [8], the geometric mean expression for the $i$th row is defined as

$$p_i = \prod_{j=1}^{l} [p_{ij}(t)]^{1/l}, \quad i \in \{1, 2, \ldots, l\} \tag{3}$$

and the required PV $[p_1, \ldots, p_i, \ldots, p_l]$, as a normalized time-variant PV, is defined as

$$\left[ \frac{\left(\prod_j p_{1j}(t)\right)^{1/l}}{\sum_i \left(\prod_j p_{ij}(t)\right)^{1/l}}, \ldots, \frac{\left(\prod_j p_{i_0 j}(t)\right)^{1/l}}{\sum_i \left(\prod_j p_{ij}(t)\right)^{1/l}}, \ldots, \frac{\left(\prod_j p_{lj}(t)\right)^{1/l}}{\sum_i \left(\prod_j p_{ij}(t)\right)^{1/l}} \right] \tag{4}$$
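The construction of the time-variant PC matrix can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the row index `i0` is zero-based, the probability function enters through its cumulative distribution (the PDF case of expression 1), and the two 3 × 3 matrices are invented for illustration.

```python
def uniform_cdf(t, low=0.0, high=10.0):
    """CDF of a uniform distribution on [low, high] (assumed for the demo)."""
    if t <= low:
        return 0.0
    if t >= high:
        return 1.0
    return (t - low) / (high - low)


def time_variant_matrix(initial, perturbed, i0, cdf, t):
    """Build P(t) from the initial and perturbed PC matrices for row i0."""
    size = len(initial)
    # Entries outside row and column i0 are invariant, so start from a copy.
    matrix = [list(row) for row in initial]
    for j in range(size):
        x, y = initial[i0][j], perturbed[i0][j]
        delta = abs(x - y) * cdf(t)      # expression 1 (PDF case)
        k = 1 if x <= y else -1          # directional step function
        matrix[i0][j] = x + k * delta    # expression 2
        if j != i0:
            matrix[j][i0] = 1.0 / matrix[i0][j]  # reciprocal property
    return matrix


# Hypothetical matrices that differ only in row and column 1 (zero-based).
P1 = [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]]
P2 = [[1, 1, 4], [1, 1, 5], [1 / 4, 1 / 5, 1]]
for row in time_variant_matrix(P1, P2, 1, uniform_cdf, 5.0):
    print(row)
```

With a CDF that runs from 0 to 1 over the time window, the sketch returns the initial matrix at the start of the window and the perturbed matrix at its end, and every intermediate matrix stays reciprocal.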
4 Numerical Example

This paper aims to show the effect of variation in some criteria of an alternative, which may disturb the importance and relevance of the current best decision with time. In this example, one has to choose the most suitable mobile phone from models popular in India: Redmi Note 7 Pro (launch date: March 13, 2019), MI A3 (August 23, 2019), Redmi Note 8 Pro (October 21, 2019), and Redmi Note 9 Pro (March 17, 2020), all from the China-based brand “Xiaomi.” First, a comprehensive explanation of the product specifications is delivered, and the responses from seven brand lovers are aggregated as the PC matrix $P_1$, with rows and columns ordered as Redmi Note 8 Pro, Redmi Note 9 Pro, MI A3, Redmi Note 7 Pro:

$$P_1 = \begin{bmatrix} 1 & 2 & 4 & 3 \\ 1/2 & 1 & 5 & 3 \\ 1/4 & 1/5 & 1 & 2 \\ 1/3 & 1/3 & 1/2 & 1 \end{bmatrix}$$

Here, $l$ = number of alternatives = 4 and $\lambda_{\max}(P_1) = 4.2383$; i.e., $\mathrm{CI}_{P_1} = \frac{4.2383 - 4}{4 - 1} = 0.0794$ and $\mathrm{CR}_{P_1} = \frac{0.0794}{0.9} = 0.0882 < 0.1$. This consistency check shows that $P_1$ is tolerably consistent.

To apply the proposed methodology, a 10 percent commercial price drop on Redmi Note 9 Pro, which is likely to happen quarterly, is offered, and the responses of the same brand lovers are then aggregated as the perturbed PC matrix $P_2$, with the same ordering of rows and columns:

$$P_2 = \begin{bmatrix} 1 & 1 & 4 & 3 \\ 1 & 1 & 7 & 5 \\ 1/4 & 1/7 & 1 & 2 \\ 1/3 & 1/5 & 1/2 & 1 \end{bmatrix}$$

Here, $l = 4$ and $\lambda_{\max}(P_2) = 4.1651$; i.e., $\mathrm{CI}_{P_2} = \frac{4.1651 - 4}{4 - 1} = 0.0551$ and $\mathrm{CR}_{P_2} = \frac{0.0551}{0.9} = 0.0612 < 0.1$. This consistency check shows that $P_2$ is tolerably consistent.
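Also, the occurrence of the commercial offer follows a uniform distribution defined as

$$f(t) = \begin{cases} \frac{1}{10} & \text{if } 0 \le t \le 10 \\ 0 & \text{otherwise} \end{cases}$$

where a quarter is divided into 10 units of time for computational ease. The required time-variant PC matrix is formulated as follows. With $i_0 = 2$, Equation 1 changes to

$$\Delta p_{2j}(t) = |x_{2j} - y_{2j}| \cdot \int_{-\infty}^{t} f(u)\,du, \quad j = 1, 2, 3, 4. \tag{5}$$

$$\Rightarrow \Delta p_{21}(t) = \left|\tfrac{1}{2} - 1\right| \cdot \int_{0}^{t} \tfrac{1}{10}\,du = \tfrac{t}{20}, \qquad \Delta p_{22}(t) = 0,$$

$$\Delta p_{23}(t) = |5 - 7| \cdot \int_{0}^{t} \tfrac{1}{10}\,du = \tfrac{t}{5}, \qquad \Delta p_{24}(t) = |3 - 5| \cdot \int_{0}^{t} \tfrac{1}{10}\,du = \tfrac{t}{5},$$

and for $i_0 = 2$, Equation 2 and its directional step function change to

$$p_{2j}(t) = x_{2j} + k_{2j} \cdot \Delta p_{2j}(t), \quad j = 1, 2, 3, 4 \tag{6}$$

where

$$k_{2j} = \begin{cases} 1 & \text{if } x_{2j} \le y_{2j} \\ -1 & \text{if } x_{2j} > y_{2j} \end{cases}, \quad j = 1, 2, 3, 4. \tag{7}$$

Equations 6 and 7, together with $\Delta p_{2j}(t)$ for $j = 1, 2, 3, 4$, give

$$p_{21}(t) = \tfrac{1}{2} + 1 \cdot \tfrac{t}{20} = \tfrac{t + 10}{20}, \qquad p_{22}(t) = 1,$$

$$p_{23}(t) = 5 + 1 \cdot \tfrac{t}{5} = \tfrac{t + 25}{5}, \qquad p_{24}(t) = 3 + 1 \cdot \tfrac{t}{5} = \tfrac{t + 15}{5},$$

with

$$p_{12}(t) = \frac{1}{p_{21}(t)} = \frac{20}{t + 10}, \qquad p_{32}(t) = \frac{1}{p_{23}(t)} = \frac{5}{t + 25}, \qquad p_{42}(t) = \frac{1}{p_{24}(t)} = \frac{5}{t + 15},$$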
and then, the required time-variant PC matrix $P(t)$, with rows and columns ordered as Redmi Note 8 Pro, Redmi Note 9 Pro, MI A3, Redmi Note 7 Pro, is

$$P(t) = \begin{bmatrix}
1 & \frac{20}{t + 10} & 4 & 3 \\
\frac{t + 10}{20} & 1 & \frac{t + 25}{5} & \frac{t + 15}{5} \\
1/4 & \frac{5}{t + 25} & 1 & 2 \\
1/3 & \frac{5}{t + 15} & 1/2 & 1
\end{bmatrix}$$

Now, using Eq. 3, the components of the PV are defined as

$$p_i = \prod_{j=1}^{4} [p_{ij}(t)]^{1/4}, \quad i = 1, 2, \ldots, 4 \tag{8}$$

$$\Rightarrow p_1(t) = \left[1 \cdot \frac{20}{t + 10} \cdot 4 \cdot 3\right]^{1/4} = \left[\frac{240}{t + 10}\right]^{1/4} \tag{9}$$

$$p_2(t) = \left[\frac{t + 10}{20} \cdot 1 \cdot \frac{t + 25}{5} \cdot \frac{t + 15}{5}\right]^{1/4} = \left[\frac{(t + 10)(t + 15)(t + 25)}{500}\right]^{1/4} \tag{10}$$

$$p_3(t) = \left[\frac{1}{4} \cdot \frac{5}{t + 25} \cdot 1 \cdot 2\right]^{1/4} = \left[\frac{2.5}{t + 25}\right]^{1/4} \tag{11}$$

$$p_4(t) = \left[\frac{1}{3} \cdot \frac{5}{t + 15} \cdot \frac{1}{2} \cdot 1\right]^{1/4} = \left[\frac{0.8333}{t + 15}\right]^{1/4} \tag{12}$$

Hence, the normalized PV is defined as

$$p(t) = \left[\frac{p_1(t)}{\sum_{i=1}^{4} p_i}, \frac{p_2(t)}{\sum_{i=1}^{4} p_i}, \frac{p_3(t)}{\sum_{i=1}^{4} p_i}, \frac{p_4(t)}{\sum_{i=1}^{4} p_i}\right] \tag{13}$$

where $\sum_i \left(\prod_j p_{ij}(t)\right)^{1/4} = \sum_{i=1}^{4} p_i$, and the $p_i(t)$ values are defined in Equations 9, 10, 11, and 12. Their functional form, along with the normalized form of the complete PV $p(t)$, is represented graphically in Fig. 1.

It can be verified that the combined PC matrix $P(t)$ equals the initial PC matrix $P_1$ at $t = 0$ and the perturbed PC matrix $P_2$ at $t = 10$; i.e., $P(0) = P_1$ and $P(10) = P_2$. Also, the normalized PV at $t = 0$ is $p(0) = [0.4502, 0.3366, 0.1144, 0.0988]$, which is the same as the PV calculated using GMM for the initial PC matrix. According to $p(0)$, Redmi Note 8 Pro initially secures the highest preference value (0.4502) and is the most suitable mobile product based on the initial matrix $P_1$, while Redmi Note 9 Pro, MI A3, and Redmi Note 7 Pro secure preference values of 0.3366, 0.1144, and 0.0988, respectively.
Fig. 1 PV versus time t

Now, the normalized PV at $t = 10$ is $p(10) = [0.3553, 0.4644, 0.0987, 0.0816]$, which is the same as the PV calculated using GMM for the perturbed PC matrix. According to $p(10)$, Redmi Note 9 Pro secures the highest preference value (0.4644) and is the most suitable mobile product once the offer is implemented, while Redmi Note 8 Pro, MI A3, and Redmi Note 7 Pro secure preference values of 0.3553, 0.0987, and 0.0816, respectively. The perturbations have caused a major variation in the final PV, which changes the conclusive suitable product from Redmi Note 8 Pro to Redmi Note 9 Pro; this is also visible in Fig. 1.

It becomes a matter of interest to measure the time for which the initial decision remains significant. From $t = 4.465$, where $p(4.465) = [0.4011, 0.4012, 0.1073, 0.0904]$, $p_2(t)$ starts producing a higher preference value than the other $p_i(t)$, $i = 1, 3, 4$ (see Fig. 1). Hence, using the proposed GG process of AHP, the authors conclude that Redmi Note 8 Pro remains the most suitable mobile until $t = 4.465$ (approximately 41 days), and that for $t = t_0$ with $4.465 \le t_0 \le 10$ the final PV $p(t)$ yields Redmi Note 9 Pro as the most suitable mobile to purchase, given the uniform distribution followed by the implementation of the commercial offer.
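The crossover time quoted above can be reproduced from the row geometric means of the numerical example. The sketch below is illustrative only: it assumes bisection as the root-finding method (the paper does not state how $t = 4.465$ was obtained), and the function names are hypothetical.

```python
def preference_vector(t):
    """Normalized PV of the numerical example at time t (0 <= t <= 10)."""
    rows = [
        (240.0 / (t + 10)) ** 0.25,                        # Redmi Note 8 Pro
        ((t + 10) * (t + 15) * (t + 25) / 500.0) ** 0.25,  # Redmi Note 9 Pro
        (2.5 / (t + 25)) ** 0.25,                          # MI A3
        ((5.0 / 6.0) / (t + 15)) ** 0.25,                  # Redmi Note 7 Pro
    ]
    total = sum(rows)
    return [value / total for value in rows]


def crossover_time(low=0.0, high=10.0, tol=1e-9):
    """Bisection for the time at which the second component overtakes the first."""
    def gap(t):
        pv = preference_vector(t)
        return pv[1] - pv[0]  # negative at t = 0, positive at t = 10

    while high - low > tol:
        middle = (low + high) / 2.0
        if gap(middle) < 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


t_star = crossover_time()
print(round(t_star, 3), [round(v, 4) for v in preference_vector(t_star)])
```

Since a quarter (roughly 91 days) is divided into 10 time units, a crossover near 4.465 units corresponds to about 41 days, in line with the figure stated in the text.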
5 Conclusion

The authors have thoroughly formulated and explained the methodological development of AHP using probability distributions with the help of a real-time mobile selection problem. The significance of the GG procedure can be seen in the solved numerical example, where the procedure shows that a decision based on the initial PC matrix has only 41 days of relevance. The GG procedure even allows the use of intermediate scale values for PC, which generalizes Saaty’s 1–9 discrete scale to a continuous scale. The procedure can also be used to conduct sensitivity analysis of AHP, since it allows studying the pattern of change in each component of the final PV under the perturbation of a single alternative, as shown in Fig. 1 for changes in the $i_0 = 2$nd alternative.

The authors suggest practicing this methodology when a decision is highly sensitive to small perturbations in the nature of the alternatives involved in the study, for example, changes in a medical prescription according to variations in some symptoms of a disease. The GG procedure can also be used as an effective sensitivity analysis technique in AHP and in studies where the time period over which a decision remains significant is of particular importance. Although the proposed procedure answers many queries, the authors identify some of its limitations. The GG procedure allows studying the pattern of change in each PV component for changes in a single alternative but not for more than one, since that would violate the reciprocal property of the PC matrix, and it requires highly precise identification of the probability distributions involved in the process.
References

1. Choudhary, D., Pahuja, R.: Performance optimization by MANET AODV-DTN communication. In: Soft Computing: Theories and Applications, pp. 1–9. Springer, Singapore (2020)
2. Zaheer, H., Pant, M.: Solution of multiobjective portfolio optimization problem using multiobjective synergetic differential evolution (MO-SDE). In: Soft Computing: Theories and Applications, vol. 584, pp. 191–199. Springer, Singapore (2017)
3. Rajput, N.S., Shukla, D.D., Ishan, L., Sharma, T.K.: Optimization of compressive strength of polymer composite brick using Taguchi method. In: Soft Computing: Theories and Applications, pp. 453–459. Springer, Singapore (2018)
4. Saaty, T.L.: A scaling method for priorities in hierarchical structures. J. Math. Psychol. 15(3), 234–281 (1977)
5. Saaty, T.L.: What is the analytic hierarchy process? In: Mathematical Models for Decision Support, pp. 109–121. Springer (1988)
6. Saaty, T.L.: Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process, vol. 6. RWS Publications (2000)
7. Saaty, T.L.: Time dependent decision-making; dynamic priorities in the AHP/ANP: generalizing from points to functions and from real to complex variables. Math. Comput. Model. 46(7–8), 860–891 (2007)
8. Crawford, G., Williams, C.: A note on the analysis of subjective judgment matrices. J. Math. Psychol. 29(4), 387–405 (1985)
9. Choo, E.U., Wedley, W.C.: A common framework for deriving preference values from pairwise comparison matrices. Comput. Oper. Res. 31(6), 893–908 (2004)
10. Kumar, V., Parida, M.K., Albert, S.K.: Analysis of SMAW parameters using self organizing maps and probability density distributions. In: Soft Computing: Theories and Applications, pp. 7–18. Springer, Singapore (2020)
11. Ramansh, K., Kalra, P., Mehrotra, D.: Trend analysis for retail chain using statistical analysis system. In: Soft Computing: Theories and Applications, pp. 53–62. Springer, Singapore (2020)
12. Yuen, K.K.F.: Compound linguistic scale. Appl. Soft Comput. 21, 38–56 (2014)
13. Franek, J., Kresta, A.: Judgment scales and consistency measure in AHP. Procedia Econ. Finance 12, 164–173 (2014)
14. Dong, Y., Xu, Y., Li, H., Dai, M.: A comparative study of the numerical scales and the prioritization methods in AHP. Eur. J. Oper. Res. 186(1), 229–242 (2008)
Implementation of Speculate Modules and Performance Evaluation of Data Mining Clustering Techniques on Air Quality Index and Health Index to Predict High-Risk Air Polluted Stations of a Metropolitan City Using R Programming

N. Asha and M. P. Indira Gandhi

Abstract Cluster analysis is a statistical approach for grouping similar objects that belong to the same category. This technique is used to group the assorted air pollution monitoring stations of a metropolitan city based on air quality index (AQI) and health index. The air quality index is used by government bodies to determine how polluted an area currently is owing to the enormous increase in vehicle population and industries, which emit pollutants such as PM10, PM2.5, SO2, NO2, CO, O3, and NH3. The subindex of each individual pollutant concentration is calculated, and the pollutant with the maximum subindex determines the AQI. The AQI values of a particular station determine the quality of its air. The Central Pollution Control Board (CPCB) classifies the breathing discomfort of people, called the health index, into good, satisfactory, and poor based on AQI. This paper focuses on clustering the assorted air pollution monitoring stations based on health index using clustering techniques such as K-means and hierarchical clustering. The paper also emphasizes the performance of the clustering techniques and suggests speculate modules to determine which stations are at high risk for any given metropolitan city, which the government could use to take the necessary actions and policy decisions to control air pollution. The work is carried out using the R programming tool.

Keywords Cluster analysis · K-means clustering · Hierarchical clustering · AQI · Subindex · R programming
N. Asha (B) · M. P. Indira Gandhi
Mother Teresa Women’s University, Kodaikanal, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_39

1 Introduction

Air pollution is recognized as an important problem all over the world. It can be described as a mixture of multiple pollutants that vary in size and composition. Air pollutants are commonly grouped as particulate matter, such as PM10 and PM2.5, and ground-level pollutants such as carbon monoxide (CO), sulfur dioxide (SO2), and nitrogen oxides (NO). The emission of such air pollutants by industries and the prodigious increase in vehicle population pose a high risk of breathing discomfort for people and harm plant and animal health. Air pollutant concentration triggers chronic diseases such as asthma, heart attack, and bronchitis, as determined by the CPCB [1]. The ambient air quality monitoring stations produce pollutant concentration data on an hourly basis. Existing machine learning tools are used for predicting the pollution in high-risk stations, and a comparative analysis of the techniques gives the best model for predicting the highly polluted station with reference to data size and processing time [2]. As the data collected is voluminous, it is necessary to mine the large dataset to obtain the required information. K-means and hierarchical clustering techniques are implemented to cluster the air pollution monitoring stations into good, moderate, and satisfactory categories, called the health index. This implementation predicts which monitoring stations in urban areas are prone to high health risk.
2 Related Work

In [3], the paper focuses on a clustering technique to predict the healthy and unhealthy areas in the city. Harmful air pollutants such as O3, PM10, and SO2 are used with 500 instances and 8 attributes. The study also compares the K-means and EM clustering techniques, which are implemented using the Weka platform.

In [4], the paper reviews the broad applications of clustering techniques in air pollution studies. It also discusses the classifications, advantages, and disadvantages of the K-means and hierarchical clustering methods. The hazardous pollutants expelled into the environment, which have an intense effect on human health, are analyzed using clustering techniques.

In [5], the objective of the paper is to apply K-means and hierarchical clustering to a large dataset of air pollutants obtained through the Internet of things (IoT) in smart cities. The work applies the clustering techniques to find the mean distance between the clusters obtained using K-means for any given value of k (number of clusters).

In [6], the paper enhances the K-means algorithm to predict a highly accurate value of AQI with minimum execution time compared with the existing techniques.

In [7], the paper gives insights into applying K-means clustering methods to big datasets to achieve global optima. The research work is carried out using the R studio platform and presents the performance of the enhanced clustering technique over the existing K-means algorithm.

In [8], the paper investigates three partitional clustering algorithms: K-means, K-medoids, and fuzzy C-means. The three algorithms are compared based on metrics such as network throughput, network lifetime, and dead nodes per round.

In [9], the paper applies data mining techniques to 20 cloud-based data services based on service-oriented architecture. A successful approach is proposed that classifies the services as best, good, and satisfactory.

In [10], the aim of the paper is to detect anomalies in the iris dataset using the outlier removal clustering (ORC) technique. The ORC technique performs K-means clustering and outlier detection simultaneously.

In [11], the paper proposes a comprehensive approach for predicting fraudulent claims in the healthcare industry using predictive modeling techniques such as logistic regression, neural network, and decision tree.

In [12], the paper presents a performance review of clustering data mining techniques to determine which technique is best suited to identifying customer behavior and purchase patterns and to improving customer satisfaction and retention.
3 Case Study

Bangalore, popularly known as the garden city, is taken as the case study to predict the stations at high risk due to air pollution. The concentrations of air pollutants such as PM10, PM2.5, SO2, NO2, CO, NO, and NH3 are collected from 17 different locations using manual equipment under the National Ambient Air Quality Monitoring Program (NAMP), covering industrial, mixed urban, and sensitive areas [13]. The Karnataka State Pollution Control Board (KSPCB) is the government body that monitors the ambient air quality of Bangalore City at these 17 locations. The data recorded by the manual equipment is collected in Excel format for each pollutant on an hourly basis, from which monthly and yearly average data are also calculated. For this study, the monthly average data from 2016 to 2019 for the 17 assorted stations, containing 746 instances with 21 attributes, is archived from the KSPCB Web site.
4 Findings of Previous Work

The average yearly data collected for air pollutants such as PM10, SO2, and NO2 of Bangalore City was normalized to identify the air pollutant affecting the city the most, using the linear regression technique, multi-linear regression models, and the backpropagation algorithm. These techniques gave the best results in predicting the air quality index of the smart city. Linear regression was applied individually to two attributes, a pollutant and its subindex value, to predict the air quality index. Multi-linear regression models were applied jointly to all three pollutants (PM10, SO2, and NO2) to predict the air quality index. The same procedure was carried out with the backpropagation algorithm to predict the air quality affecting the environmental conditions of the city. This technique gave the best predictions of the air quality index for all 15 stations in Bangalore city. The predictions were conducted with the aid of the Weka tool. The statistical observations of all the developed models were compared in terms of the accuracy and relative error of their predictions [14]. The experiments in turn helped to identify the high concentration of the PM10 pollutant in the city. This work paves the way for further analysis of the predictions made by MLP, which can be classified into moderate, satisfactory, good, and poor and given as input to the decision tree and Naïve Bayes algorithms to process nominal data and identify the highly polluted places in Bangalore city [14].

The work continued with the output file containing the predictions of the air quality index (AQI) of PM10, SO2, and NO2. The AQI of these pollutants was classified manually in Excel worksheets based on the standards of the CPCB (i.e., an AQI in the range 0–50 is good, 51–100 is satisfactory, 101–200 is moderate, and 201–300 is poor); these classes are called the health impacts or health index of air pollutants. This file was used to predict the health impacts using classification techniques in data mining such as Naive Bayes, decision tree, and K-nearest neighbor classifier with the Weka tool [13]. The experiments gave the best classification of the air quality index of each pollutant into its health standards as defined by the CPCB. The decision tree algorithm gave the best classification results in terms of prediction and time. A small output file in Excel format, showing the predictions of AQI and its health impacts, is displayed in Table 1.

Table 1 Columns representing the predicted AQI and health impacts or health index

In Table 1, the column predicted AQI was predicted using regression techniques, and the column predicted health impacts was predicted using classification techniques.
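The manual classification described above can be expressed as a short lookup. This is a minimal sketch, not the authors' worksheet logic: it uses only the four CPCB bands quoted in the text and the rule from the abstract that the pollutant with the maximum subindex determines the AQI; the function names and the sample subindex values are hypothetical.

```python
# AQI bands as quoted in the text (CPCB standards).
CPCB_BANDS = [
    (0, 50, "good"),
    (51, 100, "satisfactory"),
    (101, 200, "moderate"),
    (201, 300, "poor"),
]


def overall_aqi(subindices):
    """The pollutant with the maximum subindex determines the AQI."""
    return max(subindices.values())


def health_index(aqi):
    """Map an AQI value to the health impact class of its band."""
    value = round(aqi)
    for low, high, label in CPCB_BANDS:
        if low <= value <= high:
            return label
    # Values above 300 are not covered by the bands listed in the text.
    return "outside the listed bands"


# Hypothetical subindex values for one station and month.
sample = {"PM10": 112, "SO2": 14, "NO2": 38}
aqi = overall_aqi(sample)
print(aqi, health_index(aqi))
```

Keeping the bands in a single table makes it straightforward to extend the lookup if further categories are needed.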
5 Methodology

The work is extended to large datasets that incorporate the average monthly data archived for the years 2016–2019. The data collected is enhanced with more instances (746) and with more pollutant parameters such as PM2.5, CO, O3, and NH3. Station names, year, and month are also included for each instance. The target of this experiment is to determine how many stations fall under the good, moderate, and satisfactory categories. This is achieved using clustering techniques in data mining, namely K-means and hierarchical clustering. To obtain better results, clustering is handled with the R programming tool instead of the Weka tool. Both are open-source software for analytics; R is chosen because its results can be interpreted through graphical presentation. Table 2 shows the input file, which contains 21 attributes and 746 instances.

Table 2 Monthly average data of 17 assorted stations of Bangalore city archived to predict cluster of high-risk stations based on health impacts

Table 2 shows the air pollutant data accumulated for 17 different stations of Bangalore city. The average data of every month of each year is collected; only a small sample is shown for reference. The results in Table 2 were obtained from the experiments of the previous work, which predicted the AQI using the backpropagation algorithm and used a decision tree to classify the AQI into the health impacts good, moderate, and satisfactory. This section involves two modules that describe the proposed methodology.
5.1 Module 1: K-Means Cluster Analysis to Predict Clusters of Assorted Stations Based on Health Impacts

K-means cluster analysis is a popular method for dividing a large dataset into groups that belong to the same category or cluster, based on the mean value assigned to each cluster, called the centroid. The number of clusters in this experiment is set to k = 3, as the dataset contains three categorical values, “good”, “moderate”, and “satisfactory”, as shown in Table 2. The target variable to be clustered is therefore the station names, which are clustered into good, moderate, and satisfactory. The following steps are carried out to do the task.

Algorithm 1: Clustering of AQI with respect to year and health impacts
1. The dataset called kmeans_pred in Excel format, shown in Table 2, is converted to csv (comma-separated values) format on the Rstudio platform.
2. A copy of the original file is saved as kmeans_pred_origin.
3. The columns Year, Month, Station (string type), and health impacts are set to NULL, as they are not used for clustering. (Year, month, station names, and health impacts are later mapped to the clustering.)
4. The dataset is normalized so that all the attributes have a standard range of values.
5. Results are obtained using the kmeans algorithm.
6. The results are then mapped to year and health impacts.
7. The results obtained in table format are then compared with the original table.

Algorithm 2: Clustering of Station names with respect to air pollutant, AQI, and health impacts
1. The dataset called kmeans_pred in Excel format, shown in Table 2, is converted to csv (comma-separated values) format on the Rstudio platform.
2. A copy of the original file is saved as kmeans_pred_country.
3. The columns Year, Month, and Station (string type) are set to NULL, as they are not used for clustering. (Year, month, station names, and health impacts are later mapped to the clustering.)
4. As the Predicted.health.impacts attribute is categorical, it is converted into a numeric attribute: good is represented as “1”, moderate as “2”, and satisfactory as “3”.
5. The dataset is normalized so that all the attributes have a standard range of values.
6. The kmeans clustering is performed by considering the predicted AQI and health impacts.
7. Results are obtained using the kmeans algorithm and named res_clus_country.
8. The station list attribute is assigned to the cluster results.
9. The results of the cluster are written into an Excel file for verification against the input file shown in Table 2.
10. A cluster plot graph is obtained, which shows the numbers of the stations, instead of the station names, clustered into good, moderate, and satisfactory.
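The normalization and clustering steps of the two algorithms can be sketched as follows. The paper carries out this work in R; the block below is only an illustrative Python sketch and not the authors' script. It assumes min-max scaling for the normalization step (the paper does not name the method), uses a plain Lloyd iteration with explicitly chosen starting centroids so that the run is repeatable, and works on invented (PM10, AQI) values.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(value - low) / (high - low) if high > low else 0.0
         for value, low, high in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(rows, k, initial_indices, max_iterations=100):
    """Lloyd's algorithm; initial_indices selects the starting centroids."""
    centroids = [list(rows[index]) for index in initial_indices]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (PM10, AQI) monthly averages, invented for illustration only.
sample = [[30, 40], [35, 45], [80, 85], [90, 95], [150, 160], [170, 180]]
labels, centroids = kmeans(min_max_normalize(sample), k=3, initial_indices=[0, 2, 4])
print(labels)
```

The cluster labels returned here are arbitrary integers; mapping them back to station names and health impacts, as in steps 6 to 9, is a separate bookkeeping step.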
5.2 Module 2: Hierarchical Clustering to Predict Clusters of Assorted Stations Based on Health Impacts

Hierarchical clustering builds a hierarchy of clusters. In this experiment, the stations are clustered into a tree-like structure. The agglomerative method is used to cluster the stations; it follows a bottom-up approach in which each observation starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy. Only 50 instances are used for hierarchical clustering, to avoid overlaps in the graph.

Algorithm 3: Clustering of Station names with respect to air pollutant, AQI, and health impacts
1. The dataset called test_he_50 in Excel format is converted to csv (comma-separated values) format on the Rstudio platform.
2. A copy of the original file is saved as test_50_origin.
3. The column Station (string type) is set to NULL, as it is not used for clustering.
4. A distance matrix is built using the Euclidean distance method for the test_50 csv file.
5. Hierarchical clustering is applied to the input file test_50.
6. A dendrogram is built using the hclust function on the R platform.
7. The cutree function is used to cut the tree structure with the number of clusters k = 3.
8. The cluster results are appended back to the original data frame under the column name cluster with the mutate function.
9. The dplyr package is used to count how many observations are assigned to each cluster.
10. A trend analysis between station and health impacts is obtained using the ggplot function.
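The distance, merging, cutting, and counting steps of Algorithm 3 can be illustrated with a small agglomerative routine. This is a hedged Python sketch rather than the authors' R code: it assumes complete linkage (the paper does not state the linkage criterion), stops merging when k clusters remain instead of building and cutting a full dendrogram, and uses invented data.

```python
import math
from collections import Counter


def agglomerative_clusters(rows, k):
    """Bottom-up clustering with Euclidean distance and complete linkage."""
    clusters = [[index] for index in range(len(rows))]  # one cluster per row

    def linkage(first, second):
        # Complete linkage: the largest distance between any two members.
        return max(math.dist(rows[i], rows[j]) for i in first for j in second)

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters, start=1):
        for index in members:
            labels[index] = label
    return labels


# Hypothetical (PM10, AQI) values, invented for illustration only.
sample = [[30, 40], [35, 45], [80, 85], [90, 95], [150, 160], [170, 180]]
labels = agglomerative_clusters(sample, k=3)
print(labels, Counter(labels))
```

The final `Counter` plays the role of the per-cluster observation count in step 9. The routine recomputes all pairwise linkages at every merge, which is acceptable for the 50 instances used in this module but would be slow for the full dataset.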
6 Results and Discussion of Module 1 and Module 2

6.1 Results of Module 1

The following tables and graphs are the results of Algorithms 1 and 2 of Module 1. They show the clusters of health impacts, year, and stations with respect to the AQI and its air pollutants.
6.1.1 Comparing the Results of the Cluster Table with Attribute Predicted Health Impacts

Table 3 Confusion matrix of cluster table with attribute predicted health impacts
              Good  Moderate  Satisfactory
Good            14       197            23
Moderate         4         0           182
Satisfactory     7         3           316

Analysis:
(a) From Table 3, 14 + 197 + 23 + 4 + 0 + 182 + 7 + 3 + 316 = 746 (total observations). Good is correctly clustered in 197 cases, moderate in 182 cases, and satisfactory in 316 cases.
(b) Also, (14 + 23 + 4 + 0 + 7 + 3)/746 × 100 = 51/746 × 100 = 6.84, which means the error rate is about 7%, and 93% of the observations are correctly clustered (i.e., (197 + 182 + 316)/746 × 100 = 93.16%). Fig. 1 represents the cluster counts of health impacts from the clustering analysis.

Fig. 1 Results of K-means cluster analysis for the attribute health impacts (counts of good, moderate, and satisfactory in Cluster 1, Cluster 2, and Cluster 3)

6.1.2 Comparing the Results of the Cluster Table with Attribute Year

Table 4 Confusion matrix of predicted health impacts with year
       Good  Moderate  Satisfactory
2016      1        46           106
2017      3        46           155
2018     17        46           168
2019      4        62            92

Analysis:
(a) From Table 4, 1 + 46 + 106 + 3 + 46 + 155 + 17 + 46 + 168 + 4 + 62 + 92 = 746 (total observations).
(b) Also, (1 + 46 + 3 + 46 + 17 + 46 + 4 + 62)/746 × 100 = 225/746 × 100 = 30.16, which means the error rate is 30%, and 70% of the observations are correctly classified. Fig. 2 represents the year-wise cluster counts from the clustering analysis.

Fig. 2 K-means cluster analysis results year wise (counts of Good, Moderate, and Satisfactory for 2016, 2017, 2018, and 2019)
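The percentages quoted in the analyses of Tables 3 and 4 follow from simple ratios, which can be checked as below. The sketch is illustrative only; the function name is hypothetical, and the counts passed in are those the paper itself treats as correctly clustered.

```python
def agreement_rates(correct_counts, total):
    """Return (percentage correct, percentage in error) for the given counts."""
    correct = sum(correct_counts)
    return 100.0 * correct / total, 100.0 * (total - correct) / total


TOTAL_OBSERVATIONS = 746

# Table 3: counts the paper treats as correctly clustered.
table3_accuracy, table3_error = agreement_rates([197, 182, 316], TOTAL_OBSERVATIONS)

# Table 4: the paper counts the Satisfactory column as correct for each year.
table4_accuracy, table4_error = agreement_rates([106, 155, 168, 92], TOTAL_OBSERVATIONS)

print(round(table3_accuracy, 2), round(table3_error, 2))
print(round(table4_accuracy, 2), round(table4_error, 2))
```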
6.1.3 Clustering of Station Names with Respect to Air Pollutant, AQI, and Health Impacts

Table 5 shows the number of observed clusters versus the number of predicted clusters. The observed and predicted clusters are counted in Excel worksheets using the following formula:

Countifs (rangeofattribute station, “station_name”, rangeofattribute clusters, “cluster number”)

Figure 3 represents the 746 observations from the air pollution monitoring stations clustered into good (represented as 1), moderate (represented as 2), and satisfactory (represented as 3). Each observation is numbered from 1 to 746 in the graph.
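The two-condition count performed with the worksheet formula can also be written as a tally over (station, cluster) pairs. This is a minimal sketch, not part of the authors' workflow; the function name is hypothetical, and the records pair real station names from Table 5 with invented cluster labels.

```python
from collections import Counter


def count_by_station_and_cluster(records):
    """Count how often each (station, cluster) pair occurs."""
    return Counter((station, cluster) for station, cluster in records)


# Invented cluster assignments, for illustration only.
records = [
    ("Yeshwanthpura", "good"),
    ("Yeshwanthpura", "moderate"),
    ("Yeshwanthpura", "moderate"),
    ("Victoria_Hospital", "satisfactory"),
    ("Victoria_Hospital", "satisfactory"),
]
counts = count_by_station_and_cluster(records)
print(counts[("Yeshwanthpura", "moderate")])
```

A pair that never occurs simply reports a count of zero, mirroring what the worksheet formula returns when no row matches both conditions.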
6.2 Results of Module 2

Figure 4 shows the cluster dendrogram of hierarchical clustering, in which the stations are clustered in a bottom-up (agglomerative) approach. The figure shows how many stations belong to the clusters “1”, “2”, and “3”, i.e., “good”, “moderate”, and “satisfactory” (Fig. 5).
7 Performances of K-means and Hierarchical Clustering

• From the above experiments, it is observed that K-means clustering gives better results than hierarchical clustering.
• Compared with K-means, hierarchical clustering does not place the stations into their predicted clusters.
• In terms of computation, K-means is less expensive than hierarchical clustering and runs on a large data frame within a reasonable time frame.
• The number of clusters in K-means has to be decided before computation, whereas this is not required in hierarchical clustering.
• K-means is sensitive to outliers, whereas hierarchical clustering is less sensitive to outliers.
• In hierarchical clustering, it may be difficult to decide where to cut the dendrogram [4] to identify the high-risk polluted stations.
• K-means is capable of handling large datasets, whereas hierarchical clustering handles data of different sizes and shapes.
• So the K-means clustering algorithm is best suited for the air pollutant dataset to predict high-risk polluted stations of a metropolitan city in terms of cost, computation, and clustering.
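To make the point about fixing the number of clusters in advance concrete, here is a minimal one-dimensional K-means (Lloyd's algorithm) sketch in Python. It is not the R implementation used in the study; the readings and the initial centroids are hypothetical, and the number of initial centroids is what fixes k.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm on 1-D data; len(centroids) fixes k in advance."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical AQI-like readings (not from the paper's data set).
aqi = [32, 41, 45, 68, 72, 79, 120, 131, 140]
centroids, clusters = kmeans_1d(aqi, centroids=[30, 70, 130])
print(centroids)
print(clusters)
```

Because each centroid is a mean, a single extreme reading shifts it noticeably, which is one way to see the outlier sensitivity mentioned in the list above.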
Teri_Domulur
Bansawadi_Police_Station
UVCE_KR_Circle
DTDC_House_Victoria_Road
Swan_Silk_Peenya
KHB_Industrial_area_Yelhanka
13
14
15
16
17
Indira_Gandhi_Nimhans
8
12
Victoria_Hospital
7
Kajisonnenhalli
Central_Silk_Board
6
11
Amco_Batteries_Mysore_Road
5
City_Railway_Station
Yeshwanthpura
4
Sanegurvanhalli_CAAQM
Peenya_Industrial_Area
3
10
Rail_wheel_factory
2
9
Export_Commercial_Park
1
Station names
Comparing of clusters in each station
29
8
39
9
18
14
4
17
10
3
1
4
2
2
13
17
4
Good
2
16
0
29
22
15
30
24
10
35
38
4
19
24
8
10
8
Moderate
11
19
3
5
1
13
9
2
23
5
4
35
23
16
23
16
23
Satisfactory
K-means clustering (predicted clusters)
Table 5 Counting the number of clusters in predicted (K-means clustering) versus observed clusters
29
7
39
14
20
14
4
28
10
7
6
5
2
3
13
16
4
Good
11
16
3
5
0
11
3
1
16
1
2
32
15
11
17
12
25
Moderate
Observed clusters
2
19
0
23
22
17
35
13
16
34
35
5
25
28
12
14
13
Satisfactory
N. Asha and M. P. Indira Gandhi
Fig. 3 Result of cluster of station numbers using clustplot function in R programming
8 Conclusions

The study of the above clustering methods shows that the K-means method gives better results than hierarchical clustering. The Central_Silk_Board station has a higher health risk than the other stations. Using these steps, one can therefore find out which station is under high health risk due to air pollutants. The year 2018 had high risk and more moderate health impact compared to 2016, 2017, and 2019. These steps could be followed as a model that provides information to KSPCB to take necessary action and make policy decisions to control air pollution. There are two flaws in this analysis:
• The data obtained does not contain all the attributes of health impacts, such as poor, severe, and very poor.
• The AQI also depends on environmental parameters such as longitude, latitude, ozone concentration, and wind speed, which decide the breathing discomfort of people in smart cities.
9 Future Work

• Due to COVID-19, the entire country is under lockdown; therefore, regression techniques, classification, and clustering analysis are essential to observe how the environmental parameters behave when there is no traffic or industrial pollution in smart cities.
• Prediction of the increase in components such as SO2, CO, NH3, O3, and NO2 that affect the increase in the values of PM10 and PM2.5 in high-risk stations.
• To derive a mathematical model to predict the components that affect the particulate matter.
Fig. 4 Cluster dendrogram of stations
• To predict the health index of the high-risk areas and to categorize how these pollutants affect different age groups.
• To show that predictions of PM10 depend not only on vehicle population but also on factors such as climatic condition, season, and time.
• Prediction of environmental factors such as wind speed, wind direction, and ozone, which may also cause an increase in particulate matter.
Fig. 5 ggplot of station versus clusters
References

1. Akolkari, A.B.: National air quality. www.cpcb.nic.in.
2. Ameer, S., Shah, M., Khan, A., Song, H., Maple, C., Islam, S., Asghar, M.: Comparative analysis of machine learning techniques for predicting air quality in smart cities. In: IEEE Access. p. 1 (2019). https://doi.org/10.1109/ACCESS.2019.2925082
3. Sathya, D., Anu, J., Divyadharshini, M.: Air pollution analysis using clustering algorithms. In: 2017 International Conference on Emerging trends in Engineering, Science and Sustainable Technology (ICETSST). ISSN: 2348-8387
4. Govender, P., Sivakumar, V.: Application of K-means and Hierarchical clustering techniques for analysis of air pollution: a review (1980–2019). Elsevier vol. 11, Issue no. 1 (2019)
5. Doreswamy, Ghoneim, O.A., Manjaunath, B.R.: Air pollution clustering using K-means algorithm in smart city. Int. J. Innov. Res. Comput. Commun. Eng. 3(7) (2015)
6. Kingsy, G., Manimegalai, R., Geetha, D., Rajathi, U., Raabiathul, B.: Air pollution analysis using enhanced K-means clustering algorithm for real time sensor data. IEEE (2016). ISSN: 2159-3450
7. Agnivesh, Rajiv, P., Amarjeet, S.: Enhancing K-means for multidimensional Big Data clustering using R on Cloud. Int. J. Innov. Technol. Exploring Eng. (IJITEE) 8(7). ISSN: 2278-3075
8. Sharma, R., Vashisht, V., Singh, U.: Performance analysis of evolutionary technique based partitional clustering algorithms for wireless sensor networks. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, Springer, Berlin, pp. 171–180 (2018)
9. Zeenat, P., Yadav, R.B.S.: Classification of SOA-based cloud services using data mining technique. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, Springer, Berlin, pp. 971–978 (2019)
10. Sarvani, A., Venugopal, B., Devarakonda, N.: Anomaly detection using k-means approach and outliers detection technique. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, Springer, Berlin, pp. 375–385 (2017)
11. Pandey, P., Saroliya, A., Kumar, R.: Analyses and detection of health insurance fraud using data mining and predictive modeling techniques. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, Springer, Berlin, pp. 41–49 (2016)
12. Shalini, S.D.: Comparative analysis of clustering techniques for customer behaviour. In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, Springer, Berlin, pp. 753–763 (2016)
13. Asha, N., Indira Gandhi, M.P.: A forecasting model to predict the health impacts in metropolitan cities using data mining techniques and tools. Int. J. Appl. Eng. Res. 13(15), 12202–12208 (2018). [Research India Publications]
14. Asha, N., Indira Gandhi, M.P.: Approaches in predicting urban air quality in Bangalore city and comparative analysis of predictive models using data mining. Glob. J. Eng. Sci. Res. (2017). ISSN 2348-8034
Automated Gait Classification Using Spatio-Temporal and Statistical Gait Features

Ratan Das, Preeti Khera, Somya Saxena, and Neelesh Kumar

Abstract This work presents the evaluation of gait spatio-temporal and statistical parameters for automatic classification of human gait. A set of clinically relevant walk features is obtained from a cohort of healthy controls as well as neuro-impaired subjects and is normalized using dimensionless normalization to account for physiological variation such as height and weight. For feature selection and optimization, significant differences between the derived features from both groups are computed. A machine learning strategy is employed to train and classify these data into healthy and pathological groups. The classifier reported a best classification accuracy of around 96% with both dimensional and dimensionless feature sets, with an absolute minimum accuracy of just over 90%. The present work demonstrates the effectiveness of spatio-temporal and derived statistical features for gait classification. Extraction of such features is a relatively low-cost and less burdensome process in comparison with traditional approaches involving raw kinetic or kinematic parameters such as ground reaction force and bio-signals.

Keywords Gait · Neurological disorder · Machine learning · Spatio-temporal parameter
R. Das (B) · P. Khera · N. Kumar
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
S. Saxena
Post Graduate Institute of Medical Education and Research, Chandigarh 160030, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_40

1 Introduction

Human walking is perceived as a necessity for quality of life. The area of human biomechanics and gait has gained significant attention in recent years [1]. Gait-related disorders can affect anyone regardless of age or sex and thus impede activities of daily living. Clinicians rely on gait analysis to quantify the factors governing human biomechanics. For effective gait analysis and rehabilitation planning, objective assessments based on quantitative gait measurements are desired. However, 3D gait analysis, which is traditionally conducted using motion capture cameras and force platforms, is expensive, requires technical expertise and is inaccessible to a majority of clinics and hospitals. To overcome these limitations, many wearable sensors such as foot insoles [2–4] and inertial sensors [5–7] are widely used these days. Moreover, gait data outputs can include a multitude of variables, which may be overwhelming for clinicians to interpret, adding to the challenges of using gait analysis in clinical settings. A wider range of data also adds to the analysis cost and complexity, because interpretation depends on the subjective judgement and experience of the clinician. Thus, a framework that can analyse these data and provide a supporting statement regarding the quality of gait performance can save the clinician time and effort. To this end, machine learning techniques have recently been used to analyse non-stationary, high-dimensional and complex gait data and to provide cost-effective solutions. Automatic methods to segregate normal and abnormal gait could be of great interest in several clinical applications, especially rehabilitation. Clinical reports have indicated that gait abnormalities [8] and variability are not only clinical syndromes, but also independent predictors for potential disorder diagnosis [9, 10]. A number of studies have also investigated the relationship between gait abnormalities and cognitive function in Alzheimer's disease [9]. Statistical modelling along with intelligent machine learning methods such as artificial neural network (ANN), support vector machine (SVM), decision tree (DT) and hidden Markov model (HMM) has been explored in the last decade for automatic gait and disorder recognition.
Machine learning methods such as ANN and SVM have found application in automatic recognition of pathological gait. Lakany [11] implemented an ANN-based method to differentiate 89 healthy controls from 32 pathological patients. Input classification features were derived from spatial and temporal gait parameters measured using stereo-photogrammetry. Thanasoontornrerk and colleagues [12] classified knee osteoarthritis (KOA), Parkinson's disease (PD) and healthy controls (HC) using kinematic and temporal gait parameters as input features for the designed DT classifier. However, the presented method reported significantly low accuracy. Begg et al. [13] implemented an SVM to recognize gait changes corresponding to ageing from kinematic feature sets extracted from foot clearance of 28 older adults and 30 young subjects. Williams et al. [14] classified 102 traumatic brain injury (TBI) patients into six groups using pelvic rotation, hip, knee and ankle angle and 48 statistical features acquired from a VICON motion capture system. The designed SVM classifier achieved 76.85% accuracy using features from the affected limb only and an overall classification accuracy of 82.15% using features from both limbs. Mannini et al. [15] used a range of gait temporal features along with HMM and time/frequency domain features to automatically group post-stroke (PS), Huntington's disease (HT) and elderly subjects using an SVM classifier, with a best overall classification accuracy of 90.5%. However, a vast majority of these studies are based on input features from standard gait measurement systems such as ground reaction force, joint kinematics and even sEMG [16–18]. There are very few reported works on correlation and automatic classification of disorders using spatio-temporal features. Although spatio-temporal features can be extracted with much less effort and fewer resources than kinetic and kinematic data, they suffer from data disparity due to anthropometric variations among subjects.

This work reports the automatic classification of human walk into normal and pathological gait based on spatio-temporal and statistical gait features. Gait data was collected from 28 subjects (14 healthy and 14 neuro-impaired patients). Section 2 presents the methods and data acquisition protocols for the study. A brief description of feature extraction, normalization and the machine learning framework is also presented in the same section. The results and findings are discussed in Sect. 3 with emphasis on classifier performance. Finally, the paper is concluded with a brief discussion on the significance of the work and its future scope.
2 Methods and Materials

2.1 Research Data and Study Protocol

A cohort of 14 patients, in both early and advanced stages of neurological disorders (stroke and Parkinson's disease), was recruited after due ethical approval from the PGIMER, Chandigarh institutional ethics committee. The study was conducted in the Gait Laboratory of the Physical and Rehabilitation Medicine Department, PGIMER Chandigarh, under the supervision of an experienced physician. An equal number of healthy controls (HC) with no known history of physical or neurological impairments that could affect their walking characteristics were recruited from CSIR-CSIO, Chandigarh. Participants' consent was obtained from all subjects after describing the procedures of the trial. The height and weight of each participant were recorded for further use in parameter normalization. Data was recorded from the patient group using a 16 force plate-based walking platform from BTS Bioengineering (USA). For the HC, the data was recorded using our indigenously developed wireless foot sensor module (WFSM) [19, 20]. The WFSM is a compact, low-power, wireless, inertial sensor-based device that measures acceleration and angular velocity along three axes along with foot inclination angles. The authors in their previous works [19, 20] have reported its application for gait event detection, which is crucial for computing spatio-temporal parameters, as well as spatial parameters like stride length and gait velocity. All these studies were validated against force plate and pressure mat systems, which are gold standards for gait analysis. One WFSM prototype was placed at each foot using medical grade straps (as shown in Fig. 1). Both modules are time synchronized using a digital trigger from LabVIEW. The subjects were required to stand normally before beginning to walk, and any offsets associated with the foot were zeroed automatically by the software.
Each subject was then required to walk at their self-selected normal speed in the open and spacious lawn inside the CSIO premises. Data recording was triggered after 3–4 steps and continued for 30 s of continuous walking. The subjects were asked to stop 3–4 steps after the recording time was over to avoid start-up and slowdown effects associated with gait initiation and termination.

Fig. 1 (Left) Foot measurement axis (right) WFSM prototype strapped at foot
2.2 Feature Extraction

A set of primary features (X_P) that includes step time (StpT), single limb support time (SlsT), stance time (StnT), swing time (SwtT), gait cycle time/stride time (GCT) and cadence is computed using gait events, namely initial contact (IC) and toe off (TO), detected from both legs [21]. However, since the gait signature is unique to each individual, these absolute primary features may not always be an ideal choice or a significant indicator for a general group. As a result, the relative percentage of each feature (X_R) w.r.t. GCT is extracted using Eq. 1.

\[ X_R = \frac{X_P}{\mathrm{GCT}} \times 100\% \tag{1} \]
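As a minimal sketch of Eq. 1, the following Python fragment expresses a few temporal features as percentages of the gait cycle time. The feature abbreviations follow the text; the function name and the numerical values are hypothetical and are not measured data.

```python
def relative_features(primary, gct):
    """Express each primary temporal feature as a percentage of gait cycle time (Eq. 1)."""
    return {name: 100.0 * value / gct for name, value in primary.items()}


# Hypothetical values in seconds for one stride (not measured data).
primary = {"StnT": 0.66, "SwtT": 0.44, "StpT": 0.55}
relative = relative_features(primary, gct=1.10)
print({name: round(pct, 1) for name, pct in relative.items()})
```

With these illustrative inputs, stance and swing come out as roughly 60% and 40% of the cycle.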
These relative features are similar among individuals belonging to a particular group, for example healthy or impaired. For example, the stance to swing ratio in a healthy gait is around 60:40 of total gait time [22] and varies significantly in a pathological gait. Moreover, although seemingly simple, walking is a complex process that requires the coordinated action of a large number of joints and muscles. As a result, even for a healthy person, each step may vary from the previous one. Slight asymmetry may reflect functional differences in the contribution of each limb to propulsion and control during walking. For persons with impaired gait, however, asymmetry is a strong indicator for studying gait pathology. For example, a person with arthritis of the right knee will tend to walk so that loading of the right knee (leg) is minimal. Hence, the left foot stance time will be relatively longer than the right foot stance time, resulting in a prominent asymmetry between the two legs. The most prominent metrics used to study gait asymmetry, viz. symmetry index (SI), ratio index (RI), gait asymmetry (GA) and symmetry angle (SA) [23], have been computed for the bilateral gait parameters. Similarly, the coefficient of variation (CoV) of intra-stride variations, which also signifies a possible condition with the potential to alter gait [24], is extracted to study stride-to-stride fluctuations and is given by Eq. 2.

\[ \mathrm{CoV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\% \tag{2} \]
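A short Python sketch of two of these statistical features follows. The CoV function implements Eq. 2 using the sample standard deviation, which is an assumption since the text does not say whether the sample or population form is used. The text names the symmetry index without giving its formula, so the definition below (left-right difference divided by the mean of the two sides) is an assumed, commonly used form. All input values are hypothetical.

```python
from statistics import mean, stdev


def cov_percent(values):
    """Eq. 2: standard deviation over mean, in percent (sample standard deviation assumed)."""
    return 100.0 * stdev(values) / mean(values)


def symmetry_index(left, right):
    """Assumed symmetry index in percent; 0 means perfect left-right symmetry."""
    return 100.0 * (left - right) / (0.5 * (left + right))


# Hypothetical stride times (s) and left/right stance times (s).
stride_times = [1.08, 1.10, 1.12, 1.09, 1.11]
print(round(cov_percent(stride_times), 2))
print(round(symmetry_index(0.70, 0.62), 2))
```

A positive index here means the left-side value is larger, as in the right-knee arthritis example above.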
To compare gait trends between two people of significantly different heights or masses, one must 'normalize' the information in an attempt to remove the variation due to these differences. Minimizing the effect of between-subject differences in physical dimensions and walking speed on spatial–temporal gait data may reduce data dispersion, leading to improved gait classification accuracy. One effective solution to the scaling problem is to present the data in non-dimensional (NDN) form [25]. Kamruzzaman and Begg [26] reported that the accuracy of cerebral palsy diagnosis using an SVM approach improved from 83.3% to 96.8% when spatial–temporal gait data was first normalized using leg length and subject age [27] using dimensionless equations. In the present study, the time (t), velocity (ω) and frequency (f) parameters are normalized using the dimensionless equations given in [25, 27]. As a result, a set of 37 features, comprising 12 primary spatio-temporal features and the remaining statistical features, is extracted from each individual for each stride. These features are significant indicators for studying human gait and balance [28]. However, the spatio-temporal gait characteristics of an individual are affected by physical differences between subjects, including height, limb length, body mass, etc. This increases gait variability and limits the degree to which pathological gait signatures may reliably be distinguished.
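The text cites [25, 27] for the dimensionless equations without reproducing them. The sketch below assumes the usual body-size scaling, in which time is divided by sqrt(l0/g), frequency by sqrt(g/l0) and velocity by sqrt(g·l0), where l0 is a characteristic body length and g is the gravitational acceleration. The choice of l0 (for example leg length), the function name and the numerical inputs are assumptions for illustration only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def to_dimensionless(l0, time_s=None, frequency_hz=None, velocity_ms=None):
    """Scale gait quantities by a characteristic body length l0 (m) and gravity.

    Assumed scaling: t / sqrt(l0/g), f / sqrt(g/l0), v / sqrt(g*l0).
    """
    out = {}
    if time_s is not None:
        out["time"] = time_s / math.sqrt(l0 / G)
    if frequency_hz is not None:
        out["frequency"] = frequency_hz / math.sqrt(G / l0)
    if velocity_ms is not None:
        out["velocity"] = velocity_ms / math.sqrt(G * l0)
    return out


# Hypothetical subject: l0 = 0.9 m, stride time 1.1 s, step frequency 1.8 Hz, speed 1.2 m/s.
scaled = to_dimensionless(0.9, time_s=1.1, frequency_hz=1.8, velocity_ms=1.2)
print({name: round(value, 4) for name, value in scaled.items()})
```

Under this scaling the product of dimensionless time and frequency equals the product of the original values, which gives a quick consistency check.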
2.3 Classification

The general workflow for automatic classification of human walk into normal and impaired is shown in Fig. 2. The raw, i.e. dimensional, feature set (both normal and pathological) is split 70:30 for training and testing. The training data is used to train the classifiers. Thereafter, the model performance is evaluated using the unseen test data set. To avoid overfitting, fivefold cross-validation is used. Next, in order to achieve the best classification performance, feature extraction and selection are carried out. Each classification model is built using three sets of features: (i) all spatio-temporal and statistical features reported in the previous section (37 features); (ii) features extracted by applying principal component analysis (PCA) to the raw features and retaining 50% of the input dimension (18 features); and (iii) features selected manually based on the literature and on having a significant difference between the two groups (29 features). The significance of the difference between the two groups for each feature was computed using the arithmetic mean and the standard deviation. As demonstrated in [12–14], ML techniques have achieved good classification accuracies in gait recognition and balance studies. SVM, k-nearest neighbours (kNN) and DT are shown to have better generalization capabilities and are widely used for classification of neurological disorders. All the above-mentioned classifiers are trained in binary fashion using a one-vs-one strategy. The performance evaluation was conducted by calculating five performance measures, i.e. classification accuracy (training and testing accuracies), precision, recall, specificity and F1-score. All measures are defined in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) and are given by Eqs. 3–7.

Fig. 2 Systematic flow diagram of the classification framework

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{3} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{4} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{5} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{6} \]

\[ \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7} \]
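Eqs. 3–7 translate directly into code. The following Python sketch computes the five measures from the four confusion counts; the function name and the example counts are hypothetical and are not taken from the study's test set.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the measures of Eqs. 3-7 from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: no false positives, a few missed positive samples.
metrics = classification_metrics(tp=49, fp=0, tn=70, fn=12)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

In this illustrative case precision and specificity are both 100% while recall is lower, so the F1-score falls below the accuracy. This is the pattern expected when a classifier misses some positive samples without raising false alarms.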
3 Results

The results of the experiments conducted on the dimensional and non-dimensional data sets are summarized in Table 1. The performance of fivefold cross-validation and evaluation on the independent test data set showed similar trends, indicating the generalization ability of the method. Out of the three classifiers used, SVM showed better overall performance for all three feature sets reported earlier. The accuracies shown by SVM and DT are nearly equal (~96%), resulting in robust classification for the considered features. With all features selected, the other evaluation metrics for these two classifiers are also above 95%, which indicates accurate segregation of both classes. However, the kNN classifier reported a decrease of nearly 3–6% in accuracy for the three data sets considered. The specificity and precision of the kNN model are 100%, showing accurate classification of the normal or healthy group. The decrease in performance is due to misclassification in the impaired group, because a few subjects were in an early disease stage and their gait characteristics closely resembled normal walking. As a result, a few samples were misclassified by this classifier, resulting in slightly lower performance. Moreover, feature extraction using PCA and feature selection by group variance have shown performance almost similar to that with all features. As can also be observed, the classifier performed considerably well with the manually selected features. This reduced feature set, without any decrease in performance, can strengthen the computational efficiency of the model.

Table 1 Classifier performances measured in terms of accuracy, specificity, sensitivity, precision and F1-score (all values in %)

With raw feature sets
Classifier  Features  Training accuracy  Testing accuracy  Specificity  Sensitivity  Precision  F1-score
SVM         All       97.70              96.94             97.14        96.72        96.72      96.72
SVM         PCA       98.00              95.41             94           96.72        94         95.16
SVM         Manual    95                 93.10             98.57        86.89        98.15      92.17
kNN         All       89.90              90.83             100.00       80.33        100.00     89.09
kNN         PCA       91.60              90.83             100.00       80.83        100.00     89.09
kNN         Manual    90.00              90.83             100.00       80.83        100.00     89.09
DT          All       96.70              96.18             97.14        95.08        97         95.87
DT          PCA       94.00              90.83             94.29        86.89        93         89.83
DT          Manual    96.70              96.18             97.14        95.08        97         95.87

With dimensionless normalized features
Classifier  Features  Training accuracy  Testing accuracy  Specificity  Sensitivity  Precision  F1-score
SVM         All       96.00              95.40             98.31        93.06        98.53      95.71
SVM         PCA       95.60              95.40             98.31        93.06        98.53      95.71
SVM         Manual    96                 95.41             98.31        93.06        98.53      95.71
kNN         All       96.00              96.94             100.00       94.44        100.00     97.14
kNN         PCA       92.80              95.00             100.00       91.67        100.00     95.65
kNN         Manual    92.00              91.60             100.00       84.72        100.00     91.73
DT          All       94.00              92.36             100.00       86.11        100.00     92.54
DT          PCA       94.00              94.65             98.31        90.20        98.31      97.74
DT          Manual    93.10              92.36             98           86.11        98         90.67

However, the spatio-temporal gait characteristics of an individual are affected by physical differences between subjects, including age, height, weight, body mass index, etc. This increases gait variability and limits the degree to which pathological gait signatures may reliably be distinguished. Including dimensionless normalization, which accounts for physical differences between individuals, resulted in similar performance for SVM, an additional increase in accuracy of 2–7% for kNN and a deviation of ±2 to 4% for the DT classifier. All the other metrics calculated are approximately 85% or higher, indicating good prediction performance. The F1-score for all classifiers with the three normalized feature sets is above 90%, as shown in Table 1. The marked improvement for the kNN classifier is due to the normalization technique, which reduces inter-subject variations.
4 Discussion and Conclusion

This paper presents a machine learning approach for automated segregation of healthy subjects from a neuro-impaired population based on human walking performance. A set of spatio-temporal and statistical parameters that can be extracted with far fewer resources has been used as input features. Data processing techniques such as dimensionless normalization coupled with statistical measures such as symmetry and coefficient of variation have resulted in a strong feature set that removes the possible disparity arising from anthropometric and physiological variations. This is observed from the classifier performance for dimensional features: the classification accuracy is in the range of ~91 to 97% with all features, compared with 91–95% and 91–96% for PCA and manual features, respectively. Similarly, for NDN-based features, the classification accuracy reported for all three classifiers is in the range of 92–97%, ~95% and 91–95% for all features, PCA and manual features, respectively. This indicates the significance of the derived features for identifying gait performance; the feature set can be further optimized to reduce computational complexity. One important aspect noticeable from the results is the performance of the classifiers with dimensional and dimensionless features. The accuracy of the kNN classifier with NDN features (maximum accuracy of ~97%) is significantly higher than that of the same classifier with dimensional features (maximum accuracy of ~91%). In addition, the F1-score, which is the harmonic mean of precision and recall, is higher with NDN than with dimensional (DN) features (an increase of 8% for kNN), which shows the better capability of NDN in gait studies. However, as observed from Table 1, the other two classifiers have the same or marginally smaller accuracies. A possible reason for this could be the sample size and demography of the trial subjects. To observe noticeable differences owing to dimensionless normalization, a larger data set with maximum inter-subject demographic variation needs to be considered.

The presented method has shown promising performance for binary gait classification. However, for a gait analysis system to be widely acceptable, it should be able to classify and recognize different degrees of disorders. The final aim of the undertaken work is to validate a method for multiclass gait disorder classification based on the severity grading of the individual. A smart tool that can classify or group patients according to disorder scales (e.g. the Hoehn and Yahr scale for PD patients) has tremendous application in clinical gait analysis. Along with providing an automated reference about the status of the patient, the system will be helpful in tracking the progress of the patient on a regular basis. The outcome of rehabilitation therapy can be monitored with ease, which will save the clinician considerable effort and time. This requires training classifiers with a large volume of healthy as well as patient data to take into account the various gait disparities that could arise. Such work needs a more extensive study with a large and varied population group, and work is in progress in that direction.
References 1. Ferber, R., et al.: Gait biomechanics in the era of data science. J. Biomech. 49(16), 3759–3761 (2016) 2. Qin, L.-y., Ma, H., Liao, W.-H.: Insole plantar pressure systems in the gait analysis of poststroke rehabilitation. In: 2015 IEEE International Conference on Information and Automation. IEEE (2015) 3. Tahir, A.M., et al.: A systematic approach to the design and characterization of a smart insole for detecting vertical ground reaction force (vGRF) in gait analysis. Sensors 20(4), 957 (2020) 4. Liu, T., Inoue, Y., Shibata, K.: A wearable ground reaction force sensor system and its application to the measurement of extrinsic gait variability. Sensors 10(11), 10240–10255 (2010) 5. Caldas, R., et al.: A systematic review of gait analysis methods based on inertial sensors and adaptive algorithms. Gait Posture 57, 204–210 (2017)
500
R. Das et al.
6. Muro-De-La-Herran, A., Garcia-Zapirain, B., Mendez-Zorrilla, A.: Gait analysis methods: an overview of wearable and non-wearable systems, highlighting clinical applications. Sensors 14(2), 3362–3394 (2014) 7. Tao, W., et al.: Gait analysis using wearable sensors. Sensors 12(2), 2255–2283 (2012) 8. Singh, J., Singh, P., Malik, V.: Effect of intrinsic parameters on dynamics of STN model in parkinson disease: a sensitivity-based study. In: Soft Computing: Theories and Applications, pp. 417–427. Springer, Berlin 9. Hsu, Y.-L., et al.: Gait and balance analysis for patients with Alzheimer’s disease using an inertial-sensor-based wearable instrument. IEEE J. Biomed. Health Inf. 18(6), 1822–1830 (2014) 10. Marquis, S., et al.: Independent predictors of cognitive decline in healthy elderly persons. Arch. Neurol. 59(4), 601–606 (2002) 11. Lakany, H.: Extracting a diagnostic gait signature. Pattern Recogn. 41(5), 1627–1637 (2008) 12. Thanasoontornrerk, R., et al.: Tree induction for diagnosis on movement disorders using gait data. In: 2013 5th International Conference on Knowledge and smart technology (KST). IEEE (2013) 13. Begg, R.K., Palaniswami, M., Owen, B.: Support vector machines for automated gait classification. IEEE Trans. Biomed. Eng. 52(5), 828–838 (2005) 14. Williams, G., et al.: Classification of gait disorders following traumatic brain injury. J. Head Trauma Rehabil. 30(2), E13–E23 (2015) 15. Mannini, A., et al.: A machine learning framework for gait classification using inertial sensors: application to elderly, post-stroke and Huntington’s disease patients. Sensors 16(1), 134 (2016) 16. Begum, S.V., Rani, M.P.: Recognition of neurodegenerative diseases with gait patterns using double feature extraction methods. In: 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE (2020) 17. Joyseeree, R., Abou Sabha, R., Müller, H.: Applying machine learning to gait analysis data for disease identification. Stud. Health Technol. 
Inf. 210, 850–854 (2015) 18. Tahir, N.M., Manap, H.H.: Parkinson disease gait classification based on machine learning approach. J. Appl. Sci. (Faisalabad) 12(2), 180–185 (2012) 19. Das, R., Hooda, N., Kumar, N.: A Novel Approach for real-time gait events detection using developed wireless foot sensor module. IEEE Sens. Lett. 3(6), 1–4 (2019) 20. Hooda, N., Das, R., Kumar, N.: Fusion of EEG and EMG signals for classification of unilateral foot movements. Biomed. Signal Process. Control 60, 101990 (2020) 21. Teufl, W., et al.: Towards inertial sensor based mobile gait analysis: Event-detection and spatiotemporal parameters. Sensors 19(1), 38 (2019) 22. Iosa, M., et al.: The golden ratio of gait harmony: repetitive proportions of repetitive gait phases. BioMed Res. Int. 2013 (2013) 23. Błażkiewicz, M., Wiszomirska, I., Wit, A.: Comparison of four methods of calculating the symmetry of spatial-temporal parameters of gait. Acta Bioeng. Biomech. 16(1) (2014) 24. Goldberger, A.L., et al.: Fractal dynamics in physiology: alterations with disease and aging. Proc. Natl. Acad. Sci. 99(suppl 1), 2466–2472 (2002) 25. Hof, A.L.: Scaling gait data to body size. Gait Posture 3(4), 222–223 (1996) 26. Kamruzzaman, J., Begg, R.K.: Support vector machines and other pattern recognition approaches to the diagnosis of cerebral palsy gait. IEEE Trans. Biomed. Eng. 53(12), 2479–2490 (2006) 27. Wahid, F., et al.: Classification of Parkinson’s disease gait using spatial-temporal gait features. IEEE J. Biomed. Health Inf. 19(6), 1794–1802 (2015) 28. Beauchet, O., et al.: Guidelines for assessment of gait and reference values for spatiotemporal gait parameters in older adults: the biomathics and Canadian gait consortiums initiative. Front. Hum. Neurosci. 11, 353 (2017)
Real-Life Applications of Soft Computing in Cyber-Physical System: A Compressive Review
Varsha Bhatia, Vivek Jaglan, Sunita Kumawat, and Kuldeep Singh Kaswan
Abstract A cyber-physical system (CPS) is one of the most prominent emerging classes of systems for connecting the physical world and cyberspace. Recent advances in wireless sensor networks (WSN), networking, and embedded systems have paved the way for the development of CPS in varied domains. CPS application domains include health care, smart cities, transportation, environmental monitoring, and manufacturing. CPS requires smart and intelligent sensors with computation, communication, and control capabilities. Cyber-physical systems are complex, heterogeneous distributed systems that require robust, efficient, and cost-effective solutions. Soft computing is one family of techniques capable of dealing with complex systems, as it does not require strict mathematical definitions. Soft computing is a collection of evolutionary algorithms, artificial neural networks, fuzzy logic, and Bayesian networks. Its power lies in its strong learning ability and its tolerance of uncertainty and imprecision. This work presents a comprehensive review of soft computing-based approaches used in diverse CPS applications.
Keywords Cyber-physical system · Soft computing · Fuzzy logic · Evolutionary algorithm · Swarm intelligence
V. Bhatia (B) · S. Kumawat, Amity University Haryana, Gurugram, India
S. Kumawat e-mail: [email protected]
V. Jaglan, Graphic Era Hill University, Dehradun, India
K. S. Kaswan, Galgotias University, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_41

1 Introduction
CPS facilitates interfacing between the physical and cyber worlds with the help of sensors and actuators. The inputs obtained from sensors and actuators support real-time decision making. With the help of a series of control processes, these decisions can be implemented in the physical world. An ideal CPS can effectively monitor, control, and manage physical processes. A CPS comprises heterogeneous devices working on different communication technologies. For example, a CPS can consist of resource-constrained wireless sensor nodes, mobile devices equipped with GSM, or workstations operating on a wired network. A CPS comprises sensors, a network, a control centre, and a control system. The control centre processes the collected data and acts as a decision support system, while the control system is responsible for implementing the instructions or decisions. WSNs are suitable for CPS owing to their capability for real-time operation, but limited resources and dynamic topology pose a great challenge [1]. As a CPS operates in a distributed manner, there may be uncertainty in operations due to heterogeneity in the underlying network. A CPS encompasses several overlapping networks, and nodes can join or leave the network dynamically. If a CPS comprises wearable mobile sensors or devices, the complexity increases.
1.1 Characteristics of CPS
• CPS deeply integrates computation with physical processes, and all the embedded devices have cyber capabilities.
• CPS devices can be resource constrained in terms of computing capability and bandwidth.
• CPS is a product of the integration of different distributed systems comprising devices connected by wired, wireless, Wi-Fi, GSM, and Bluetooth technologies [2].
• CPS-based systems require a high degree of automation and advanced feedback control technologies.
• CPS-based systems are large-scale, complicated systems; hence, they must be capable of reconfiguring and reorganizing themselves. They must also be able to adapt their rules and commands to the task requirements.
• For any CPS-based system, security and reliability are necessary.
• CPS is rapidly expanding in terms of application in various domains.
Each of these applications has its own set of essential features, but some fundamental features are mandatory for implementing CPS in any application. These fundamental features are real-time constraints, integration of diverse communication protocols, handling of dynamic topology, and remote Internet access [3]. CPS application domains include healthcare, smart homes, buildings and cities, transport, automated control of refineries, advanced weapon control systems, and smart grids [4].
2 Soft Computing Techniques
Soft computing (SC) techniques are a group of methodologies comprising fuzzy logic, neural networks, evolutionary computation, and probabilistic computing [5]. SC techniques are capable of solving complex real-life problems that cannot be solved using conventional approaches. In addition, they offer an economical way to construct intelligent systems. The strong points of SC are better knowledge representation, acquisition, and processing [6]. Each soft computing technique has its own strengths and weaknesses.
2.1 Evolutionary Algorithms (EA)
EAs are population-based metaheuristic algorithms inspired by natural evolution. Genetic algorithms, differential evolution, and artificial immune systems are all evolutionary algorithms. All of these algorithms maintain a population of potential solutions and evolve it over time according to the optimization criteria. The algorithm stops when it has reached its maximum runtime or a predefined performance threshold. An advantage of EAs is that they can carry several solutions over to the next iteration, which avoids losing the best solution. EAs are flexible and robust enough to deal with complex and noisy real-world problems.
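The loop described above (a population of candidate solutions, evolution under an optimization criterion, two stopping conditions, and carrying the best solutions forward) can be sketched as follows. This is a minimal illustration and is not taken from any of the reviewed works: the bit-string encoding, the tournament selection, the parameter values, and the OneMax fitness (count of ones) are all assumptions chosen for brevity, and a generation budget stands in for the maximum runtime.

```python
import random

def evolve(fitness, n_bits=20, pop_size=30, generations=100,
           elite=2, mutation_rate=0.02, target=None, seed=0):
    """Minimal elitist genetic algorithm over bit strings (maximizes fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):               # stop 1: iteration budget
        pop.sort(key=fitness, reverse=True)
        if target is not None and fitness(pop[0]) >= target:
            break                              # stop 2: performance threshold
        nxt = [ind[:] for ind in pop[:elite]]  # elitism: best survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical toy problem (OneMax): maximize the number of ones.
best = evolve(sum, target=20)
```

Because the elite individuals are copied unchanged, the best fitness in the population can never decrease from one generation to the next, which is the property the text refers to as avoiding the loss of the best solution.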
2.2 Swarm Intelligence Algorithms (SI)
SI algorithms are inspired by collective behaviour in nature, such as nest building and foraging in insects and swarming, herding, and flocking in vertebrates. Swarm intelligence algorithms include ant colony optimization (ACO), the firefly algorithm, bacterial foraging algorithms, cuckoo search, and particle swarm optimization (PSO). These algorithms are widely used for combinatorial and continuous optimization [7] and have been applied successfully to numerous diverse problem domains. PSO is a popular population-based method for optimization problems. Its tuning parameters play a vital role in covering the entire search space and in the convergence of the algorithm [8].
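To make the role of the tuning parameters concrete, the following is a minimal sketch of the textbook global-best PSO update, not the variant used in any specific reviewed paper. The inertia weight `w` and the acceleration coefficients `c1` and `c2` are the tuning parameters; their values, the search bounds, and the sphere test function are assumptions made for illustration.

```python
import random

def pso(objective, dim=2, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimal global-best particle swarm optimization (minimizes objective)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_f = pso(sphere)
```

A larger `w` keeps particles moving and favours exploration of the search space, while larger `c1` and `c2` pull particles more strongly toward known good positions and favour convergence.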
2.3 Fuzzy Logic
Fuzzy logic helps in representing and processing human knowledge in a specific application domain. A fuzzy logic system comprises fuzzy rules and a fuzzy inference system. The most important task in applying fuzzy logic is defining the set of linguistic rules that govern the system to be modelled. Fuzzy logic systems have good knowledge representation and acquisition qualities but are less adaptive in nature. Fuzzy systems are suitable for designing nonlinear control systems [9].
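As an illustration of how linguistic rules and an inference step fit together, the sketch below implements a small zero-order Sugeno-style controller (rule outputs are constants combined by a weighted average). The heater scenario, the membership breakpoints, and the output levels are hypothetical and are not drawn from the reviewed literature.

```python
def falling(x, a, b):
    """Shoulder membership: 1 at or below a, 0 at or above b, linear between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def rising(x, a, b):
    """Mirror of falling(): 0 at or below a, 1 at or above b."""
    return 1.0 - falling(x, a, b)

def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def heater_power(temp_c):
    """Map room temperature (deg C) to heater power (percent)."""
    # Fuzzification: degree to which the temperature is cold/comfortable/hot.
    cold = falling(temp_c, 15, 20)
    comfortable = tri(temp_c, 15, 20, 25)
    hot = rising(temp_c, 20, 25)
    # Linguistic rules with constant consequents (assumed values):
    #   IF temp is cold        THEN power is high   (100)
    #   IF temp is comfortable THEN power is medium (40)
    #   IF temp is hot         THEN power is off    (0)
    rules = [(cold, 100.0), (comfortable, 40.0), (hot, 0.0)]
    # Defuzzification: average of rule outputs weighted by firing strength.
    total = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total

print(heater_power(17.5))  # halfway between "cold" and "comfortable"
```

The three memberships are chosen so that their degrees always sum to one, which keeps the weighted average well defined for every input temperature.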
2.4 Learning Systems
Learning systems comprise two well-known methods: artificial neural networks (ANN) and support vector machines (SVM). ANNs are used when model interpretability is not important [10]. ANNs can detect trends and extract meaningful information to draw inferences from imprecise and complex data. They have flexible learning capabilities and are suitable for developing nonlinear models from input-output data. SVMs are mainly applied to binary classification problems, where the learning problem is a quadratic optimization problem [7].
3 Literature Review
The work in [11] presents the characteristics and concept of CPS and the obstacles in CPS research. A brief review of the application of soft computing methods in bioinformatics is presented in [7]. The review discusses various soft computing methods used for sequence alignment and the single nucleotide polymorphism problem. A detailed review of the application of soft computing methods for CPS in health care is presented in [12]. The work presents approaches used for the prediction and diagnosis of disease and identifies which techniques are suitable for designing clinical support systems. A broad review of soft computing techniques for CPS dependability is given in [13]. It covers the strengths and weaknesses of soft computing techniques, discusses future challenges in designing dependable CPS, and provides a detailed literature review of work done to improve the reliability and dependability attributes of CPS using various soft computing techniques. The present review covers publications from 2012 to 2020 related to the application of soft computing techniques in cyber-physical systems. It comprises two parts: the first covers soft computing techniques in diverse CPS application domains, and the second presents some publications in which soft computing techniques are used to enhance the working of CPS.
4 Soft Computing Techniques in CPS Application Domains
4.1 Waste Management System
The continuous increase in urban population puts tremendous pressure on urban resources and the environment. Intelligent waste collection and transportation systems are essential for governing urban environmental development and for smart communities. A soft computing method for the multi-objective problem of determining the positions of garbage accumulation points in smart cities is proposed in [14]. The work presents single-objective and multi-objective heuristics based on PageRank, together with a multi-objective evolutionary algorithm, to improve the cost-effectiveness and accessibility of installing garbage bins in urban scenarios. The algorithms use three criteria to improve QOS: maximizing the total amount of waste collected, reducing the cost of operation, and keeping the installed bins close to the regions of interest. An intelligent waste management system with optimal collection paths is proposed in [15]. The waste removal system is based on the robot operating system (ROS) and uses rapidly exploring random trees for path planning. The process is initiated by individual bins, which send a message to the waste collection vehicle when they are full. In response to the message, the waste collection vehicle collects the waste and transports it to the desired location.
4.2 Smart Logistic System
CPS is applied in logistic and transport systems. Logistic systems are currently progressing in the direction of intelligence. Designing an intelligent logistic system requires tight synchronization between logistic operations and the logistic information system. Smart logistic path selection is an important part of logistic scheduling: it is necessary for enhancing distribution efficiency, using resources properly, and reducing cost. A smart logistic path decision method for CPS based on soft computing algorithms is proposed in [16]. The work analyses the ant colony, genetic, and simulated annealing algorithms for logistic path selection in CPS with IoT. The algorithms are compared on the basis of shortest path distance and convergence speed. In terms of both path optimization and convergence speed, the ant colony algorithm performed better than the other two algorithms. Another important problem in transportation and supply chain management is the capacitated vehicle routing problem (CVRP). Smart logistics also involves efficient and cost-effective solutions for capacitated vehicles delivering orders. The aim of CVRP is to find cost-effective routes for a group of vehicles that start from a warehouse and deliver orders to dispersed locations in a large region.
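The CVRP objective and its capacity constraint can be made concrete with a small solution evaluator of the kind a metaheuristic would call as its cost function. This is a sketch under stated assumptions, not the formulation of any cited work: Euclidean distances, a single depot at index 0, identical vehicle capacities, and the tiny example instance are all hypothetical.

```python
import math

def route_length(route, coords, depot=0):
    """Length of one vehicle tour: depot -> customers in order -> depot."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def evaluate(routes, coords, demand, capacity):
    """Total distance of a CVRP solution, or None if it is infeasible."""
    visited = sorted(c for r in routes for c in r)
    # Every customer (indices 1..n-1) must be served exactly once.
    if visited != list(range(1, len(coords))):
        return None
    # No vehicle may carry more than its capacity.
    if any(sum(demand[c] for c in r) > capacity for r in routes):
        return None
    return sum(route_length(r, coords) for r in routes)

# Hypothetical instance: depot at index 0 and three customers.
coords = [(0, 0), (0, 3), (4, 3), (4, 0)]
demand = [0, 2, 2, 2]
two_vehicles = evaluate([[1, 2], [3]], coords, demand, capacity=4)
one_vehicle = evaluate([[1, 2, 3]], coords, demand, capacity=4)  # overloaded
```

In this instance the two-vehicle plan is feasible, while the single-vehicle plan is rejected because the combined demand of 6 exceeds the capacity of 4.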
An order-aware hybrid genetic algorithm (OHGA) is proposed in [17]. The effectiveness of OHGA is due to an improved population initialization approach and problem-specific crossover points. OHGA converges quickly for instances with 32 to 100 orders. The CVRP formulation can also be used to solve diverse problems such as location routing, bicycle renting, delivery of essential medical supplies in emergencies, charging of electric vehicles, and UAV routing [17]. An initial routing adjustment model and a traffic route adjustment model based on a learnable genetic algorithm are proposed in [18]. The parameters considered are road capacity, vehicle capacity, maximum travelling distance, and customer time window. Both static and dynamic models of the cyber-physical logistics system are proposed.
4.3 Cyber-Physical Vehicle System (CPVS)
BeeJamA is a distributed and self-adaptive routing protocol based on a swarm intelligence approach [19]. The algorithm is designed for large-scale cyber-physical systems and provides directions to drivers before each intersection. The objective of the BeeJamA routing algorithm is to minimize local and global travel times, avoid traffic congestion, and reduce environmental pollution. The work in [20] uses a fuzzy approach to enhance the efficiency of a solar panel so that it gives maximum power output. Traditional controllers are replaced by a fuzzy logic controller (FLC). FLCs are more suitable for complex real-life systems because they perform control operations based on linguistic rules and do not require detailed knowledge of the system. The work in [21] proposes a CPS-based control framework for a plug-in hybrid aircraft-towing tractor. The energy data of the tractor is first collected and analysed, and it is then modelled for online swarm intelligent programming (OSIP). The final OSIP control programme is developed using a chaos-enhanced accelerated particle swarm optimization (CPSO) algorithm. Another application of CPS is intelligent transportation, which aims to improve safety and comfort in road transport. An intelligent crowd monitoring system based on fuzzy logic is proposed in [22]. The proposed system monitors crowding in buses using input parameters obtained from onboard sensors; it allocates another bus if the crowd exceeds the bus capacity and informs the users. A ubiquitous city information platform with decision support, Wi-City plus, is proposed in [23]. The platform uses data collected from social platforms and networked devices. Research trends in CPVS are moving in the direction of green transportation, and researchers are exploring the electric vehicle (EV) and its environment as a new research area. Among the various issues related to EVs, the most critical is safety. The braking system is the most safety-critical system in an EV, and a lot of research has been performed on the development of devices, control algorithms, and safety standards. The work in [24] implements an ANN-based machine learning framework to quantitatively estimate the brake pressure of an EV. Swarm intelligence is also used for autonomous driving, electric grids, medical applications, environmental monitoring, and emergency response [25].
4.4 Smart Industries
The smart production system is an important part of smart industries. The work in [26] presents a multi-objective optimization problem for the material flow system in a smart production system. A genetic algorithm is used to select a Pareto front of optimal layouts. The CPS environment depends on developing effective solutions that make resources collaborate to fulfil the requirements of the production process. In recent times, various methodologies have been proposed for dynamic process planning in CPS. A multi-agent system architecture based on particle swarm optimization and binary differential evolution was proposed for the dynamic teaming of resources such as robots and machines in CPS [27]. For the development of CPS for smart manufacturing, a model of a conveyor belt was simulated using SVM [28]. The goal was to develop a flexible system for detecting abnormalities and reconfiguring the CPS after an abnormal condition occurs. A CPS and big data-enabled machining optimization process is presented in [29]. The optimization process considers the dynamic and ageing factors of the machine tool system during manufacturing lifecycles. An evolutionary fruit fly algorithm is used for scheduling and rescheduling to improve system adaptability during manufacturing lifecycles.
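The notion of a Pareto front used in the layout optimization work above can be illustrated with a short non-dominated filter. This is a generic sketch, not the procedure of [26]: both objectives are assumed to be minimized, and the candidate layouts with their two objective values are made-up numbers.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical layouts scored as (objective 1, objective 2), both minimized.
layouts = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_front(layouts)
print(front)  # [(10, 5), (12, 4), (8, 6)]
```

Here (8, 7) is dropped because (8, 6) is at least as good in both objectives and better in one, and (9, 9) is dropped because (8, 7) beats it in both. The remaining layouts represent different trade-offs, none of which is better than another in every objective.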
4.5 Smart Homes and Buildings
Energy efficiency is a prime concern in smart homes and buildings. Smart homes and buildings differ from traditional home automation systems in that they support a wide range of functions for managing energy and comfort, including heating control, air conditioning control, energy metering, emergency lighting, air quality control, window control, solar shading control, and occupancy detection. A hybrid intelligent device for smart homes is proposed in [30] to manage a space heating device. CPS for smart buildings provides intelligent building automation covering HVAC, lighting, elevators, and other electrical subsystems. The work in [31] presents a CPS architecture for occupancy-based, wireless networked intelligent building automation. The architecture is apt for large buildings with varying occupancy. The objective is to minimize energy consumption and maximize the comfort of occupants. To minimize energy consumption, prime zones are selected using a fuzzy inference engine. The non-dominated genetic algorithm is used to handle the trade-off between occupant comfort and energy consumption. Feedback-based CPS is applied to control the actuators in the system in [32]. The feedback system considers environmental variation and the gap between the set point and the process variable. The control system consists of a central control module, which acts as a manager and provides information about operators to its actuator controllers, and a fuzzy controller that uses fuzzy rules to control the actuators. An ANN-based approach is used for intelligent building control in [33]. Cyber security and physical fault detection are also important in CPS. The work in [34] presents a Bayesian network-based approach to finding cyber and physical anomalies in unlabelled data. The approach was evaluated on unlabelled data from a commercial building system.
4.6 Medical CPS
A medical CPS (MCPS) comprises sensors, actuators, smart medical devices, and a decision support system. The system generates a large amount of data, which has to be stored and analysed for the diagnosis of diseases. The security and confidentiality of the data are also important. The vital factor in the functioning of an MCPS is the network lifetime. Because of the large data volume, MCPS data are generally stored in the cloud or fog. Edge devices are used to decrease the energy consumption and execution time of the application. A hybrid whale PSO algorithm is proposed in [35] to preserve the power of edge devices.
5 Soft Computing for Enhancing CPS Operations
5.1 Computation Offloading for CPS and IoT Environment
A hybrid adaptive genetic and particle swarm (AGA–PSO) algorithm is designed in [36] for scheduling the offloadable components of an application in order to minimize its energy consumption and execution time. The work helps in offloading computationally intensive tasks or modules to edge devices instead of performing them in the cloud. Edge computing supports a distributed approach in which data are stored and processed close to the end users. The results show that the hybrid AGA–PSO algorithm outperforms the standard GA and PSO algorithms and that offloading is beneficial.
5.2 Distributed Task Scheduling for CPS
A CPS comprises physical and cyber subsystems that operate in different time domains and with different response times. Scheduling in CPS is a challenging task because both the cyber and the physical environments are unpredictable. The work in [37] proposes a task optimization and scheduling approach for distributed CPS using an improved ant colony algorithm. The improved algorithm applies different pheromone update methods and a different heuristic function. The simulation results show improvements in local search ability, effectiveness, scalability, and adaptability. An adaptive neuro-fuzzy inference system-based scheduler is proposed for CPS in [38]. The neuro-fuzzy inference system is designed by incorporating a neural network into a fuzzy controller.
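For readers unfamiliar with the two ingredients that the improved ant colony approach modifies, the sketch below shows the classic Ant System forms of the transition rule (pheromone combined with a heuristic value) and the pheromone update (evaporation followed by deposit). It is the standard baseline, not the modified rules of [37], whose details are not given here; the parameter names `alpha`, `beta`, `rho`, and `q` and the small example values are assumptions.

```python
def transition_probs(tau_row, eta_row, allowed, alpha=1.0, beta=2.0):
    """Probability of moving to each allowed node j from the current node.

    tau_row[j] is the pheromone on the edge to j and eta_row[j] is the
    heuristic desirability of j (for example, inverse cost).
    """
    weights = {j: (tau_row[j] ** alpha) * (eta_row[j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def update_pheromone(tau, tours, costs, rho=0.5, q=1.0):
    """Evaporate all pheromone, then let each ant deposit q / cost on its edges."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)          # evaporation
    for tour, cost in zip(tours, costs):
        for a, b in zip(tour, tour[1:]):
            tau[a][b] += q / cost             # cheaper tours deposit more
    return tau

# Hypothetical example: from the current node, nodes 1 and 2 are candidates.
probs = transition_probs([1.0, 1.0, 1.0], [1.0, 1.0, 2.0], allowed=[1, 2])
tau = update_pheromone([[1.0, 1.0], [1.0, 1.0]], tours=[[0, 1]], costs=[2.0])
```

With equal pheromone, node 2 is chosen four times as often as node 1 because its heuristic value is doubled and `beta` is 2. Variants of the algorithm typically differ in how the deposit and evaporation steps or the heuristic term are defined.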
5.3 QOS Improvement in CPS
Routing in CPS is a major factor in improving the communication quality of the network. A QOS multicast routing algorithm that combines the genetic and ant colony algorithms was proposed in [39]. The work also proposes a four-layer architecture for CPS. The goal of the algorithm is to enhance the real-time performance and reliability of system transmission. The simulation results show that the proposed algorithm, CPSGAAD, performs better in terms of delivery ratio and transmission delay. The competitive swarm optimizer (CSO) [40] is a new PSO-based algorithm that uses inter-particle competition mechanisms to solve the gateway deployment problem in wireless mesh networks, which are widely used in CPS environments. The CSO algorithm reduces the computational cost and avoids getting trapped in local optima. The algorithm helps in the proper deployment of nodes in the given geometric plane. The aim is to reduce the routing distance between access point and gateway in order to improve the communication quality of the network. A linear approximation fuzzy model is proposed in [41] for fault detection in CPS used for supply chain management. The method is suitable for IoT-assisted CPS; it detects and classifies faults on the basis of rough set approximation and fuzzy membership functions. A hierarchical CPS architecture is proposed in [42] to utilize system resources and improve QOS parameters. The study is performed for the case of unmanned vehicles with WSN navigation. The particle swarm algorithm is used to optimize the system under resource constraints. A summary of the application of soft computing techniques to CPS is presented below. Table 1 lists the specific areas and subareas where recent work has been done.
6 Conclusion and Future Work
This work presents a review of some recent real-life applications of soft computing in CPS environments. The review describes the specific subareas where soft computing techniques have been successfully implemented. The major drawbacks of soft computing techniques are their large requirements for computing resources and computation time, which limit their application in online CPS. On the other hand, soft computing techniques are a boon in situations where imprecision exists and where adapting to the system environment or fine-tuning the system model is essential. A greater involvement of soft computing techniques in serving the various domains of CPS is foreseen.
Table 1 Soft computing technique for diverse CPS application domains

| S. No. | Soft computing technique | Areas of application | Subarea | References |
|---|---|---|---|---|
| 1 | Genetic algorithm | Smart logistic system, road transport | Vehicle routing | [17, 18] |
| | | Production system (manufacturing environment) | Material flow system | [26] |
| | | Intelligent building automation system | Energy consumption optimization | [31] |
| 2 | Genetic algorithm and ant colony | CPS | QOS multi-cast routing | [39] |
| 3 | Competitive swarm optimization | CPS information transmission | Gateway deployment problem | [40] |
| 4 | Particle swarm optimization | Smart home | Home automation (temperature control) | [30] |
| | | Intelligent manufacturing | Dynamic cooperation between machines and robots | [27] |
| | | Resource management | Unmanned vehicle and WSN navigation | [42] |
| 5 | Hybrid whale PSO | E-Healthcare | Lifetime improvement by preserving power of edge devices | [35] |
| 6 | Online swarm intelligent programming (OSIP) | Hybrid vehicular CPS | Energy saving | [21] |
| 7 | Adaptive neuro-fuzzy inference system (ANFIS) | Resource allocation in CPS system | Dynamic scheduling of computational task | [38] |
| 8 | Fuzzy logic | Power consumption in CPS | Energy economizing in heating and cooling | [32] |
| | | Intelligent transportation cyber-physical system (ITCPS) | Automatic crowd monitoring in buses, to avoid bus bunching | [22] |
| 9 | ANN | Roadways transport system (RTS) and autonomous automotive CPS (A2CPS) | Autonomous vehicle (AV) safety | [43] |
| | | Smart city | Customer satisfaction for services | [23] |
| | | Intelligent building environment | Intelligent environment control | [33] |
| | | Cyber-physical vehicle system (CPVS) | State estimation of braking pressure system | [24] |
| | | Smart manufacturing/production | Energy saving in machining process in machining companies | [29] |
| 10 | Support vector machine (SVM) | Smart factory | Reconfiguration of manufacturing system based on abnormity | [28] |
| 11 | Swarm intelligence | Smart traffic | Autonomous driving, emergency response | [25] |
| | | Smart energy grids | Swarm-based energy grid for scheduling renewable energy generation and consumption | [25] |
| 12 | Ant colony | Cyber-physical logistics system, road transport | Smart logistic path selection | [16] |
| 13 | Rough sets and gene expression programming | Cyber-physical power system | Model for accurate prediction for security risk in power system | [44] |
| 14 | Evolutionary | Smart cities, intelligent waste management system | Location of garbage accumulation points | [14] |
| 15 | Fuzzy rough sets | Supply chain sector | Identification of defective component in cyber systems of supply chain management | [41] |
| 16 | Bayesian network | Industrial CPS | Impact of cyber threat on physical process safety | [45] |
| | | Smart grid/smart building | Reactive and mitigating measure for anomalies in CPS due to cyber attack or operational faults | [34] |
References
1. Bhatia, V., Kumawat, S., Jaglan, V.: Comparative study of cluster based routing protocols in WSN. Int. J. Eng. Technol. 7, 171–174 (2017). https://doi.org/10.14419/ijet.v7i1.2.9045
2. Wan, J., Yan, H., Suo, H., Li, F.: Advances in Cyber-Physical Systems Research. KSII Trans. Internet Inf. Syst. 5, 1891–1908 (2011)
3. Jabeur, N., Sahli, N., Zeadally, S.: Enabling cyber physical systems with wireless sensor networking technologies, multiagent system paradigm, and natural ecosystems. Mobile Inf. Syst. 2015, 1–15 (2015). https://doi.org/10.1155/2015/908315
4. Brazell, J.: The Need for a Transdisciplinary Approach to Security of Cyber Physical Infrastructure. Presented at the July 1 (2014). https://doi.org/10.1007/978-1-4614-7336-7-2
5. Bonissone, P., Chen, Y.-T., Goebel, K., Khedkar, P.: Hybrid soft computing systems: industrial and commercial applications. Proc. IEEE 87, 1641–1667 (2000). https://doi.org/10.1109/5.784245
6. Dote, Y., Ovaska, S.J.: Industrial applications of soft computing: a review. Proc. IEEE 89, 1243–1265 (2001). https://doi.org/10.1109/5.949483
7. Karlik, B.: Soft computing methods in bioinformatics: a comprehensive review. Math. Comput. Appl. 18, 176–197 (2013). https://doi.org/10.3390/mca18030176
8. Bhatia, V., Jaglan, V., Kumawat, S., et al.: A hidden markov model based prediction mechanism for cluster head selection in WSN. Int. J. Adv. Sci. Technol. 28, 585–600 (2019)
9. Ibrahim, D.: An overview of soft computing. Procedia Comput. Sci. 102, 34–38 (2016). https://doi.org/10.1016/j.procs.2016.09.366
10. Giri, P., Gajbhiye, A.: ANN-based modeling for flood routing through gated spillways. Presented at the January 1 (2018). https://doi.org/10.1007/978-981-10-5687-1_22
11. Liu, Y., Peng, Y., Wang, B., Yao, S., Liu, Z.: Review on cyber-physical systems. IEEE/CAA J. Automatica Sin. 4, 27–40 (2017). https://doi.org/10.1109/JAS.2017.7510349
12. Gambhir, S., Malik, S.K., Kumar, Y.: Role of soft computing approaches in healthcare domain: a mini review. J. Med. Syst. (2016). https://doi.org/10.1007/s10916-016-0651-x
13. Atif, M., Latif, S., Ahmad, R., Kiani, A.K., Qadir, J., Baig, A., Ishibuchi, H., Abbas, W.: Soft computing techniques for dependable cyber-physical systems. IEEE Access 7, 72030–72049 (2019). https://doi.org/10.1109/ACCESS.2019.2920317
14. Toutouh, J., Rossit, D., Nesmachnow, S.: Soft computing methods for multiobjective location of garbage accumulation points in smart cities. Ann. Math. Artif. Intell. 88, 105–131 (2020). https://doi.org/10.1007/s10472-019-09647-5
15. Zhang, Q., Li, H., Wan, X., Skitmore, M., Sun, H.: An intelligent waste removal system for smarter communities. Sustainability 12, 6829 (2020). https://doi.org/10.3390/su12176829
16. Zhang, N.: Smart logistics path for cyber-physical systems with Internet of Things. IEEE Access 6, 70808–70819 (2018). https://doi.org/10.1109/ACCESS.2018.2879966
17. Lin, N., Shi, Y., Zhang, T., Wang, X.: An Effective order-aware hybrid genetic algorithm for capacitated vehicle routing problems in Internet of Things. IEEE Access 7, 86102–86114 (2019). https://doi.org/10.1109/ACCESS.2019.2925831
18. Lai, M., Yang, H., Yang, S., Zhao, J., Xu, Y.: Cyber-physical logistics system-based vehicle routing optimization. J. Ind. Manag. Optim. 10, 701 (2014). https://doi.org/10.3934/jimo.2014.10.701
19. Wedde, H.F., Senge, S.: BeeJamA: a distributed, self-adaptive vehicle routing guidance approach. IEEE Trans. Intell. Transp. Syst. 14, 1882–1895 (2013). https://doi.org/10.1109/TITS.2013.2269713
20. Kumar, N.K., Gandhi, V.I.: Implementation of fuzzy logic controller in power system applications. J. Intell. Fuzzy Syst. 36, 4115–4126 (2019). https://doi.org/10.3233/JIFS-169971
21. Zhou, Q., Zhang, Y., Li, Z., Li, J., Xu, H., Olatunbosun, O.: Cyber-physical energy-saving control for hybrid aircraft-towing tractor based on online swarm intelligent programming. IEEE Trans. Industr. Inf. 14, 4149–4158 (2018). https://doi.org/10.1109/TII.2017.2781230
22. Selvanayaki, P.S., KumarKaliappan, V.: Intelligent transportation cyber physical system toward comfort and safety perspective using fuzzy logic. J. Phys.: Conf. Ser. 1362, 012061 (2019). https://doi.org/10.1088/1742-6596/1362/1/012061
23. Costanzo, A., Faro, A., Giordano, D., Spampinato, C.: Implementing cyber physical social systems for smart cities: a semantic web perspective. In: 2016 13th IEEE Annual Consumer Communications Networking Conference (CCNC), pp. 274–275 (2016). https://doi.org/10.1109/CCNC.2016.7444777
24. Lv, C., Xing, Y., Zhang, J., Cao, D.: State estimation of cyber-physical vehicle systems. In: Cyber-Physical Vehicle Systems: Methodology and Applications, pp. 29–40. Morgan & Claypool (2019)
25. Schranz, M., Di Caro, G.A., Schmickl, T., Elmenreich, W., Arvin, F., Şekercioğlu, A., Sende, M.: Swarm intelligence and cyber-physical systems: concepts, challenges and future trends. Swarm Evol. Comput. 60, 100762 (2021). https://doi.org/10.1016/j.swevo.2020.100762
26. Shchekutin, N., Overmeyer, L., Shkodyrev, V.: Layout Optimization for Cyber-Physical Material Flow Systems Using a Genetic Algorithm. Presented at the January 1 (2020). https://doi.org/10.1007/978-3-030-34983-7_4
27. Hsieh, F.: Collaboration of machines and robots in cyber physical systems based on evolutionary computation approach. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2019). https://doi.org/10.1109/IJCNN.2019.8852421
28. Shin, H.-J., Cho, K.-W., Oh, C.-H.: SVM-Based Dynamic Reconfiguration CPS for Manufacturing System in Industry 4.0. https://www.hindawi.com/journals/wcmc/2018/5795037/. Last accessed 26 Nov 2020. https://doi.org/10.1155/2018/5795037
29. Liang, Y.C., Lu, X., Li, W.D., Wang, S.: Cyber Physical system and big data enabled energy efficient machining optimisation. J. Cleaner Prod. 187, 46–62 (2018). https://doi.org/10.1016/j.jclepro.2018.03.149
30. Zhu, J., Lauri, F., Koukam, A., Hilaire, V., Lin, Y., Liu, Y.: A hybrid intelligent control based cyber-physical system for thermal comfort in smart homes. Int. J. Ad Hoc Ubiquitous Comput. 30, 199 (2019). https://doi.org/10.1504/IJAHUC.2019.098863
31. Reena, M., Mathew, D., Jacob, L.: An occupancy based cyber-physical system design for intelligent building automation. Math. Prob. Eng. 2015, 1–15 (2015). https://doi.org/10.1155/2015/132182
32. Cheng, S.-T., Chou, J.-H.: Fuzzy control to improve energy-economizing in cyber-physical systems. Appl. Artif. Intell. 30, 1–15 (2016). https://doi.org/10.1080/08839514.2015.1121065
33. Ma, B., Li, N., Wang, Y., Qiu, H., Zhang, W., Fu, J.: Cyber physical system based on artificial neural way. Shenyang Jianzhu Daxue Xuebao (Ziran Kexue Ban)/J. Shenyang Jianzhu Univ. (Nat. Sci.) 28, 375–379 (2012)
34. Krishnamurthy, S., Sarkar, S., Tewari, A.: Scalable Anomaly Detection and Isolation in Cyber-Physical Systems Using Bayesian Networks. Presented at the ASME 2014 Dynamic Systems and Control Conference December 19 (2014). https://doi.org/10.1115/DSCC2014-6365
35. Majumdar, A., Laskar, N., Biswas, A., Sood, S.K., Baishnab, K.: Energy efficient e-healthcare framework using HWPSO-based clustering approach. J. Intell. Fuzzy Syst. 36, 1–13 (2019). https://doi.org/10.3233/JIFS-169957
36. R R, E., Reedy, M., Umamakeswari, A.: A new hybrid adaptive GA-PSO computation offloading algorithm for IoT and CPS context application. J. Intell. Fuzzy Syst. 36, 1–9 (2019). https://doi.org/10.3233/JIFS-169970 37. Yi, N., Xu, J., Yan, L., Huang, L.: Task optimization and scheduling of distributed cyber– physical system based on improved ant colony algorithm. Future Generation Comput. Syst. 109, 134–148 (2020). https://doi.org/10.1016/j.future.2020.03.051 38. Padmajothi, V., Iqbal, J.L.M.: Adaptive neural fuzzy inference system-based scheduler for cyber–physical system. Soft Comput. 24, 17309–17318 (2020). https://doi.org/10.1007/s00 500-020-05020-5 39. Gao, Z., Ren, J., Wang, C., Huang, K., Wang, H., Liu, Y.: A genetic ant colony algorithm for routing in CPS heterogeneous network. Int. J. Comput. Appl. Technol. 48, 288 (2013). https:// doi.org/10.1504/IJCAT.2013.058351 40. Huang, S., Tao, M.: competitive swarm optimizer based gateway deployment algorithm in cyber-physical systems. Sensors. 17, 209 (2017). https://doi.org/10.3390/s17010209 41. Wang, L., Zhang, Y.: Linear approximation fuzzy model for fault detection in cyber-physical system for supply chain management. Enterprise Inf. Sys. 1–18 (2020). https://doi.org/10.1080/ 17517575.2020.1791361 42. Yan, H.H., Wan, J.F., Suo, H.: Adaptive resource management for cyber-physical systems. https://www.scientific.net/AMM.157-158.747. Last accessed 26 Nov 2020. https://doi.org/10. 4028/www.scientific.net/AMM.157-158.747 43. Vismari, L.F., Camargo, J.B., Naufal, J.K., Almeida, J.R. de, Molina, C.B.S.T., Inam, R., Fersman, E., Marquezini, M.V.: A Fuzzy logic, risk-based autonomous vehicle control approach and its impacts on road transportation safety. In: 2018 IEEE International Conference on Vehicular Electronics and Safety (ICVES). pp. 1–7 (2018). https://doi.org/10.1109/ICVES. 2018.8519527 44. 
Deng, S., Yue, D., Fu, X., Zhou, A.: Security risk assessment of cyber physical power system based on rough set and gene expression programming. IEEE/CAA J. Automatica Sin. 2, 431– 439 (2015). https://doi.org/10.1109/JAS.2015.7296538 45. Lyu, X., Ding, Y., Yang, S.: Bayesian network based C2P Risk assessment for cyber-physical systems. IEEE Access 8, 88506–88517 (2020). https://doi.org/10.1109/ACCESS.2020.299 3614
A Study on Stock Market Forecasting and Machine Learning Models: 1970–2020 Pradeepta Kumar Sarangi, Muskaan, Sunny Singh, and Ashok Kumar Sahoo
Abstract Stock market prediction refers to the study of the historical behaviour of stock parameters in order to estimate their future direction. Many strategies are used to predict the stock market, based on specific mathematical, statistical or machine learning methods. Machine learning methods have become extremely popular because they can produce highly accurate predictions. The major role of a machine learning model is to draw generalized patterns from the input data and to produce a desired output. This work presents a detailed review of the learning models used to predict the stock market over the last 50 years (1970 to 2020). The factors considered are prediction strategies, highly cited publications, performance measurement parameters and recent developments. Finally, this work looks at the prediction techniques used and summarizes the key success metrics used by each paper. Keywords Machine learning models · Stock price prediction · Time-series analysis · Financial forecasting
1 Introduction Machine learning (ML) is a subfield of artificial intelligence that uses a variety of probabilistic and optimization techniques to enable machines to “learn” from previous examples and to detect hard-to-distinguish patterns in complex or noisy data sets. ML plays a vital role in applications such as forecasting, pattern recognition and classification/prediction tasks. Forecasting is the practice of estimating what may happen in the future [1]. It is a human tendency and helps in making timely decisions [2]. Stock market forecasting is the approach to determine the future value of a particular stock traded on an exchange. Successful stock forecasting may yield significant profit to investors. Advances in technology have allowed the vast archives held in computer systems to be analyzed. In general, the extensive use of computer-based models is studied under the title of machine learning. The literature on financial market forecasting and the techniques applied to it is therefore extremely broad in terms of technological innovations, new models and methods. The purpose of this article is to highlight research publications that mark key developments in machine learning for financial market forecasting. In this work, a detailed study is carried out of work published by researchers in the period 1970–2020. The parameters considered are (i) year-wise publication and (ii) the prediction techniques used in previous work. The main motivation is to analyze the existing work and identify research gaps for further research in the field of stock market forecasting.
P. K. Sarangi · Muskaan (B) · S. Singh Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India e-mail: [email protected] A. K. Sahoo Graphic Era Hill University, Dehradun, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_42
2 Literature Review Investigators use several methods to predict the future direction of a particular stock or stock exchange. Some are mathematical methods such as regression and ARIMA; others are based on machine learning, such as ANN and SVM [3]. Gurjar et al. [4] present a paper on stock market prediction in which they not only discuss various ways to predict stocks but also explain how machine learning strategies such as ANN can be used in stock forecasting. Moghaddam et al. [5] predict the daily rate of the NASDAQ stock exchange using an artificial neural network (ANN). The authors train the neural network models with a back propagation algorithm and use historical stock prices and the day of the week as inputs. Rasel et al. [6] use SVM as the machine learning algorithm to understand stock market trends. The authors conducted experiments on three different stock indicators: the Dhaka Stock Exchange (DSE) in Bangladesh, the S&P 500 index and the IBM index. Yetis et al. [7] introduce a neural-network-based method for stock market analysis and discuss how the NASDAQ stock could be predicted using ANNs with assigned market input parameters. Dipti Singh et al. developed an ANN model to predict the calorific value of MSW of Uttar Pradesh. The method was validated using MSE on the training as well as on the validation dataset, and the ANN model performed well in the prediction analysis. Ramansh et al. [32] analysed the factors affecting the performance of a store dataset based on previous data. Using a trend analysis technique, they studied the trends among the customers in detail.
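Several of the studies above feed a window of past prices, sometimes together with a calendar feature such as the day of the week, into a learning model and score the forecasts with the mean squared error (MSE). The following is a minimal sketch of that data preparation, not the code of any cited paper; the function names, the window length and the toy prices are assumptions made for illustration.

```python
def make_windows(prices, weekdays, window=3):
    """Build (features, target) pairs for one-step-ahead forecasting.

    Each feature vector holds the last `window` prices followed by the
    weekday (0 = Monday) of the day being predicted.
    """
    samples = []
    for t in range(window, len(prices)):
        features = list(prices[t - window:t]) + [weekdays[t]]
        samples.append((features, prices[t]))
    return samples


def mean_squared_error(actual, predicted):
    """Average squared difference between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy closing prices (illustrative values, not market data).
prices = [100.0, 101.5, 102.0, 101.0, 103.0, 104.5]
weekdays = [0, 1, 2, 3, 4, 0]
samples = make_windows(prices, weekdays, window=3)

# Naive baseline: tomorrow's price equals the most recent price,
# which is the entry just before the weekday feature.
targets = [y for _, y in samples]
naive = [x[-2] for x, _ in samples]
baseline_mse = mean_squared_error(targets, naive)
```

Any model trained on `samples` can be compared against `baseline_mse`; a forecaster that does not beat this naive baseline has learned little from the window.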
Fig. 1 Most-cited publications in the range between 500 and 30,000 (bar chart; bar labels not recoverable)
Fig. 2 Year-wise count of publications from 1970 to 2020 (x-axis: years of publications considered for this work; y-axis: count of publications)
Rahman et al. [33] used techniques such as k-means and a genetic algorithm for image segmentation and compared the proposed technique with other techniques; the proposed method outperformed the other methods. Besides these, many other authors have done significant work on stock market forecasting, financial analysis and machine learning applications to time series data [8–10]. Many of these publications are highly cited; some are depicted in Fig. 1. The count of publications per year is given in Fig. 2.
3 Analysis and Discussion This section analyses the stock market forecasting literature based on techniques, evaluation parameters and properties.
3.1 Based on Year-Wise Publication This section examines the publication years of research on the well-known methods for forecasting the stock market. Figure 2 shows the number of academic papers written in each year from 1970 to 2020.
Fig. 3 Various prediction techniques used in previous works (pie chart titled “Prediction Techniques”; legend: SVM/SVR, Neural Networks, kNN, ARIMA/GARCH, Fuzzy Logic, Decision Trees/RF; slice labels 49%, 29%, 6%, 6%, 6% and 4%)
3.2 Based on the Prediction Techniques In this subsection, the study is organized by the forecasting techniques applied to the stock market. The techniques used for stock-market forecasting are shown in Fig. 3. SVM/SVR is used in 25% of the works and ANN in 15% of the research papers; kNN is used in 2% of the studies, decision trees in 3%, ARIMA/GARCH in 3% and fuzzy logic in 3%. Consequently, SVM and ANN are the techniques primarily used for stock market analysis. A brief overview of the financial market prediction literature that uses machine learning techniques is given in Table 1, which lists the data set, technique and market considered in each work.
4 Major Findings
• Explore whether deeper networks with similar hierarchical structures could be built to further enhance predictive power [11].
• Fine-tune the SVM system to achieve better generalization [13].
• Focus on residual prediction through various methods to improve forecast accuracy [14].
• Consider other technical parameters that affect the ANN architecture, such as the hidden layers and the type of transfer function, to improve precision [18, 21].
• Introduce more fundamental-analysis variables into the proposed models, such as non-quantitative data and macroeconomic policies, and use AI techniques such as GA to optimize the proposed methods [16].
Table 1 Assets used in various publications

| References | Data set | Technique used | Market |
|---|---|---|---|
| [11] | Indices | SVM, KNN | China |
| [12] | Indices | NN, SVM, decision tree | USA |
| [13] | Index | SVM, NN | USA |
| [14] | Indices | KNN, ARIMA | USA |
| [15] | Indices | Decision trees, GARCH | Brazil, Japan |
| [16] | Indices | Fuzzy logic | Taiwan, Hong Kong |
| [17] | Index | PSO | China |
| [18] | Indices | NN | Multiple |
| [19] | Index | NN | USA |
| [20] | Currency, Indices | SVM, NN | Taiwan |
| [21] | Index | NN, SVM | Turkey |
| [22] | Stocks | Neural networks, SVM | Iran |
| [23] | Index | SVM, NN | Korea |
| [24] | Stocks | SVM, KNN | USA |
| [25] | Indices | NN | USA |
| [26] | Index | NN | Spain |
| [27] | Stocks | NN | Taiwan |
| [28] | Index | SVM, NN, ARIMA | USA |
| [29] | Index | NN, GMM | Taiwan |
| [30] | Stocks | Fuzzy logic | Taiwan |
| [31] | Stocks | Fuzzy logic | Taiwan |
• Apply different data pre-processing techniques, such as principal component analysis, classification theory or clustering, to further improve network classification speed and accuracy [18].
• Compare the efficiency of the predictive models with other forecasting techniques [19].
• Optimize the parameters of classification algorithms using metaheuristic algorithms to boost predictive results. In addition to simple features, technical features and textual details, more comprehensive features can be used to predict short-term stock movements [22, 31].
It was also found that advanced approaches to optimizing ANNs, such as ANN-GA and ANN-PSO, have not been implemented by the researchers. This seems to be a potential area for further research.
5 Research Gap This study shows that machine learning techniques have been widely used in stock market forecasting. However, Table 1 shows that the most frequently used techniques are SVM and ANN. Since gradient-based ANN training can become trapped in local minima, it is suggested that the weight matrix of the ANN be optimized using hybrid models such as ANN-GA or ANN-PSO. This opens a new avenue for researchers.
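To make the ANN-GA idea concrete, the sketch below evolves the weights of a very small feed-forward network with an elitist genetic algorithm instead of back propagation. It is an illustration under stated assumptions, not a method taken from the reviewed papers: the network size (two inputs, two tanh hidden units, one output), the GA settings, the function names and the sine-wave toy series are all hypothetical choices.

```python
import math
import random


def predict(weights, x):
    """Tiny network: 2 inputs -> 2 tanh hidden units -> 1 linear output."""
    h1 = math.tanh(weights[0] * x[0] + weights[1] * x[1] + weights[2])
    h2 = math.tanh(weights[3] * x[0] + weights[4] * x[1] + weights[5])
    return weights[6] * h1 + weights[7] * h2 + weights[8]


def mse(weights, data):
    """Mean squared one-step-ahead forecast error over the data set."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def evolve_weights(data, pop_size=30, generations=60, seed=0):
    """Evolve the 9 network weights with a simple elitist GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(9)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: mse(w, data))
        history.append(mse(population[0], data))
        parents = population[: pop_size // 2]      # truncation selection
        children = [population[0][:]]              # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 9)              # one-point crossover
            child = a[:cut] + b[cut:]
            child = [w + rng.gauss(0, 0.1) if rng.random() < 0.2 else w
                     for w in child]               # Gaussian mutation
            children.append(child)
        population = children
    best = min(population, key=lambda w: mse(w, data))
    history.append(mse(best, data))
    return best, history


# Toy series (a sine wave, not market data); predict s[t] from s[t-2], s[t-1].
series = [math.sin(0.3 * t) for t in range(40)]
data = [((series[t - 2], series[t - 1]), series[t]) for t in range(2, 40)]
best_weights, loss_history = evolve_weights(data)
```

Because the best individual is always copied into the next generation, `loss_history` never increases. A GA of this kind searches the weight space globally, which is the motivation given above for hybrid models; in practice it is often combined with gradient-based fine-tuning.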
6 Conclusion and Future Directions This paper provides a systematic, objective and quantitative review of the literature. Bibliographic survey techniques were used for a systematic analysis of financial market forecasting with machine learning. The analysis led to a description of the best practices developed in the scientific literature for achieving reliable results when predicting financial markets with machine learning techniques such as ANN, RF and SVM. The paper summarized and critically examined the findings of the reviewed papers, reviewed the outcomes of previously published work, and reflected on possible directions for new work. Finally, this work focused on the prediction methods used and summarized the key success metrics used by each paper. Neural networks and SVMs are heavily used, and the majority of forecasts apply to market indices. One possible conclusion from the classification suggested here is that data usage will remain consistent with the neural network and SVM benchmarks. The use of modern models for financial market analysis continues to provide opportunities for study, as does research into forecasting the behavior of emerging markets. Although the study has covered many aspects of stock market prediction, further factors affecting the stock market could be considered, and the study can be expanded by including more research publications.
References
1. Sarangi, P.K., Singh, N., Chauhan, R.K., Singh, R.: Short term load forecasting using artificial neural network: a comparison with genetic algorithm implementation. J. Eng. Appl. Sci. 4(9), 88–93 (2009)
2. Gupta, A., Sarangi, P.K.: Electrical load forecasting using genetic algorithm based backpropagation method. J. Eng. Appl. Sci. 7(8), 1017–1020 (2012)
3. Sarangi, P.K., Sinha, D., Sinha, S.: Financial modeling using ANN technologies: result analysis with different network architectures and parameters. Indian J. Res. Capital Markets VI(1), 21–33 (2019)
4. Rasel, R.I., Sultana, N., Hasan, N.: Financial instability analysis using ANN and feature selection technique: application to stock market price prediction. In: International Conference on Innovation in Science, Engineering and Technology (ICISET) (2016)
5. Moghaddam, A.H., Moghaddam, M.H., Esfandyari, M.: Stock market index prediction using artificial neural network. J. Econ. Finance Adm. Sci. 21(41), 89–93. ISSN 2218-0648, Elsevier España, Barcelona
6. Rasel, R.I., Sultana, N., Meesad, P.: An efficient modelling approach for forecasting financial time series data using support vector regression and windowing operators. Int. J. Comput. Intell. Stud. 4(2), 134–150 (2015)
7. Yetis, Y., Kaplan, H., Jamshidi, M.: Stock market prediction by using artificial neural network. In: World Automation Congress (WAC), pp. 718–722, 3–7 Aug 2014
8. Pant, M., Sarangi, P.K., Bano, S.: Future trend in Indian automobile industry: a statistical approach. Apeejay J. Manag. Sci. Technol. 1(2), 28–32 (2014)
9. Singh, S., Sarangi, P.K.: Growth rate of Indian spices exports: past trend and future prospects. Apeejay J. Manag. Sci. Technol. II(1), 29–34 (2014)
10. Sharma, M., Sarangi, P.K., Sinha, D., Sinha, S.: Forecasting consumer price index using neural networks models. Innov. Pract. Oper. Manag. Inf. Technol. 84–93 (2019)
11. Chen, H., Xiao, K., Sun, J., Wu, S.: A double-layer neural network framework for high-frequency forecasting. ACM Trans. Manag. Inf. Syst. (TMIS) 7(4), 1–7 (2017)
12. Weng, B., Ahmed, M.A., Megahed, F.M.: Stock market one-day ahead movement prediction using disparate data sources. Expert Syst. Appl. 15(79), 153–163 (2017)
13. Tay, F.E., Cao, L.: Application of support vector machines in financial time series forecasting. Omega 29(4), 309–317
14. Zhang, N., Lin, A., Shang, P.: Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting. Phys. A 1(477), 161–173 (2017)
15. Bezerra, P.C., Albuquerque, P.H.: Volatility forecasting via SVR–GARCH with a mixture of Gaussian kernels. CMS 14(2), 179–196 (2017)
16. Chen, Y.S., Cheng, C.H., Tsai, W.L.: Modeling fitting-function-based fuzzy time series patterns for evolving stock index forecasting. Appl. Intell. 41(2), 327–347 (2014)
17. Yan, D., Zhou, Q., Wang, J., Zhang, N.: Bayesian regularisation neural network based on artificial intelligence optimization. Int. J. Prod. Res. 55(8), 2266–2287
18. Hsu, M.W., Lessmann, S., Sung, M.C., Ma, T., Johnson, J.E.: Bridging the divide in financial market forecasting: machine learners vs. financial economists. Expert Syst. Appl. 61, 215–234 (2016)
19. Laboissiere, L.A., Fernandes, R.A., Lage, G.G.: Maximum and minimum stock price forecasting of Brazilian power distribution companies based on artificial neural networks. Appl. Soft Comput. 1(35), 66–74 (2015)
20. Kara, Y., Boyacioglu, M.A., Baykan, Ö.K.: Predicting the direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange. Expert Syst. Appl. 38(5), 5311–5319 (2011)
21. Barak, S., Arjmand, A., Ortobelli, S.: Fusion of multiple diverse predictors in the stock market. Inf. Fusion 1(36), 90–102 (2017)
22. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
23. Pai, P.F., Lin, C.S.: A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 33(6), 497–505 (2005)
24. Kim, K.J., Han, I.: Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of a stock price index. Expert Syst. Appl. 19(2), 125–132 (2000)
25. Fernandez-Rodriguez, F., Gonzalez-Martel, C., Sosvilla-Rivero, S.: On the profitability of technical trading rules based on artificial neural networks: evidence from the Madrid stock market. Econ. Lett. 69(1), 89–94 (2000)
26. Tsai, C.F., Hsiao, Y.C.: Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decis. Support Syst. 50(1), 258–269 (2010)
27. Enke, D., Thawornwong, S.: The use of data mining and neural networks for forecasting stock market returns. Expert Syst. Appl. 29(4), 927–940 (2005)
28. Chang, P.C., Liu, C.H., Lin, J.L., Fan, C.Y., Ng, C.S.: A neural network with a case-based dynamic window for stock trading prediction. Expert Syst. Appl. 36(3), 6889–6898 (2009)
29. Wang, Y.F.: Predicting stock price using a fuzzy grey prediction system. Expert Syst. Appl. 22(1), 33–38 (2002)
30. Wang, Y.F.: Mining stock price using a fuzzy rough set system. Expert Syst. Appl. 24(1), 13–23 (2003)
31. Kumar, P.H., Patil, S.B.: Forecasting volatility trend of INR USD currency pair with deep learning LSTM techniques. In: 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), pp. 91–9. IEEE (2018)
32. Ramansh, K., Kalra, P., Mehrotra, D.: Trend analysis for retail chain using statistical analysis system. In: Soft Computing: Theories and Applications 2020, pp. 53–62. Springer, Singapore
33. Rahman, K.F., Mukherjee, S.: Feature extraction-based segmentation of anti-personnel landmines and its optimization using genetic algorithm. In: Soft Computing: Theories and Applications 2020, pp. 321–329. Springer, Singapore
Discussion on the Optimization of Finite Buffer Markovian Queue with Differentiated Vacations M. Vadivukarasi, K. Kalidass, and R. Jayaraman
Abstract This paper examines the optimality of a single-server queue in which the server is permitted to take two differentiated types of hiatus (vacation). The inter-arrival times of clients, the service times and the two hiatus times are all exponentially distributed, with rates λ, μ, α1 and α2, respectively. At most L clients are admitted into the system. The stationary system size distribution of the model is obtained using probability generating functions. Optimization of the model is studied using particle swarm optimization. Numerical illustrations of the impact of the system parameters on vital performance measures of the model are presented. Keywords Differentiated hiatus times · Steady state probabilities · Optimization
1 Introduction In real-life scenarios, we face many waiting situations in which the buffer size is finite. In such queues, a client who arrives when the system is full is turned away at the entry point and is lost. The reason for studying finite-capacity queues is to minimize this loss probability by allocating an adequate buffer size. As our main aim is to discuss the transient solution of finite capacity queues with differentiated vacations, we restrict ourselves to works of the same kind here. Interested readers may refer to Doshi [1, 2], Tian [3], Ke et al. [4] and Upadhyaya [5] for survey papers, and to Takagi [6] and Tian and Zhang [7] for monographs on vacation queues.
M. Vadivukarasi · K. Kalidass (B) Department of Mathematics, Karpagam Academy of Higher Education, Coimbatore, Tamil Nadu 641021, India R. Jayaraman Department of Mathematics, Periyar University College of Arts and Science, Pappireddipatti, Tamil Nadu 636905, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_43
Takagi [8] discussed the distributions of the unfinished work, the virtual waiting time and the real waiting time for the M/G/1/N vacation queue in which the server follows an exhaustive service scheme. Zhang et al. [9] optimized finite-buffer Markovian vacation queues with two kinds of client impatience, namely balking and reneging. Ghimire and Basnet [10] studied a finite-capacity single-server queueing system with vacations and the possibility of server failures. Yue et al. [11] discussed the optimality of finite-capacity Markovian vacation queues with two types of client impatience. Yue et al. [12] analysed the waiting time of M/M/1/N multiple vacation queues with two kinds of client impatience. Hui and Wei [13] obtained the steady-state solution of an unreliable single-server queueing model with multiple vacations by employing a blocked matrix technique. Kalidass et al. [14] analysed the time-dependent solution of single-server Markovian queues with finite waiting space and the possibility of server breakdowns. Ibe and Isijola [15] first analysed a new type of vacation queueing model in which the server is allowed to take two different vacations, the span of the second type being shorter than the span of the first type. Ibe and Adakeja [16] discussed the steady-state solution of M/M/1 differentiated multiple vacation queues in which the service rates depend on the vacation type. The time-dependent solution of the queue analysed by Ibe and Isijola [15] was obtained by Vijayashree and Janani [17]. Bouchentouf and Guendouzi [18] carried out a sensitivity analysis of a single-server Markovian queue with feedback, multiple differentiated hiatus, vacation interruptions and impatient clients. Subsequently, Suranga Sampath and Liu [19] explored the time-dependent solution of the model of Ibe and Isijola [15] under client impatience.
Unni and Mary [20] studied the steady-state solution of M/M/C queueing systems with type 1 and type 2 vacations. Suranga et al. [21] studied the transient solution of multiple differentiated hiatus queues with impatient clients, with an application to the IEEE 802.16E power-saving mechanism. Recently, the steady-state analysis of queues with differentiated vacations under a mixed strategy of customers was discussed in [22]. None of the works listed above obtains a steady-state solution for a system with finite capacity and differentiated vacations. Thus, the main aim of this research is to analyse the optimality of an M/M/1/N queue with differentiated vacations. The paper is laid out as follows. The next section describes the model and ends with its balance flow equations. The steady-state solution of the queue described in Sect. 2 is obtained in Sect. 3. The optimality of the model is discussed in Sect. 4. Some pertinent performance characteristics are derived in Sect. 5, and concluding notes are summarized in Sect. 6.
2 The System Description We consider a queueing system with the following aspects.
• Clients arrive at the service station according to a Poisson stream with arrival rate λ > 0.
• All admitted clients are served on a FIFO basis, and service times are exponentially distributed with mean 1/μ.
• After serving all clients in the service station, that is, at the end of every nonzero busy period, the service provider immediately takes a hiatus of the first kind; these hiatus times are exponentially distributed with parameter α1.
• At the end of a hiatus of the first kind, the service provider faces one of two situations.
1. If no clients are present in the system, the service provider takes a hiatus of the second kind, of briefer duration than the first kind. Hiatus times of the second kind are exponentially distributed with parameter α2.
2. If at least one client is present in the system, the service provider starts a regular busy period.
• The number of clients in the system is limited to a finite value L.
The state transition rate diagram of the model with the above aspects is given in Fig. 1. Let $p_n$ be the steady-state probability that the server is busy and $n \ge 1$ clients are present in the system. Let $q_{i,n}$ be the steady-state probability that the service provider is on an ith-type hiatus and $n \ge 0$ clients are present in the system. By Markov theory, the steady-state balance flow equations of the model are as follows.
\begin{align}
(\lambda + \mu) p_1 &= \mu p_2 + \alpha_1 q_{1,1} + \alpha_2 q_{2,1}, \tag{1}\\
(\lambda + \mu) p_k &= \lambda p_{k-1} + \mu p_{k+1} + \alpha_1 q_{1,k} + \alpha_2 q_{2,k}, \quad 2 \le k \le L-1, \tag{2}\\
\mu p_L &= \lambda p_{L-1} + \alpha_1 q_{1,L} + \alpha_2 q_{2,L}, \tag{3}\\
(\lambda + \alpha_1) q_{1,0} &= \mu p_1, \tag{4}\\
(\lambda + \alpha_1) q_{1,k} &= \lambda q_{1,k-1}, \quad 1 \le k \le L-1, \tag{5}\\
\alpha_1 q_{1,L} &= \lambda q_{1,L-1}, \tag{6}\\
\lambda q_{2,0} &= \alpha_1 q_{1,0}, \tag{7}\\
(\lambda + \alpha_2) q_{2,k} &= \lambda q_{2,k-1}, \quad 1 \le k \le L-1, \tag{8}\\
\alpha_2 q_{2,L} &= \lambda q_{2,L-1}. \tag{9}
\end{align}
The solution of Eqs. (1)–(9) with the help of probability generating functions (PGFs) is discussed in the next section.
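Before turning to generating functions, it can help to see that Eqs. (1)–(9) are also easy to solve numerically. The sketch below fixes $p_1 = 1$, obtains every other probability from Eqs. (4)–(9), (1) and (2) by forward recursion, and then normalizes; Eq. (3) is not used in the recursion and serves as a consistency check. This is an illustrative calculation rather than the authors' procedure; the function name `steady_state`, the list layout and the parameter values are assumptions, and the code requires L ≥ 2.

```python
def steady_state(lam, mu, a1, a2, L):
    """Solve balance equations (1)-(9) by forward recursion (needs L >= 2).

    Returns (p, q1, q2): p[n] is the busy-state probability for n = 1..L
    (p[0] is unused and stays 0); q1[n] and q2[n] are the type-1 and
    type-2 hiatus probabilities for n = 0..L.
    """
    p = [0.0] * (L + 1)
    q1 = [0.0] * (L + 1)
    q2 = [0.0] * (L + 1)
    p[1] = 1.0                                   # unnormalized reference value
    q1[0] = mu * p[1] / (lam + a1)               # Eq. (4)
    q2[0] = a1 * q1[0] / lam                     # Eq. (7)
    for k in range(1, L):
        q1[k] = lam * q1[k - 1] / (lam + a1)     # Eq. (5)
        q2[k] = lam * q2[k - 1] / (lam + a2)     # Eq. (8)
    q1[L] = lam * q1[L - 1] / a1                 # Eq. (6)
    q2[L] = lam * q2[L - 1] / a2                 # Eq. (9)
    # Eq. (1) gives p2; Eq. (2) then gives p3, ..., pL.
    p[2] = ((lam + mu) * p[1] - a1 * q1[1] - a2 * q2[1]) / mu
    for k in range(2, L):
        p[k + 1] = ((lam + mu) * p[k] - lam * p[k - 1]
                    - a1 * q1[k] - a2 * q2[k]) / mu
    total = sum(p) + sum(q1) + sum(q2)           # normalization constant
    return ([x / total for x in p],
            [x / total for x in q1],
            [x / total for x in q2])


# Illustrative parameter values (assumed, not taken from the paper).
lam, mu, a1, a2, L = 1.0, 3.0, 1.0, 1.0, 2
p, q1, q2 = steady_state(lam, mu, a1, a2, L)

# Eq. (3) was not used above, so its residual checks the solution.
residual_eq3 = mu * p[L] - (lam * p[L - 1] + a1 * q1[L] + a2 * q2[L])
```

For these values the recursion gives unnormalized probabilities summing to 47/6, so the normalized $p_1$ equals 6/47, and the residual of Eq. (3) vanishes up to rounding.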
Fig. 1 State transition rate diagram
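3 Steady State Analysis From (6),
\[ q_{1,L} = \frac{\lambda}{\alpha_1}\, q_{1,L-1}. \]
Recursive application of (5) gives
\begin{align}
q_{1,L} &= \frac{\lambda^{L}}{\alpha_1 (\lambda + \alpha_1)^{L-1}}\, q_{1,0} \notag\\
&= \frac{\lambda^{L} \mu}{\alpha_1 (\lambda + \alpha_1)^{L}}\, p_1, \quad \text{by (4)}. \tag{10}
\end{align}
From (9),
\[ q_{2,L} = \frac{\lambda}{\alpha_2}\, q_{2,L-1}. \]
Recursive application of (8) gives
\begin{align}
q_{2,L} &= \frac{\lambda}{\alpha_2} \left( \frac{\lambda}{\lambda + \alpha_2} \right)^{L-1} q_{2,0} \notag\\
&= \frac{\mu}{\alpha_2} \left( \frac{\lambda}{\lambda + \alpha_2} \right)^{L-1} \frac{\alpha_1}{\lambda + \alpha_1}\, p_1. \tag{11}
\end{align}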
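Define the partial probability generating functions as
\[ Q_1(z) = \sum_{n=0}^{L} q_{1,n} z^n, \quad Q_2(z) = \sum_{n=0}^{L} q_{2,n} z^n, \quad P(z) = \sum_{n=1}^{L} p_n z^n. \]
Multiplying (1)–(3) by the appropriate powers $z^n$ and summing leads to
\[ P(z) = \frac{\alpha_1 z Q_1(z) + \alpha_2 z Q_2(z) - \alpha_1 q_{1,0} z - \alpha_2 q_{2,0} z - \mu z p_1 + \lambda p_L z^{L+1}(1 - z)}{-(\lambda z^2 - (\lambda + \mu) z + \mu)}. \tag{12} \]
Multiplying (4)–(6) by the appropriate $z^n$ gives
\[ Q_1(z) = \frac{\mu p_1 + \lambda q_{1,L} z^{L}(1 - z)}{\alpha_1 + \lambda(1 - z)}. \tag{13} \]
Multiplying (7)–(9) by the appropriate $z^n$ leads to
\[ Q_2(z) = \frac{\alpha_1 q_{1,0} + \alpha_2 q_{2,0} + \lambda q_{2,L} z^{L}(1 - z)}{\alpha_2 + \lambda(1 - z)}. \tag{14} \]
The denominator of $Q_1(z)$ has the root $z_1 = (\lambda + \alpha_1)/\lambda$. Since $Q_1(z)$ is a polynomial, this root also satisfies the numerator of $Q_1(z)$. That is,
\[ \mu p_1 + \lambda q_{1,L} z_1^{L}(1 - z_1) = 0. \tag{15} \]
The denominator of $Q_2(z)$ has the root $z_2 = (\lambda + \alpha_2)/\lambda$. Therefore,
\[ \alpha_1 q_{1,0} + \alpha_2 q_{2,0} + \lambda (1 - z_2) q_{2,L} z_2^{L} = 0. \tag{16} \]
In the same way, $z_3 = \mu/\lambda$ satisfies the numerator of $P(z)$. That is,
\[ 0 = \alpha_1 z_3 Q_1(z_3) + \alpha_2 z_3 Q_2(z_3) - \alpha_1 z_3 q_{1,0} - \alpha_2 z_3 q_{2,0} - \mu z_3 p_1 + \lambda p_L z_3^{L+1}(1 - z_3). \]
Dividing by $z_3$ and substituting $q_{1,0}$ and $q_{2,0}$ from (4) and (7),
\begin{align}
0 &= \alpha_1 Q_1(\mu/\lambda) + \alpha_2 Q_2(\mu/\lambda) - \frac{\mu \alpha_1}{\lambda + \alpha_1}\, p_1 - \alpha_2\, \frac{\alpha_1}{\lambda + \alpha_1}\, \frac{\mu}{\lambda}\, p_1 - \mu p_1 + \left( \frac{\mu}{\lambda} \right)^{L} (\lambda - \mu)\, p_L \notag\\
&= f_1 p_1 + \rho^{-L} (\lambda - \mu)\, p_L, \tag{17}
\end{align}
where $\rho = \lambda/\mu$ and
\[ f_1 = \frac{\alpha_1 \mu + \dfrac{(\lambda - \mu)\mu^{L+1}}{(\lambda + \alpha_1)^{L}}}{\lambda + \alpha_1 - \mu} - \frac{\mu \alpha_1}{\lambda}\, \frac{\lambda + \alpha_2}{\lambda + \alpha_1} - \mu + \frac{\mu \alpha_1}{\lambda}\, \frac{\lambda + \alpha_2}{\lambda + \alpha_1} \cdot \frac{\alpha_2 + \dfrac{(\lambda - \mu)\mu^{L}}{(\lambda + \alpha_2)^{L}}}{\lambda + \alpha_2 - \mu}. \tag{18} \]
From the normalization condition,
\begin{align}
1 &= P(1) + Q_1(1) + Q_2(1) \notag\\
&= \frac{\mu p_1}{\alpha_1} + \frac{\alpha_1 q_{1,0} + \alpha_2 q_{2,0}}{\alpha_2} + \frac{\dfrac{\lambda \mu p_1}{\alpha_1} + \lambda\, \dfrac{\alpha_1 q_{1,0} + \alpha_2 q_{2,0}}{\alpha_2} - \lambda q_{1,L} - \lambda q_{2,L} - \lambda p_L}{\mu - \lambda} \notag\\
&= f_2 p_1 - \frac{\lambda p_L}{\mu - \lambda}, \tag{19}
\end{align}
where
\[ f_2 = \frac{\mu}{\alpha_1} + \frac{\mu \alpha_1}{\lambda \alpha_2}\, \frac{\lambda + \alpha_2}{\lambda + \alpha_1} + \frac{\dfrac{\lambda \mu}{\alpha_1} \left[ 1 - \left( \dfrac{\lambda}{\lambda + \alpha_1} \right)^{L} \right] + \dfrac{\mu \alpha_1}{\alpha_2}\, \dfrac{\lambda + \alpha_2}{\lambda + \alpha_1} \left[ 1 - \left( \dfrac{\lambda}{\lambda + \alpha_2} \right)^{L} \right]}{\mu - \lambda}. \tag{20} \]
By solving (17) and (19), we have
\[ p_1 = \frac{1}{f_2 - \dfrac{\lambda \rho^{L}}{(\mu - \lambda)^2}\, f_1}, \tag{21} \]
where $f_1$ and $f_2$ are given by (18) and (20).
Remark 1 When $L \to \infty$ (with $\lambda < \mu$), (21) reduces to
\[ p_1 = \frac{(\mu - \lambda)/\mu}{\dfrac{\mu}{\alpha_1} + \dfrac{\alpha_1}{\alpha_2}\, \dfrac{\mu}{\lambda}\, \dfrac{\lambda + \alpha_2}{\lambda + \alpha_1}}. \tag{22} \]
The above equation is in accord with $p_{1,0}$ given in Ibe and Isijola [15] after appropriate notation changes.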
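The closed form for $p_1$ can be evaluated directly. The sketch below assumes the reading $p_1 = 1/\bigl(f_2 - \lambda \rho^{L} f_1/(\mu-\lambda)^2\bigr)$ with $\rho = \lambda/\mu$, where $f_1$ and $f_2$ are built from $\alpha_1 Q_1(\mu/\lambda)$, $\alpha_2 Q_2(\mu/\lambda)$ and the normalization condition as described above; this reading, the function name and the parameter values are assumptions made for illustration. The expressions are undefined when $\lambda = \mu$, $\lambda + \alpha_1 = \mu$ or $\lambda + \alpha_2 = \mu$.

```python
def p1_closed_form(lam, mu, a1, a2, L):
    """Evaluate p1 from f1 and f2.

    Assumes lam != mu, lam + a1 != mu and lam + a2 != mu, since each of
    these differences appears in a denominator.
    """
    rho = lam / mu
    # (alpha1*q_{1,0} + alpha2*q_{2,0}) / p1, from Eqs. (4) and (7).
    c = (mu * a1 / lam) * (lam + a2) / (lam + a1)
    f1 = ((a1 * mu + (lam - mu) * mu ** (L + 1) / (lam + a1) ** L)
          / (lam + a1 - mu)
          - c - mu
          + c * (a2 + (lam - mu) * mu ** L / (lam + a2) ** L)
          / (lam + a2 - mu))
    f2 = (mu / a1 + c / a2
          + ((lam * mu / a1) * (1 - (lam / (lam + a1)) ** L)
             + (mu * a1 / a2) * (lam + a2) / (lam + a1)
             * (1 - (lam / (lam + a2)) ** L))
          / (mu - lam))
    return 1.0 / (f2 - lam * rho ** L * f1 / (mu - lam) ** 2)


def p1_infinite_buffer(lam, mu, a1, a2):
    """Limit of p1 as L grows without bound (requires lam < mu)."""
    return ((mu - lam) / mu) / (
        mu / a1 + (a1 / a2) * (mu / lam) * (lam + a2) / (lam + a1))


# Illustrative values (assumed): lam = 1, mu = 3, alpha1 = alpha2 = 1.
p1_small = p1_closed_form(1.0, 3.0, 1.0, 1.0, 2)
p1_large = p1_closed_form(1.0, 3.0, 1.0, 1.0, 60)
p1_limit = p1_infinite_buffer(1.0, 3.0, 1.0, 1.0)
```

For L = 2 these values give $f_1 = 15$ and $f_2 = 8.25$, hence $p_1 = 6/47$, which agrees with a direct solution of the balance equations; for L = 60 the result is numerically indistinguishable from the infinite-buffer limit 1/9.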
4 Optimization Analysis
In this section, we present a cost model by defining a total cost function in which either the service rate or the hiatus rate is the control variable. The objective is to choose these variables so as to reduce the total mean expense per unit time. The cost elements are defined per unit time as follows:
C1 ≡ holding expense for each client present in the system;
C2 ≡ waiting expense when one client is waiting to receive service;
C3 ≡ expense for the period in which the server is busy;
C4 ≡ expense when the server is on hiatus;
C5 ≡ expense for service.
For the numerical discussion, we take the service rate μ as the decision variable and, with the above cost elements, define the cost function TC as
TC(μ) = C1 E(N) + C2 E(L) + C3 PB + C4 PV + C5 μ,
where PB is the sum of the busy probabilities and PV is the sum of the hiatus probabilities. The expense reduction problem can be stated mathematically as
Minimize TC(x), (23)
where x is the service rate. The aim is to find the optimal service rate μ∗ that minimizes the total expense TC(μ). We solve the problem using particle swarm optimization (PSO), introduced by Kennedy and Eberhart [23]; the results are presented in Sect. 5. Interested readers are referred to [24–29] for related work on this topic.
4.1 Algorithm
Step 1: Generate the initial population of particles X1, X2, ..., XN randomly in the range between X(l) and X(u). Hereafter, for convenience, the position of particle j and its velocity in iteration i are denoted by Xj(i) and Vj(i), respectively.
Step 2: Evaluate the objective function values corresponding to the particles as f[X1(0)], f[X2(0)], ..., f[XN(0)].
Step 3: Find the velocity of particle j in the ith iteration as
Vj(i) = Vj(i − 1) + c1 r1 [Pbest,j − Xj(i − 1)] + c2 r2 [Gbest − Xj(i − 1)]; j = 1, 2, ..., N,
where c1 and c2 are the cognitive (individual) and social (group) learning rates, respectively, and r1 and r2 are uniformly distributed random numbers in the range 0 to 1. Here Pbest,j is the historical best position of particle j, that is, the value of Xj(i) with the best objective function value f[Xj(i)] encountered by particle j in all the previous iterations. Gbest is the historical best position of the swarm, that is, the position with the best objective function value f[Xj(i)] encountered in all the previous iterations by any of the N particles. Since TC is minimized here, the best value is the lowest one.
Step 4: Find the position of the jth particle in the ith iteration as
Xj(i) = Xj(i − 1) + Vj(i); j = 1, 2, ..., N.
Evaluate the objective function values corresponding to the particles as f[X1(i)], f[X2(i)], ..., f[XN(i)].
Step 5: Check the convergence of the current solution. If the positions of all particles converge to the same set of values, the method is assumed to have converged. If the convergence criterion is not satisfied, go to Step 3 after updating the iteration number as i = i + 1 and computing the new values of Pbest,j and Gbest.
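Steps 1 to 5 can be sketched in a few lines of Python. This is a minimal one-dimensional illustration under stated assumptions, not the authors' implementation: the inertia weight `w`, the clamping of positions to the search range, the in-loop update of the bests, and the stand-in objective `toy_cost` are all additions made here for the sake of a runnable example (the paper's TC(μ) needs the steady-state probabilities, which are not reproduced). Setting `w = 1` recovers the velocity update written in Step 3.

```python
import random


def pso_minimize(f, lower, upper, n_particles=20, n_iter=200,
                 c1=1.5, c2=1.5, w=0.7, tol=1e-8, seed=0):
    """One-dimensional PSO sketch following Steps 1-5 (minimization)."""
    rng = random.Random(seed)
    # Step 1: random initial positions in [lower, upper], zero velocities.
    x = [rng.uniform(lower, upper) for _ in range(n_particles)]
    v = [0.0] * n_particles
    # Step 2: evaluate the objective and initialise the historical bests.
    p_best = list(x)
    p_val = [f(xj) for xj in x]
    g = min(range(n_particles), key=lambda j: p_val[j])
    g_best, g_val = p_best[g], p_val[g]
    for _ in range(n_iter):
        for j in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Step 3: velocity update (w is the assumed inertia weight).
            v[j] = (w * v[j] + c1 * r1 * (p_best[j] - x[j])
                    + c2 * r2 * (g_best - x[j]))
            # Step 4: position update, clamped to the search range.
            x[j] = min(max(x[j] + v[j], lower), upper)
            val = f(x[j])
            if val < p_val[j]:
                p_best[j], p_val[j] = x[j], val
                if val < g_val:
                    g_best, g_val = x[j], val
        # Step 5: stop once all particles have collapsed to one point.
        if max(x) - min(x) < tol:
            break
    return g_best, g_val


def toy_cost(mu):
    """Hypothetical stand-in for TC(mu): congestion cost plus service cost."""
    return 50.0 / mu + 950.0 * mu


mu_star, tc_star = pso_minimize(toy_cost, lower=0.05, upper=5.0)
```

For this stand-in objective the exact minimizer is sqrt(50/950), so the returned `mu_star` can be compared against a known value; for the actual cost function no such closed form is assumed.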
5 Numerical Analysis
In this section, we inspect the influence of several parameters on the mean number of clients in the system and on the mean time a client spends in the system. We assume α1 = 1, α2 = 4 and L = 8, and study the influence of the arrival rate and of the service rate on E[T]. Figure 2 shows that E[T] increases as the arrival rate increases, as expected. The rise of the curve is steepest for the lowest service rate, μ = 0.74, compared with the other two values, μ = 0.8 and μ = 0.9. Figure 3 sketches the variation of E[T] with the service rate μ for three values of λ, namely 0.74, 0.8 and 0.9. As the service rate increases, a dip arises in the mean number of clients, as expected. Among the three curves, the decrease is slow for λ = 0.8 when compared with the remaining two. Next, the PSO algorithm is applied to optimize the expense function described in Sect. 4. Here, we assume C1 = 750, C2 = 800, C3 = 600, C4 = 200, C5 = 950, λ = 0.5, α1 = 2, α2 = 5 and L = 50. Figure 4 depicts the number of iterations required to attain the optimum cost in PSO. The curve of the optimal cost reaches its steady state after the 12th iteration. The minimum expected cost 109.5415 is obtained at μ∗ = 0.1130. Table 1 lists the optimal service rate μ∗ and the corresponding total cost TC for various choices of λ, α1 and α2. The expense elements are again C1 = 750, C2 = 800, C3 = 600, C4 = 200 and C5 = 950. The table shows that as the arrival rate increases, the optimal values of μ and TC increase, which is in accord with expectation.
Fig. 2 Effect of λ on E[T ]
Fig. 3 Effect of the service rate, μ on E[T ]
Fig. 4 Total cost against iterations (PSO convergence characteristic; total cost versus iteration)

Table 1 Effect of TC for different values of λ, α1, α2 and μ∗
S. No.   λ     α1    α2     μ∗       TC(μ∗)
01       0.3   0.8   3.0    0.1021   98.9531
02       0.5   2.0   5.0    0.1130   109.5415
03       0.8   1.0   3.0    0.2603   252.2350
04       1.6   3.0   6.8    0.2681   259.7640
05       2.5   5.0   9.0    0.3248   314.7310
06       2.9   6.0   10.0   0.3421   331.5287
07       3.5   8.0   12.0   0.3498   338.9396
08       3.7   6.5   10.8   0.4048   392.2150
09       4.0   7.0   11.0   0.4293   415.9758
10       4.7   8.0   13.0   0.4334   419.9918
6 Conclusion
In this paper, we discussed an M/M/1/L queueing system with differentiated server vacations. The steady-state probabilities of the model were obtained through probability generating functions (PGFs). We also derived some necessary performance measures of the model and developed a cost model to optimize the service performance. The results are useful in designing service-providing sectors where the waiting capacity is limited, since they help in choosing a reasonable service rate and hiatus rate so that the system operates optimally. Some numerical examples are presented to show how the various parameters of the model affect the behaviour of the system.
References
1. Doshi, B.T.: Queueing systems with vacations - a survey. Queueing Syst. 1, 29–66 (1986). https://doi.org/10.1007/BF01149327
2. Doshi, B.T.: Single server queues with vacations. In: Takagi, H. (ed) Stochastic Analysis of Computer and Communication Systems. North-Holland, Amsterdam, pp. 217–265 (1990)
3. Tian, N.: Stochastic vacation service systems. Oper. Res. Commun. 1, 17–30 (1990). (in Chinese)
4. Ke, J.C., Wu, C.H., Zhang, Z.G.: Recent developments in vacation queueing models: a short survey. Int. J. Oper. Res. 7(4), 3–8 (2011)
5. Upadhyaya, S.: Queueing systems with vacation: an overview. IJMOR 9(2), 167–213 (2016). https://doi.org/10.1504/IJMOR.2016.077997
6. Takagi, H.: Queueing Analysis: A Foundation of Performance Analysis, vol. 1 of Vacation and Priority Systems, part 1, Elsevier Science Publishers B.V., Amsterdam, The Netherlands (1991). https://doi.org/10.1145/122564.1045501
7. Tian, N., Zhang, Z.G.: Vacation Queueing Models: Theory and Applications. Springer, New York (2006)
8. Takagi, H.: M/G/1/N queues with server vacations and exhaustive service. Oper. Res. 42(5), 926–939 (1994). https://doi.org/10.1287/opre.42.5.926
9. Zhang, Y., Yue, D., Yue, W.: Analysis of an M/M/1/N queue with balking, reneging and server vacations. In: Proceedings of the Fifth International Symposium, pp. 37–47 (2005)
10. Ghimire, R.P., Ritu Basnet.: Finite capacity queueing system with vacations and server breakdowns. IJE Trans. Basics 24(4), 387–394 (2011). https://doi.org/10.5829/idosi.ije.2011.24.04a.07
11. Yue, D., Zhang, Y., Yue, W.: Optimal performance analysis of an M/M/1/N queue system with balking, reneging and server vacation. Int. J. Pure Appl. Math. 28, 101–115 (2006)
12. Yue, D., Sun, Y.: The waiting time of the M/M/1/N queueing system with balking, reneging and multiple vacations. Chin. J. Eng. Math. 5, 943–946 (2008)
13. Hui Z., Wei G.: The two-phases-service M/M/1/N queuing system with the server breakdown and multiple vacations. In: Liu, B., Chai, C. (eds) Information Computing and Applications. ICICA 2011. Lecture Notes in Computer Science, vol. 7030 (2011). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25255-6-26
14. Kalidass, K., Gnanaraj, J., Gopinath, S., Kasturi, R.: Time dependent analysis of an M/M/1/N queue with catastrophes and a repairable server. Opsearch 49, 39–61 (2012). https://doi.org/10.1007/s12597-012-0065-6
15. Ibe, O.C., Isijola, O.A.: M/M/1 multiple vacation queueing systems with differentiated vacations. Model. Simulat. Eng. 3, 1–6 (2014). https://doi.org/10.1155/2014/158247
16. Oliver C. Ibe, Olubukola A. Adakeja.: M/M/1 Differentiated multiple vacation queueing systems with vacation-dependent service rates. IREMOS 8(5), 505 (2015). https://doi.org/10.15866/iremos.v8i5.6778
17. Vijayashree, K.V., Janani, B.: Transient analysis of an M/M/1 queueing system subject to differentiated vacations. Qual. Technol. Quant. Manag. 15(6), 730–748 (2017). https://doi.org/10.1080/16843703.2017.1335492
18. Bouchentouf, A.A., Guendouzi, A.: Sensitivity analysis of feedback multiple vacation queueing system with differentiated vacations, vacation interruptions and impatient customers. IJAMAS 57(6) (2018)
19. Suranga Sampath, M.I.G., Liu, J.: Impact of beneficiaries impatience on an M/M/1 queueing system subject to differentiated vacations with a waiting server. Qual. Technol. Quant. Manag. 17(2), 125–148 (2018). https://doi.org/10.1080/16843703.2018.1555877
20. Unni, V., Mary, K.J.R.: Queueing systems with C-servers under differentiated type 1 and type 2 vacations. Infokara Res. 8, 809–819 (2019)
21. Suranga Sampath, M.I.G., Kalidass, K., Liu, J.: Transient analysis of an M/M/1 queueing system subjected to multiple differentiated vacations, impatient customers and a waiting server with application to IEEE 802. Indian J. Pure Appl. Math. 51(1), 297–320 (2020). https://doi.org/10.1007/s13226-020-0402-z
22. Unni, V., Mary, K.J.R.: M/M/1 multiple vacations queueing systems with differentiated vacations under mixed strategy of customers. In: AIP Conference Proceedings, vol 2261(1), 1–8 (2020). https://doi.org/10.1063/5.0018988
23. Kennedy J., Eberhart R.: Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, Perth, WA, Australia, vol. 4, 1942–1948 (1995). https://doi.org/10.1109/ICNN.1995.488968
24. Sonamani Singh T., Yadava R.D.S.: Application of PSO clustering for selection of chemical interface materials for sensor array electronic nose. In: Pant, M., Ray, K., Sharma, T., Rawat, S., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications, vol. 583, 449–456, Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-5687-1_40
25. Kumar, S., Ajmeri, M.: Design of controllers using PSO techniques for second order stable process with time delay. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds), Soft Computing: Theories and Applications, vol. 1154, 143–152, Springer (2019)
26. Agarwal, M., Srivastava, G.M.S.: A PSO algorithm-based task scheduling in cloud computing. In: Soft Computing: Theories and Applications, pp. 295–301, Springer, Singapore (2017). https://doi.org/10.1007/978-981-13-0589-4_27
27. Jana, B., Chakraborty, M., Mandal, T.: A task scheduling technique based on particle swarm optimization algorithm in cloud environment. In: Ray, K., Sharma, T., Rawat, S., Saini, R., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 742. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0589-4_49
28. Kushwah, V.S., Goyal, S.K., Sharma, A.: Meta-heuristic techniques study for fault tolerance in cloud computing environment: a survey work. In: Ray, K., Sharma, T., Rawat, S., Saini, R., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 742. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0589-4_1
29. Yang, D.-Y., Chiang, Y.-C.: An evolutionary algorithm for optimizing the machine repair problem under a threshold recovery policy. J. Chin. N. Inst. Eng. 37(2), 224–231 (2014). https://doi.org/10.1080/02533839.2012.757050
Stability Analysis of HJB-Based Optimal Control for Hybrid Motion/Force Control of Robot Manipulators Using RBF Neural Network
Komal Rani and Naveen Kumar
Abstract This paper presents an intelligent optimal control approach based on Hamilton–Jacobi–Bellman (HJB) optimization for the hybrid motion/force control problem of constrained robot manipulators. To design the control scheme, a state-space form of the error dynamics is first derived for quadratic optimization, describing the constrained and unconstrained motion separately. Then, the explicit solution of the HJB equation for optimal control is obtained from a Riccati equation. The uncertainties of the system are compensated using a radial basis function neural network (RBFNN) and an adaptive compensator. Thus, the proposed control scheme is a combination of linear optimal control, a neural network, and an adaptive bound. The asymptotic stability of the system is demonstrated using Lyapunov stability analysis, and simulation results are produced with a two-link constrained manipulator.
Keywords Optimal hybrid motion/force control · RBF neural network · HJB optimization · Asymptotic stability
K. Rani
Babu Anant Ram Janta College, Kaul, Kaithal, Haryana, India
K. Rani · N. Kumar (B)
National Institute of Technology, Kurukshetra, Haryana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_44

1 Introduction
In various robotic applications such as grasping and grinding, the end effector must be in contact with the environment, and both the end-effector trajectory and the constraint force must be controlled. This control problem is termed the hybrid motion/force control problem. Various schemes for hybrid motion/force control have been presented in the literature. Chen and Chang [1] developed a modified model of robot dynamics containing two sets of constrained and unconstrained variables and presented robust force/motion control for robot manipulators using a sliding mode control approach to deal with an unknown environment. Johansson and Spong [2] investigated an optimal control algorithm for force and motion control of a robot manipulator. Wang et al. [3] used an optimal control approach obtained by transforming a robust control problem in the presence of uncertainties. For discrete-time nonlinear systems, optimal control methods were adopted by Zhang et al. [4] and Liu and Wei [5]. Yang and Chundi [6] presented a robust optimal control for rigid robot manipulators based on neural networks, assuming an unknown bounded function of the system nonlinearities. Kim et al. [7] designed NN-based optimal control for uncertain robot manipulators. Cheng et al. [8] designed a fixed-final-time constrained optimal control method using the Hamilton–Jacobi–Bellman equation for constrained systems. Panwar and Sukavanam [9] described an optimal hybrid position/force control method using a neural network for robotic systems. Kumar et al. [10] proposed an adaptive hybrid force/position control scheme for robot manipulators with a feedforward NN. He et al. [11] used an NN-based adaptive controller for an uncertain robot manipulator with unknown dynamics. Zhou et al. [12] presented a fuzzy adaptive control method for a nonlinear system with input saturation. A hybrid force/position controller using RBFNN has been presented for constrained robots [13, 14]. However, in most adaptive and robust control techniques used for constrained manipulators, the errors converge only to a small neighborhood of zero. The motivation of this manuscript is the development and analysis of an asymptotically stable intelligent optimal controller combining HJB optimization and a neural network for hybrid motion/force control of robot manipulators. The optimal hybrid motion/force control scheme is derived using Hamilton–Jacobi–Bellman (HJB) optimization and a neural network. The system uncertainties are compensated by the RBFNN and the adaptive bound part. The paper is organized in six sections. Section 2 presents the reduced state-space model of the dynamical system. In Sect. 3, the neural network-based optimal approach is presented. Stability analysis is performed in Sect. 4. Section 5 presents simulation results, followed by concluding remarks in Sect. 6.
2 Robot Manipulator Dynamics
The motion of a constrained robot manipulator with friction can be expressed by the general equation based on Lagrangian mechanics as
\[
D(q)\ddot q + B_m(q,\dot q)\dot q + G(q) + F(\dot q) = \tau + J^T f \tag{1}
\]
where $D(q) \in \mathbb{R}^{n\times n}$ stands for the inertia matrix, $B_m(q,\dot q) \in \mathbb{R}^{n\times n}$ is the centripetal–Coriolis matrix, $G(q) \in \mathbb{R}^n$ and $F(\dot q) \in \mathbb{R}^n$ represent the gravity and friction effects, respectively, $\tau \in \mathbb{R}^n$ is the torque input vector, $J \in \mathbb{R}^{m\times n}$ is a Jacobian matrix, and the force vector $f \in \mathbb{R}^m$ is defined as
\[
f = K\Omega(t) \tag{2}
\]
where $\Omega : \mathbb{R}^1 \to \mathbb{R}^m$ and $K$ is the environment stiffness matrix. The constraint surface, which is the intersection of $m$ mutually independent hypersurfaces, may be expressed as
\[
\Psi(q) = 0 \tag{3}
\]
where $\Psi : \mathbb{R}^n \to \mathbb{R}^m$ is a differentiable function. Taking the derivative of (3) with respect to time $t$, we get
\[
\dot\Psi = \frac{\partial \Psi(q)}{\partial q}\,\dot q = J(q)\dot q \tag{4}
\]
The matrix $J(q)$ has full rank and can be decomposed as $J(q) = [\,J_1(q)\;\; J_2(q)\,]$, where $J_1(q) \in \mathbb{R}^{m\times(n-m)}$ and $J_2(q) \in \mathbb{R}^{m\times m}$. Without loss of generality, we can choose $J_2(q)$ to be invertible. Because the manipulator is in contact with the surface given by (3), the position vector $p$ has only $n-m$ independent coordinates. So, we define $p_u = [q_1, q_2, \ldots, q_{n-m}]^T$ as the vector of unconstrained motion, which can be expressed as $p_u = Eq$, where $E = [\,I_{(n-m)\times(n-m)}\;\; 0_{(n-m)\times m}\,]$. From (4), we have
\[
\dot q_s = -J_2(q)^{-1} J_1(q)\,\dot p_u + J_2(q)^{-1}\,\dot\Psi(q) \tag{5}
\]
where $q_s = [q_{n-m+1}, \ldots, q_n]^T$. If $p_c = \Psi(q)$, we have
\[
\dot q = \begin{bmatrix} \dot p_u \\ \dot q_s \end{bmatrix}
= \begin{bmatrix} I_{(n-m)\times(n-m)} \\ -J_2(q)^{-1} J_1(q) \end{bmatrix} \dot p_u
+ \begin{bmatrix} 0 \\ J_2(q)^{-1} \end{bmatrix} \dot p_c
\equiv O(q)\,\dot p_u + P(q)\,\dot p_c \tag{6}
\]
where $O(q) \in \mathbb{R}^{n\times(n-m)}$ and $P(q) \in \mathbb{R}^{n\times m}$ are full rank matrices and $S(q) = [\,O(q)\;\; P(q)\,]$. Since $p = [\,p_u^T\;\; p_c^T\,]^T \in \mathbb{R}^n$, we have
\[
\dot q = S(q)\,\dot p \tag{7}
\]
Differentiating (7), we have
\[
\ddot q = S(q)\,\ddot p + \dot S(q)\,\dot p \tag{8}
\]
Substituting $\dot q$ and $\ddot q$ from (7) and (8) into (1) and multiplying by $S^T$, we get
\[
\bar D(q)\ddot p + \bar B(q,\dot q)\dot p + \bar G(q) + \bar F(\dot q) = \bar\tau + \bar J^T f \tag{9}
\]
where $\bar D(q) = S^T(q) D(q) S(q)$, $\bar B(q,\dot q) = S^T B_1(q,\dot q)$, $B_1(q,\dot q) = D(q)\dot S(q) + B_m(q,\dot q) S(q)$, $\bar G(q) = S^T(q) G(q)$, $\bar F(\dot q) = S^T F(\dot q)$, $\bar\tau = S^T \tau$ and $\bar J^T = S^T J^T$.
The following properties are satisfied by this modified equation.
Property 1 The matrix $\bar D(q)$ is a symmetric positive definite matrix. It also satisfies the following inequality [9]:
\[
\alpha_D I_n \le \bar D(q) \le \beta_D I_n, \tag{10}
\]
where $\alpha_D$ and $\beta_D$ are positive constants.
Property 2 The matrix $\dot{\bar D}(q) - 2\bar B(q,\dot q)$, formed from the inertia and centripetal–Coriolis matrices, is skew symmetric [9].
Assumption 1 For $c \ge 0$ and $d \ge 0$, we have $\|\bar F(\dot q)\| \le c + d\,\|\dot q\|$ [14].
3 Control Scheme Design
The optimal control system that achieves the desired force $f_d$ and position trajectory $p_d$ is designed in this section. For this, we choose the sliding function $r_u \in \mathbb{R}^{n-m}$ for unconstrained motion as
\[
r_u = \dot e_u + \lambda_u e_u \tag{11}
\]
where $e_u = p_d - p_u$ and the matrix $\lambda_u \in \mathbb{R}^{(n-m)\times(n-m)}$ is positive definite. The sliding function $r_c \in \mathbb{R}^m$ for constrained motion is chosen as
\[
r_c = \dot e_c + \lambda_c e_c \tag{12}
\]
where $e_c = p_c - \Delta p_c$, the force compensator $\Delta p_c$ is defined as $\Delta p_c = \Gamma \int (f_d - f)\,dt$, and $\lambda_c \in \mathbb{R}^{m\times m}$ is a positive definite matrix. Since $f = K\Omega(t)$ and $p_c = \Psi(q)$, if $r_c = 0$, we have
\[
\dot f + K\Gamma (f - f_d) + \lambda_c\left[f + K\Gamma \int (f_d - f)\,dt\right] = 0 \tag{13}
\]
We choose the following Lyapunov function to analyze the stability of (13):
\[
V_f = \frac{1}{2}\dot f^T \dot f + \frac{1}{2}(f - f_d)^T \lambda_c K\Gamma (f - f_d) \tag{14}
\]
and
\[
\dot V_f = -\dot f^T (K\Gamma + \lambda_c)\dot f \tag{15}
\]
So, from Lyapunov stability and LaSalle's theorem [15], we conclude that $\dot f(t) \to 0$ as $t \to \infty$, and this implies that $f \to f_d$.
Define $e = [\,e_u^T\;\; e_c^T\,]^T$ and $r = [\,r_u^T\;\; r_c^T\,]^T$. Then the error and robot dynamics may be expressed as:
\[
\dot e = -\lambda e(t) + r(t) \tag{16}
\]
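\[
\bar D(q)\dot r(t) = -\bar B(q,\dot q) r(t) - \bar\tau + g(y) \tag{17}
\]
where $\lambda = \begin{bmatrix} \lambda_u & 0 \\ 0 & \lambda_c \end{bmatrix}$ and $y(t) = [\,e_u^T\; \dot e_u^T\; e_c^T\; \dot e_c^T\; e_f^T\; \dot e_f^T\; p_d^T\; \dot p_d^T\; \ddot p_d^T\,]^T$ with $e_f = f_d - f$. The nonlinear function $g(y)$ in (17), which collects the unknown dynamics, is defined as
\[
g(y) = \bar D(q)\begin{bmatrix} \ddot p_d + \lambda_u \dot e_u \\ -\Gamma \dot e_f + \lambda_c \dot e_c \end{bmatrix}
+ \bar B(q,\dot q)\begin{bmatrix} \dot p_d + \lambda_u e_u \\ -\Gamma e_f + \lambda_c e_c \end{bmatrix}
+ \bar G + \bar F(\dot q) - \bar J^T(q) f \tag{18}
\]
Now defining the input $u(t)$ as
\[
u(t) = g(y) - \bar\tau \tag{19}
\]
and using (19) in (17), the dynamics becomes
\[
\bar D(q)\dot r(t) = -\bar B(q,\dot q) r(t) + u(t) \tag{20}
\]
Taking (16) and (20) together, we obtain the augmented system
\[
\dot{\tilde p}(t) = \begin{bmatrix} \dot e \\ \dot r \end{bmatrix}
= \begin{bmatrix} -\lambda & I \\ 0 & -\bar D^{-1}\bar B \end{bmatrix}\begin{bmatrix} e \\ r \end{bmatrix}
+ \begin{bmatrix} 0 \\ \bar D^{-1} \end{bmatrix} u(t) \tag{21}
\]
or
\[
\dot{\tilde p}(t) = A(q,\dot q)\tilde p(t) + C(q) u(t) \tag{22}
\]
which is a state-space form of the error dynamics, where $A(q,\dot q) \in \mathbb{R}^{2n\times 2n}$, $C(q) \in \mathbb{R}^{2n\times n}$, $\tilde p(t) = [\,e^T\;\; r^T\,]^T \in \mathbb{R}^{2n}$ is the state variable and $u(t) \in \mathbb{R}^n$ is the control variable. Our control objective is to obtain the optimal control vector $u(t)$ such that the quadratic performance index $\theta(u)$ is minimized subject to (22), where
\[
\theta(u) = \int_{t_0}^{\infty} L(\tilde p, u)\,dt \tag{23}
\]
with the Lagrangian function
\[
L(\tilde p, u) = \frac{1}{2}\tilde p^T(t) H \tilde p(t) + \frac{1}{2} u^T(t) M u(t) \tag{24}
\]
Theorem 1 Let $u^*$ be the optimal control input that achieves the control objective. The necessary and sufficient condition for $u^*$ to optimize the performance index $\theta(u)$ subject to (22) is to choose a function $Y = Y(\tilde p, t)$ satisfying the Hamilton–Jacobi–Bellman (HJB) equation [7]
\[
\frac{\partial Y(\tilde p, t)}{\partial t} + \min_{u}\, Q\!\left(\tilde p, u, \frac{\partial Y(\tilde p, t)}{\partial \tilde p}, t\right) = 0 \tag{25}
\]
where the Hamiltonian function $Q$ of the optimization is defined as
\[
Q\!\left(\tilde p, u, \frac{\partial Y(\tilde p, t)}{\partial \tilde p}, t\right) = L(\tilde p, u) + \frac{\partial Y(\tilde p, t)}{\partial \tilde p}\,\dot{\tilde p} \tag{26}
\]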
Lemma 1 If the function $Y$ for the HJB equation is chosen as [7]
\[
Y = \frac{1}{2}\tilde p^T R(q)\tilde p = \frac{1}{2}\tilde p^T \begin{bmatrix} Z & 0 \\ 0 & \bar D(q) \end{bmatrix}\tilde p \tag{27}
\]
where $Z$ is a positive symmetric matrix, and the matrix $R$ satisfies the Riccati equation
\[
\dot R + RA + A^T R + H - RCM^{-1}C^T R^T = 0 \tag{28}
\]
then the optimal solution $u^*$ for $u$ is [7]:
\[
u^*(t) = -M^{-1} C^T R(q)\tilde p \tag{29}
\]
With this optimal control input, the torque $\bar\tau^*$ is computed as:
\[
\bar\tau^* = g(y) - u^*(t) \tag{30}
\]
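A short worked step, using only (21), (27) and (29), shows how the optimal input simplifies. It assumes that $\bar D(q)$ is symmetric (Property 1), so that the transpose of $\bar D^{-1}$ is $\bar D^{-1}$ itself:

```latex
\[
C^T R = \begin{bmatrix} 0 & \bar D^{-1} \end{bmatrix}
        \begin{bmatrix} Z & 0 \\ 0 & \bar D \end{bmatrix}
      = \begin{bmatrix} 0 & I \end{bmatrix},
\qquad
u^*(t) = -M^{-1}\begin{bmatrix} 0 & I \end{bmatrix}
          \begin{bmatrix} e \\ r \end{bmatrix}
       = -M^{-1} r(t),
\qquad
\bar\tau^* = g(y) + M^{-1} r(t).
\]
```

Under this reading, the optimal input is a feedback on the sliding variable $r$ alone, while $Z$ enters only through the Riccati equation (28).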
3.1 RBF Neural Networks and Adaptive Bound
The RBF neural network used to approximate the unknown nonlinear dynamics is described as [14]:
\[
g(y) = W^T \zeta(y) + \epsilon(y) \tag{31}
\]
where
\[
\zeta_i(y) = \exp\!\left(-\frac{\|y - c_i\|^2}{\sigma_i^2}\right), \qquad i = 1, 2, \ldots, l \tag{32}
\]
and the adaptive bound $\beta$ is defined, using Assumption 1 and the upper bound $l$ on the reconstruction error $\epsilon(y)$ [16, 17], as
\[
\|\bar F + \epsilon(y)\| \le c + d\,\|\dot q\| + l = \beta = X^T(\|\dot q\|)\,\Phi \tag{33}
\]
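The Gaussian basis functions in (32) and the network output $W^T\zeta(y)$ in (31) can be sketched as follows. This is an illustrative pure-Python sketch: the function names, the list-of-lists layout of the weight matrix, and the sample centres and widths are assumptions made here, since the paper does not list its centres $c_i$ or widths $\sigma_i$.

```python
import math


def rbf_features(y, centers, widths):
    """Return [zeta_1(y), ..., zeta_l(y)] as in (32)."""
    features = []
    for c, sigma in zip(centers, widths):
        dist_sq = sum((yi - ci) ** 2 for yi, ci in zip(y, c))
        features.append(math.exp(-dist_sq / sigma ** 2))
    return features


def rbf_output(weights, zeta):
    """Return W^T zeta, with W stored as l rows of n output weights."""
    n_out = len(weights[0])
    return [sum(weights[i][k] * zeta[i] for i in range(len(zeta)))
            for k in range(n_out)]


# Hypothetical two-node network on a two-dimensional input.
centers = [[0.0, 0.0], [1.0, 0.0]]
widths = [1.0, 1.0]
zeta = rbf_features([1.0, 0.0], centers, widths)
estimate = rbf_output([[1.0, 2.0], [3.0, 4.0]], zeta)
```

Each feature equals 1 when the input sits on its centre and decays with squared distance, which is why the node nearest to the input dominates the estimate.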
3.2 NN-Based Optimal Controller Design
In this subsection, we define the optimal control torque input using the neural network and the adaptive compensator as
\[
\bar\tau^* = \hat W^T \zeta(y) - u^* + \frac{\hat\beta^2 r}{\hat\beta\,\|r\|} \tag{34}
\]
where $\hat W$ is the estimated neural network weight, $\hat\beta = X^T\hat\Phi$, and $\hat\Phi$ is the estimated value of the parameter $\Phi$. Using this control input in (17), we get
\[
\bar D(q)\dot r(t) = -\bar B(q,\dot q) r(t) - \left(\hat W^T \zeta(y) - u^* + \frac{\hat\beta^2 r}{\hat\beta\,\|r\|}\right) + W^T\zeta(y) + \bar F + \epsilon(y) \tag{35}
\]
Writing the above equation in state-space form and substituting the optimal control input, we obtain
\[
\dot{\tilde p}(t) = (A - CM^{-1}C^T R)\tilde p + C\left(\tilde W^T \zeta(y) - \frac{\hat\beta^2 r}{\hat\beta\,\|r\|} + \bar F + \epsilon(y)\right) \tag{36}
\]
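4 Stability Analysis
In this section, we show that the resulting system is stable and that the error state $\tilde p$, and hence the tracking errors, converge asymptotically under the adaptation laws
\[
\dot{\hat W} = \Gamma_W \zeta(y)\,(C^T R\tilde p)^T \tag{37}
\]
\[
\dot{\hat\Phi} = \Gamma_\Phi X\,\|r\| \tag{38}
\]
where $\Gamma_W = \Gamma_W^T \in \mathbb{R}^{l\times l}$ and $\Gamma_\Phi = \Gamma_\Phi^T \in \mathbb{R}^{k\times k}$ are positive definite.
Proof: For system stability, consider the following Lyapunov function
\[
V = \frac{1}{2}\tilde p^T R\tilde p + \frac{1}{2}\mathrm{tr}(\tilde W^T \Gamma_W^{-1}\tilde W) + \frac{1}{2}\mathrm{tr}(\tilde\Phi^T \Gamma_\Phi^{-1}\tilde\Phi) \tag{39}
\]
Taking the first derivative of $V$ and using (36), $\dot{\tilde W} = -\dot{\hat W}$ and $\dot{\tilde\Phi} = -\dot{\hat\Phi}$, we have
\[
\begin{aligned}
\dot V ={}& \tilde p^T R\left[(A - CM^{-1}C^T R)\tilde p + C\left(\tilde W^T\zeta(y) - \frac{\hat\beta^2 r}{\hat\beta\,\|r\|} + \bar F + \epsilon(y)\right)\right] \\
 &+ \frac{1}{2}\tilde p^T \dot R\,\tilde p - \mathrm{tr}(\tilde W^T \Gamma_W^{-1}\dot{\hat W}) - \mathrm{tr}(\tilde\Phi^T \Gamma_\Phi^{-1}\dot{\hat\Phi})
\end{aligned}
\tag{40}
\]
Now from the Riccati equation (28) and the adaptation laws (37) and (38), we get
\[
\dot V = -\frac{1}{2}\tilde p^T U\tilde p - \tilde p^T RC\,\frac{\hat\beta^2 r}{\hat\beta\,\|r\|} + \tilde p^T RC\,[\bar F + \epsilon(y)] - \mathrm{tr}(\tilde\Phi^T X\,\|r\|) \tag{41}
\]
where $U = H + RCM^{-1}C^T R^T$. Using (33) and $RC = [\,0\;\; I\,]^T$, we get
\[
\|\tilde p^T RC\,[\bar F + \epsilon(y)]\| = \|r\|\,\|\bar F + \epsilon(y)\| \le X^T\Phi\,\|r\| = X^T(\hat\Phi + \tilde\Phi)\,\|r\| \tag{42}
\]
Now applying the above inequality in (41), we obtain
\[
\dot V \le -\frac{1}{2}\tilde p^T U\tilde p - \frac{(X^T\hat\Phi)^2\,\|r\|^2}{X^T\hat\Phi\,\|r\|} + X^T\hat\Phi\,\|r\| \le -\frac{1}{2}\tilde p^T U\tilde p \tag{43}
\]
This implies
\[
\dot V \le -\frac{1}{2}\sigma_{\min}\,\|\tilde p\|^2 \le 0 \tag{44}
\]
where $\sigma_{\min}$ is the minimum singular value of $U$. Now $V > 0$ and $\dot V \le 0$; this implies that $\tilde p$, $\tilde W$, $\tilde\Phi$ and hence $\hat W$, $\hat\Phi$ are bounded. Let $\Xi(t)$ be a function defined by
\[
\Xi(t) = \sigma_{\min}\,\|\tilde p\|^2 \tag{45}
\]
Then $\Xi(t) \le -\dot V$, and since $V(0)$ is bounded and $V(t)$ is decreasing and bounded, we get the result
\[
\lim_{t\to\infty}\int_0^t \Xi(t)\,dt < \infty \tag{46}
\]
Also, $r(t)$ is bounded and therefore $\dot r(t)$ is bounded, and hence $\dot\Xi(t)$ is bounded. This shows that $\Xi(t)$ is uniformly continuous. Thus, using Barbalat's lemma [18], it can be shown that $\lim_{t\to\infty}\Xi(t) = 0$, which implies $r(t) \to 0$ as $t \to \infty$. This demonstrates that the resulting system is asymptotically stable.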
5 Numerical Simulation
This section presents numerical simulation results for a two-link manipulator in contact with a circular environment. Details of dynamics are given as in : The friction term is taken as F(q̇) = 5.3q̇ 0 . The parameter values are taken as m1 = 0.5 kg, m2 = 0.8 kg, l1 = 1.3 m, l2 = 1.1 m, g = 10.2 m/s². The equation of the environment in joint space is considered as
Ψ(q1, q2) = l1² + l2² + 2 l1 l2 cos(q2) − r² = 0, r = 1.4 m.
Now define pu = q1 and pc = Ψ(q1, q2). The weighting matrices which are used in performance index are defined as
Our aim is to control the joint q1 and the contact force f in such a way that q1 and f track the desired trajectory q1d and the desired force fd, respectively. The desired trajectory of joint 1 is taken as q1d = (−90 + 52.5(1 − cos(1.26t)))π/180 rad and the desired force is taken as fd = 20 N. For the architecture of the RBF neural network, we take 10 nodes. The update law weighting matrices are ΓW = 100 I10 and Γφ = 100 I4. The performance of the proposed controller is shown in Figs. 1, 2, 3, 4, and 5. From Figs. 3 and 4, it is clear that the proposed controller effectively tracks the desired position trajectory and minimizes the tracking error. Figure 5 describes the velocity tracking error. Figures 1 and 2 show that the controller tracks the constrained force efficiently as well. We observe that the errors go to zero.
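The geometric part of this setup can be checked with a few lines of Python. The sketch below is illustrative only: it covers the constraint Ψ, its Jacobian, the matrix S(q) of (6) for n = 2 and m = 1, and the desired trajectory q1d, using the numerical values quoted above. All function names are assumptions, and the manipulator dynamics and controller are not reproduced.

```python
import math

L1, L2, R_ENV = 1.3, 1.1, 1.4  # link lengths and circle radius in metres


def q1_desired(t):
    """Desired joint-1 trajectory q1d(t) in radians."""
    return (-90.0 + 52.5 * (1.0 - math.cos(1.26 * t))) * math.pi / 180.0


def constraint(q2):
    """Psi(q1, q2); for this environment it does not depend on q1."""
    return L1 ** 2 + L2 ** 2 + 2.0 * L1 * L2 * math.cos(q2) - R_ENV ** 2


def constraint_jacobian(q2):
    """J(q) = [J1, J2] with J1 = dPsi/dq1 and J2 = dPsi/dq2."""
    return 0.0, -2.0 * L1 * L2 * math.sin(q2)


def q2_on_surface():
    """Elbow angle in (0, pi) that satisfies Psi = 0."""
    return math.acos((R_ENV ** 2 - L1 ** 2 - L2 ** 2) / (2.0 * L1 * L2))


def s_matrix(q2):
    """S(q) = [O(q) P(q)] from (6) for n = 2, m = 1."""
    j1, j2 = constraint_jacobian(q2)
    return [[1.0, 0.0], [-j1 / j2, 1.0 / j2]]


q2_contact = q2_on_surface()
S_contact = s_matrix(q2_contact)
```

Because Ψ is independent of q1, J1 vanishes and S(q) is diagonal here, so the unconstrained coordinate pu = q1 and the constrained coordinate pc decouple at the kinematic level. The desired trajectory starts at −90° and reaches 15° when cos(1.26t) = −1.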
Fig. 1 Force tracking with proposed controller (force trajectory (N) versus time (sec); desired force and actual force)
Fig. 2 Force tracking error with proposed controller (force error versus time (sec))
Fig. 3 Joint 1 trajectory tracking with proposed controller (joint 1 trajectory (rad) versus time (sec); desired and actual trajectory)
6 Conclusion
This paper is concerned with HJB-based optimal control for hybrid motion/force control of constrained manipulators using a neural network. HJB optimization is utilized to derive the optimal control approach. An RBF neural network with an adaptive compensator is employed within the optimal control to approximate the unknown nonlinear system and the unknown bounds on the neural network reconstruction error as well as on friction. Using the Lyapunov approach, it is shown that the system is stable and the tracking errors converge to zero asymptotically. Finally, simulations are carried out for the proposed controller.
Fig. 4 Joint 1 trajectory tracking error with proposed controller (joint 1 error versus time (sec))
Fig. 5 Joint 1 velocity tracking error with proposed controller (joint 1 velocity error versus time (sec))
References
1. Chen, Y.P., Chang, J.L.: Sliding-mode force control of manipulators. Proc. Natl. Sci. Council Rep. China Part A Phys. Sci. Eng. 23, 281–288 (1999)
2. Johansson, R., Spong, M.: Quadratic optimization of impedance control. In: Proceedings of the IEEE International Conference on Robotics and Automation, vol. 1, pp. 616–621 (1994)
3. Wang, D., Liu, D., Li, H., Ma, H.: Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf. Sci. 282, 167–179 (2014)
4. Zhang, H., Song, R., Wei, Q., Zhang, T.: Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming. IEEE Trans. Neural Netw. 22(12), 1851–1862 (2011)
5. Liu, D., Wei, Q.: Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems. IEEE Trans. Cybern. 43(2), 779–789 (2013)
6. Yang, S., Chundi, M.: Robust and optimal control for robotic manipulator based on linear parameter neural networks. In: Proceedings of the 38th Conference on Decision and Control IEEE, pp. 2174–2179 (1999)
7. Kim, Y., Lewis, F., Dawson, D.: Intelligent optimal control of robotic manipulators using neural networks. Automatica 36, 1355–1364 (2000)
8. Cheng, T., Lewis, F., Kalaf, M.: Fixed final time constrained optimal control of non linear system using neural network HJB approach. IEEE Trans. Neural Netw. 18(6), 1725–1737 (2007)
9. Panwar, V., Sukavanam, N.: Design of optimal hybrid position/force controller for a robot manipulator using neural network. Math. Probl. Eng. 23, 1–23 (2007)
10. Kumar, N., Panwar, V., Sukavanam, N., Sharma, S., Borm, J.: Neural network based hybrid force/position control for robot manipulators. Int. J. Precis. Eng. Manuf. 12(3), 419–426 (2011)
11. He, W., Chen, Y., Yin, Z.: Adaptive neural network control of an uncertain robot with full-state constraints. IEEE Trans. Cybern. 46(3), 620–629 (2016)
12. Zhou, Q., Li, H., Wu, C., Wang, L., Ahn, C.K.: Adaptive fuzzy control of nonlinear systems with unmodeled dynamics and input saturation using small-gain approach. IEEE Trans. Syst. Man Cybern. Syst. 47(8), 1979–1989 (2017)
13. Rani, K., Kumar, N.: Design of intelligent hybrid force and position control of robot manipulator. Procedia Comput. Sci. 125, 42–49 (2018)
14. Rani, K., Kumar, N.: Intelligent controller for hybrid force and position control of robot manipulators using RBF neural network. Int. J. Dyn. Control 7, 767–775 (2019)
15. Murray, R.N., Li, X., Sastry, S.S.: A Mathematical Introduction to Robotic Manipulator. CRC Press Boca Raton Fla USA (1994)
16. Kumar, N., Rani, M.: Neural network-based hybrid force/position control of constrained reconfigurable manipulators. Neurocomputing 420, 1–14 (2020)
17. Kumar, N., Rani, M.: A new hybrid force/position control approach for time-varying constrained reconfigurable manipulators. ISA transactions (2020)
18. Slotine, J., Li, W.: Applied Nonlinear Control, vol. 125, pp. 42–49. Prentice-Hall (1991)
RBF Neural Network-Based Terminal Sliding Mode Control for Robot Manipulators
Ruchika and Naveen Kumar
Abstract The primary focus of this paper is to develop a novel terminal sliding mode controller based on an RBF neural network. To improve the control performance, a nonlinear term is included in the control law. The neural network is adopted to estimate the nonlinear components. The neural network reconstruction error and the bound on unstructured uncertainties are compensated with the help of adaptive control. Finally, a simulation is performed with a given manipulator to show the benefit of the proposed method.
Keywords RBF neural network · Adaptive bound · Terminal sliding mode control · Finite time
Ruchika · N. Kumar (B), National Institute of Technology Kurukshetra, Kurukshetra, Haryana 136119, India
Ruchika, K. M. Govt. (P. G) College Narwana, Jind, Haryana 126116, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_45

1 Introduction
Tracking control of robot manipulators has gained major attention in recent years. Various control methods have been adopted for the control of robot manipulators [1]. Model-based feedback control and the well-known computed torque control technique have been implemented with full knowledge of the dynamics [2]. These methods need extensive knowledge of the dynamical model and do not tolerate unmodeled dynamics. Trajectory tracking is therefore an important and demanding task in controlling robotic manipulators, and non-model-based control methods are easier to apply. In actual implementations, the manipulator model is subject to many structured and unstructured uncertainties. These uncertainties cause tracking errors and make the system vulnerable [3]. To improve tracking performance, robust control has been used as an effective approach [4, 5]. The basic requirement of these methods, however, is a regression matrix, which is complicated to derive [6]. In [7–10], a sliding mode control (SMC)-based robust approach is developed to control uncertain systems. Owing to its simple structure, traditional SMC assures only asymptotic stability with a slow convergence rate [11]. In this case, overall performance is improved by nonlinear sliding manifolds, known as terminal sliding modes [12, 13]. Zhao et al. [14] developed an efficient approach that makes the system states converge to zero in finite time without explicitly requiring the system dynamics. Despite the robust nature of sliding mode control, implementing this type of controller needs prior knowledge of the system dynamics. In practical situations, however, accurate knowledge of the system is infeasible owing to system uncertainties. Recently, a variety of intelligent techniques has been used to compensate for the uncertain components without prior knowledge [15–18]. In the literature, there are various algorithms that combine neural networks and sliding mode control [19]. A terminal sliding mode control (TSMC) scheme based on a neural network was proposed for manipulators, and the dynamics of the actuators were also considered [20]. An adaptive terminal sliding mode control scheme utilizing radial basis function neural networks was presented for tracking control of manipulators [21].
Encouraged by this previous work, this paper focuses on finite-time control of a manipulator. An RBF neural network (RBFNN) is integrated with TSMC to achieve high-precision control of the manipulator. TSMC is introduced to provide faster, finite-time convergence and high-accuracy control. The neural network is used to compensate for the effects of system uncertainties and of the network reconstruction error, with the help of an adaptive compensator. Moreover, the proposed method increases the convergence rate and enhances the capability of manipulator structures.
The paper is organized as follows: Sect. 2 presents the manipulator dynamics and their main properties. Section 3 gives a detailed account of the controller design. Section 4 presents the stability analysis of the system. Simulation results are included in Sect. 5, followed by the conclusion in Sect. 6.
2 Dynamic Model of Robot Manipulator
Consider an n-DOF manipulator whose dynamic equation is given by [22]:

$$D(\theta)\ddot{\theta} + V(\theta,\dot{\theta})\dot{\theta} + G(\theta) + F(\dot{\theta}) + T_d = \tau \tag{1}$$

where $D(\theta) \in \mathbb{R}^{n\times n}$ is the bounded symmetric mass matrix, the matrix $V(\theta,\dot{\theta}) \in \mathbb{R}^{n\times n}$ contains the terms related to centripetal force, $G(\theta) \in \mathbb{R}^{n}$ and $F(\dot{\theta})$ are the vectors corresponding to gravitational force and friction force, respectively, $T_d \in \mathbb{R}^{n}$ represents unknown bounded disturbances, and $\tau \in \mathbb{R}^{n}$ represents the input torque vector. The following properties and assumptions are used for the controlled system:
Property 1 $D(\theta)$ is a symmetric, positive definite, bounded matrix.
Property 2 The matrix $(\dot{D} - 2V)$ is skew symmetric, which is very useful in Lyapunov stability analysis.
Assumption 1 The friction force has an upper bounding function with finite constants $b_1$ and $b_2$: $\|F(\dot{\theta})\| \le b_1 + b_2\|\dot{\theta}\|$.
Assumption 2 $\|T_d\| \le b_3$ for some unknown positive constant $b_3$.
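3 Controller Design
3.1 Terminal Sliding Mode Design
To obtain a fast convergence rate and high robustness, the sliding variable is designed in terms of the tracking error $e(t) = \theta_d(t) - \theta(t)$, with $\theta_d(t)$ the desired trajectory, as [23]:

$$s(t) = \dot{e} + \Gamma_1 e(t) + \Gamma_2\,\mathrm{sign}(e)^{\lambda} \tag{2}$$

where $\Gamma_1, \Gamma_2 \in \mathbb{R}^{n\times n}$, $\lambda = r/s$, and $r, s > 0$ take only integer values and satisfy $r < s < 2r$ (the integer $s$ in this ratio is distinct from the sliding variable $s(t)$). Consistent with (3), the term $\mathrm{sign}(e)^{\lambda}$ is read componentwise as $|e_j|^{\lambda}\,\mathrm{sign}(e_j)$. The derivative of $s$ with respect to time $t$ is given componentwise by:

$$\dot{s}_j = \ddot{e}_j + \Gamma_{1j}\dot{e}_j + \Gamma_{2j}e_{kj}, \qquad e_{kj} = \begin{cases} \lambda |e_j|^{\lambda-1}\dot{e}_j, & e_j \neq 0 \\ 0, & e_j = 0 \end{cases} \tag{3}$$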
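The sliding variable of Eq. (2) and the term $e_{kj}$ of Eq. (3) can be sketched per joint as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal gain matrices (passed as per-joint lists), and the function names and numerical values are hypothetical.

```python
import math

def sig(x, lam):
    """Signed power |x|^lam * sign(x), the componentwise reading of sign(x)^lam."""
    return math.copysign(abs(x) ** lam, x) if x != 0.0 else 0.0

def sliding_variable(e, e_dot, gamma1, gamma2, lam):
    """Eq. (2), assuming diagonal gains given as lists of per-joint values."""
    return [ed + g1 * ej + g2 * sig(ej, lam)
            for ej, ed, g1, g2 in zip(e, e_dot, gamma1, gamma2)]

def e_k(e, e_dot, lam):
    """Per-joint term e_kj from Eq. (3); defined as 0 where e_j == 0."""
    return [lam * abs(ej) ** (lam - 1.0) * ed if ej != 0.0 else 0.0
            for ej, ed in zip(e, e_dot)]

# Hypothetical values: lam = r/s with r = 3, s = 5 satisfies r < s < 2r.
lam = 3 / 5
s = sliding_variable(e=[0.5, -0.2, 0.0], e_dot=[0.1, 0.0, 0.3],
                     gamma1=[2.0, 2.0, 2.0], gamma2=[1.0, 1.0, 1.0], lam=lam)
```

The explicit branch at $e_j = 0$ mirrors the case distinction in Eq. (3) and avoids evaluating $|e_j|^{\lambda-1}$ at zero, where it is unbounded for $\lambda < 1$.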
The dynamics expressed in terms of the sliding variable are given as:

$$D\dot{s} = -Vs - \tau + \nu(x) + T_d + F(\dot{\theta}) \tag{4}$$

where $\nu(x) = D(\theta)(\ddot{\theta}_d + \Gamma_1\dot{e} + \Gamma_2 e_k) + V(\theta,\dot{\theta})(\dot{\theta}_d + \Gamma_1 e + \Gamma_2\,\mathrm{sign}(e)^{\lambda}) + G(\theta)$ and the input vector for this problem is chosen as $x = [e^T, \dot{e}^T, \theta_d^T, \dot{\theta}_d^T, \ddot{\theta}_d^T]^T$. The following fundamental result is used in the subsequent analysis:
Lemma 1 Consider a differentiable function $Q(x)$ defined on a neighborhood $U \subset \mathbb{R}^n$ of the origin that satisfies $\dot{Q} + \beta Q^{\Phi} \le 0$ for all $x \in U$, where $\beta > 0$ and $0 < \Phi < 1$. Then the settling time required to reach $Q(x) = 0$ is finite [24].
3.2 RBF Neural Network-Based Controller Design
RBF neural networks are widely used in nonlinear control. An RBFNN consists of an input layer, a hidden layer, and an output layer; the output layer forms a weighted sum of the hidden-layer outputs. The RBF (kernel function) neural network approximation is expressed as [25]

$$\nu(x) = \upsilon^T\varsigma(x) + \epsilon(x) \tag{5}$$

where $\upsilon \in \mathbb{R}^{N\times n}$ is the weight matrix, the vector $\varsigma(\cdot): \mathbb{R}^{5n} \to \mathbb{R}^{N}$ represents the known basis, and $\epsilon(\cdot): \mathbb{R}^{5n} \to \mathbb{R}^{n}$ is the reconstruction error. The Gaussian-type basis function is defined as [26]:

$$\varsigma_i(x) = \exp\!\left(-\frac{\|x - c_i\|^2}{\sigma_i^2}\right), \quad i = 1, 2, \ldots, N \tag{6}$$

Here $c_i$ and $\sigma_i$ denote the center vector and radius, respectively. Moreover, each component of $\varsigma$ satisfies $0 < \varsigma_i \le 1$. Replacing $\nu(x)$ in (4) by (5), the dynamics of the robot are represented as follows:

$$D\dot{s}(t) = -Vs - \tau + \upsilon^T\varsigma(x) + F(\dot{\theta}) + T_d + \epsilon(x) \tag{7}$$
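As a concrete reading of Eqs. (5) and (6), the following sketch evaluates the Gaussian basis and the weighted sum $\upsilon^T\varsigma(x)$. The sizes, centers, widths, and weights are hypothetical values chosen for illustration; the reconstruction error $\epsilon(x)$ is not modeled.

```python
import math

def rbf_basis(x, centers, widths):
    """Gaussian basis of Eq. (6): exp(-||x - c_i||^2 / sigma_i^2) per hidden node."""
    out = []
    for c, sigma in zip(centers, widths):
        dist_sq = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        out.append(math.exp(-dist_sq / sigma ** 2))
    return out

def rbf_output(weights, basis):
    """Network output upsilon^T * varsigma(x); weights is an N x n nested list."""
    n_out = len(weights[0])
    return [sum(weights[i][k] * basis[i] for i in range(len(basis)))
            for k in range(n_out)]

# Hypothetical sizes: 2 inputs, N = 3 hidden nodes, n = 2 outputs.
centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
widths = [1.0, 1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
basis = rbf_basis([0.0, 0.0], centers, widths)
nu_hat = rbf_output(weights, basis)
```

Because the input here coincides with the first center, the first basis value equals 1, which is the upper end of the range of $\varsigma_i$.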
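3.3 Adaptive Bound
Using Assumptions 1 and 2 and taking $\|\epsilon(x)\| < \epsilon_N$, we have

$$\|F(\dot{\theta}) + T_d + \epsilon(x)\| \le b_1 + b_2\|\dot{\theta}\| + b_3 + \epsilon_N \tag{8}$$

Define $\chi = b_1 + b_2\|\dot{\theta}\| + b_3 + \epsilon_N$ as an adaptive bound, which can be rewritten as

$$\chi = [1 \;\; \|\dot{\theta}\| \;\; 1 \;\; 1]\,[b_1 \;\; b_2 \;\; b_3 \;\; \epsilon_N]^T = \omega^T(\|\dot{\theta}\|)\,\psi \tag{9}$$

where $\omega \in \mathbb{R}^k$ is a vector of known functions of the joint velocities $\dot{\theta}$ and $\psi \in \mathbb{R}^k$ is a parameter vector. Considering $\dot{\delta} = -\alpha\delta$ with $\delta(0) > 0$ a design constant, the adaptive compensator that reduces the impact of disturbances is introduced as follows:

$$\xi = \frac{\hat{\chi}^2 s}{\hat{\chi}\|s\| + \delta} \tag{10}$$

where $\hat{\chi} = \omega^T\hat{\psi}$. The adaptation laws for the weight matrix and the parameter vector are chosen as:

$$\dot{\hat{\upsilon}} = \Gamma_\upsilon\,\varsigma(x)\,s^T \quad \text{and} \quad \dot{\hat{\psi}} = \Gamma_\psi\,\omega\,\|s\| \tag{11}$$

In order to attain the desired tracking, the neural network-based control torque input is selected as:

$$\tau = \hat{\upsilon}^T\varsigma(x) + K\,\mathrm{sign}(s) + K_1\,\mathrm{sign}(s)^{\lambda} + \xi \tag{12}$$

Using (12) and (10) in (7) gives the following closed-loop dynamics:

$$D\dot{s} = -Vs + \tilde{\upsilon}^T\varsigma(x) - K\,\mathrm{sign}(s) - K_1\,\mathrm{sign}(s)^{\lambda} + F(\dot{\theta}) + T_d + \epsilon(x) - \frac{\hat{\chi}^2 s}{\hat{\chi}\|s\| + \delta} \tag{13}$$

where $\tilde{\upsilon} = \upsilon - \hat{\upsilon}$.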
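The control law (12), the compensator (10), and the parameter update in (11) can be sketched as follows. This is only an illustration under stated assumptions: the gains $K$, $K_1$ and $\Gamma_\psi$ are taken as scalars, the neural network output $\hat{\upsilon}^T\varsigma(x)$ is passed in as a precomputed list, the weight update for $\hat{\upsilon}$ is omitted for brevity, the adaptation laws are integrated with a simple explicit Euler step, and all numerical values are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def sig(x, lam):
    """Signed power |x|^lam * sign(x)."""
    return math.copysign(abs(x) ** lam, x) if x != 0.0 else 0.0

def sign(x):
    return (x > 0) - (x < 0)

def control_torque(s, nn_out, omega, psi_hat, delta, K, K1, lam):
    """Eq. (12) with the compensator of Eq. (10); K and K1 are scalar gains here."""
    chi_hat = sum(w * p for w, p in zip(omega, psi_hat))  # chi_hat = omega^T psi_hat
    denom = chi_hat * norm(s) + delta
    xi = [chi_hat ** 2 * sj / denom for sj in s]
    return [nn_out[j] + K * sign(s[j]) + K1 * sig(s[j], lam) + xi[j]
            for j in range(len(s))]

def adapt_step(psi_hat, omega, s, delta, gamma_psi, alpha, dt):
    """One Euler step of psi_hat' = gamma_psi * omega * ||s|| and delta' = -alpha * delta."""
    ns = norm(s)
    psi_next = [p + dt * gamma_psi * w * ns for p, w in zip(psi_hat, omega)]
    return psi_next, delta + dt * (-alpha * delta)

# Hypothetical two-joint example with ||theta_dot|| = 0.5, so omega = [1, 0.5, 1, 1].
s = [3.0, -4.0]
omega = [1.0, 0.5, 1.0, 1.0]
psi_hat = [0.1, 0.2, 0.1, 0.1]
tau = control_torque(s, [0.0, 0.0], omega, psi_hat, delta=1.0, K=2.0, K1=1.0, lam=0.6)
psi_next, delta_next = adapt_step(psi_hat, omega, s, delta=1.0,
                                  gamma_psi=1.0, alpha=0.5, dt=0.01)
```

Since $\delta$ stays strictly positive, the denominator $\hat{\chi}\|s\| + \delta$ never vanishes for $\hat{\chi} \ge 0$, so the compensator remains well defined at $s = 0$.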
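4 Stability Analysis
Consider the system (1) with the designed control input (12) and the adaptive laws chosen as (11). Then the neural network parameter errors and the control signals are bounded, and the error converges with a finite convergence time.
Proof To prove this, the following Lyapunov function is introduced:

$$P = \frac{1}{2}s^T D s + \frac{1}{2}\mathrm{tr}(\tilde{\upsilon}^T\Gamma_\upsilon^{-1}\tilde{\upsilon}) + \frac{1}{2}\mathrm{tr}(\tilde{\psi}^T\Gamma_\psi^{-1}\tilde{\psi}) + \frac{\delta}{\alpha} \tag{14}$$

where $\tilde{\upsilon} = \upsilon - \hat{\upsilon}$ and $\tilde{\psi} = \psi - \hat{\psi}$. Differentiating gives

$$\dot{P} = \frac{1}{2}s^T\dot{D}s + s^T D\dot{s} + \mathrm{tr}(\tilde{\upsilon}^T\Gamma_\upsilon^{-1}\dot{\tilde{\upsilon}}) + \mathrm{tr}(\tilde{\psi}^T\Gamma_\psi^{-1}\dot{\tilde{\psi}}) + \frac{\dot{\delta}}{\alpha} \tag{15}$$

Employing (13) in (15) and making use of $\dot{\tilde{\upsilon}} = -\dot{\hat{\upsilon}}$, $\dot{\tilde{\psi}} = -\dot{\hat{\psi}}$ and $\dot{\delta} = -\alpha\delta$, one obtains

$$\dot{P} = \frac{1}{2}s^T(\dot{D} - 2V)s - s^T\big(K\,\mathrm{sign}(s) + K_1\,\mathrm{sign}(s)^{\lambda}\big) + s^T\tilde{\upsilon}^T\varsigma(x) + s^T\big(F(\dot{\theta}) + T_d + \epsilon(x)\big) - \frac{\hat{\chi}^2\|s\|^2}{\hat{\chi}\|s\| + \delta} - \mathrm{tr}(\tilde{\upsilon}^T\Gamma_\upsilon^{-1}\dot{\hat{\upsilon}}) - \mathrm{tr}(\tilde{\psi}^T\Gamma_\psi^{-1}\dot{\hat{\psi}}) - \delta \tag{16}$$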
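Using Property 2 and the update laws (11), Eq. (16) can be written as

$$\dot{P} = -s^T K\,\mathrm{sign}(s) - s^T K_1\,\mathrm{sign}(s)^{\lambda} + s^T\tilde{\upsilon}^T\varsigma(x) + s^T\big(F(\dot{\theta}) + T_d + \epsilon(x)\big) - \mathrm{tr}(\tilde{\upsilon}^T\varsigma(x)s^T) - \mathrm{tr}(\tilde{\psi}^T\omega\|s\|) - \frac{\hat{\chi}^2\|s\|^2}{\hat{\chi}\|s\| + \delta} - \delta \tag{17}$$

$$\dot{P} = -s^T K\,\mathrm{sign}(s) - s^T K_1\,\mathrm{sign}(s)^{\lambda} + s^T\big(F(\dot{\theta}) + T_d + \epsilon(x)\big) - \tilde{\psi}^T\omega\|s\| - \frac{(\omega^T\hat{\psi})^2\|s\|^2}{(\omega^T\hat{\psi})\|s\| + \delta} - \delta \tag{18}$$

Owing to Eqs. (8)–(10), we get

$$s^T\big(F(\dot{\theta}) + T_d + \epsilon(x)\big) \le \|s\|\,\|F(\dot{\theta}) + T_d + \epsilon(x)\| \le \chi\|s\| = \omega^T\psi\|s\| = \omega^T(\hat{\psi} + \tilde{\psi})\|s\|$$

On employing the above expression, (18) yields:

$$\dot{P} \le -s^T K\,\mathrm{sign}(s) - s^T K_1\,\mathrm{sign}(s)^{\lambda} + (\omega^T\hat{\psi})\|s\| - \frac{(\omega^T\hat{\psi})^2\|s\|^2}{(\omega^T\hat{\psi})\|s\| + \delta} - \delta$$

$$\le -s^T K\,\mathrm{sign}(s) - \frac{\delta^2}{(\omega^T\hat{\psi})\|s\| + \delta}$$

$$\dot{P} \le -\lambda_{\min}\|s\| \le 0$$

where $\lambda_{\min}$ is understood as the smallest eigenvalue of the gain matrix $K$. From (14) and the bound above, $P > 0$ and $\dot{P} \le 0$, which demonstrates overall stability in the Lyapunov sense; consequently $s$, $\tilde{\upsilon}$ and $\tilde{\psi}$ are bounded. From (13), $\dot{s}$ is bounded, so $\ddot{P}$ is bounded and $\dot{P}$ is uniformly continuous. Barbalat's lemma then implies that all tracking errors go to zero as $t \to \infty$, and hence the complete system is stable.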
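The simplification of the compensator terms in the bound on $\dot{P}$ can be checked directly. Writing $a = (\omega^T\hat{\psi})\|s\|$ as shorthand (the symbol $a$ is introduced here only for illustration) and assuming $a \ge 0$ and $\delta > 0$:

```latex
\begin{aligned}
a - \frac{a^{2}}{a+\delta} - \delta
  &= \frac{a(a+\delta) - a^{2} - \delta(a+\delta)}{a+\delta} \\
  &= \frac{a^{2} + a\delta - a^{2} - a\delta - \delta^{2}}{a+\delta}
   = -\frac{\delta^{2}}{a+\delta} \le 0 .
\end{aligned}
```

The three terms therefore combine into a single non-positive term, which is why only $-s^T K\,\mathrm{sign}(s)$ and $-\delta^2/\big((\omega^T\hat{\psi})\|s\| + \delta\big)$ remain in the bound.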
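In order to prove finite time stability, we introduce the following function:

$$P = \frac{1}{2}s^T D s \tag{19}$$

$$\dot{P} = \frac{1}{2}s^T\dot{D}s + s^T D\dot{s} \tag{20}$$

$$\dot{P} = \frac{1}{2}s^T(\dot{D} - 2V)s - s^T\big(K\,\mathrm{sign}(s) + K_1\,\mathrm{sign}(s)^{\lambda}\big) + s^T\tilde{\upsilon}^T\varsigma(x) + s^T\big(F(\dot{\theta}) + T_d + \epsilon(x)\big) - \frac{\hat{\chi}^2\|s\|^2}{\hat{\chi}\|s\| + \delta} \tag{21}$$

$$\dot{P} \le -\lambda_{\min}\|s\| - s^T K_1\,\mathrm{sign}(s)^{\lambda} - \frac{\hat{\chi}^2\|s\|^2}{\hat{\chi}\|s\| + \delta} + \|s\|\,\|\tilde{\upsilon}^T\varsigma(x)\| + \|s\|\,\|F(\dot{\theta}) + T_d + \epsilon(x)\| \tag{22}$$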
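The Gaussian function defined above always lies between 0 and 1, that is, $0 \le \varsigma(x) \le 1$; together with the boundedness of $s$, $\epsilon$ and $F$, this gives

$$\|s\|\,\|\tilde{\upsilon}^T\varsigma(x)\| + \|s\|\,\|F(\dot{\theta}) + T_d + \epsilon(x)\| \le h_1 + h_2 = h_m \tag{23}$$

From (22), we have

$$\dot{P} \le -\lambda_{\min}\|s\| - s^T K_1\,\mathrm{sign}(s)^{\lambda} + h_m \tag{24}$$

Selecting the gain such that $\lambda_{\min} > h_m$, and writing $K_i$ for the diagonal entries of $K_1$, we obtain

$$\dot{P} \le -\sum_{i=1}^{n} s_i K_i |s_i|^{\lambda}\,\mathrm{sign}(s_i) \tag{25}$$

$$\le -\sum_{i=1}^{n} s_i\,\mathrm{sign}(s_i) K_i |s_i|^{\lambda} \tag{26}$$

$$\le -\sum_{i=1}^{n} |s_i|\,|s_i|^{\lambda} K_i \tag{27}$$

$$= -\sum_{i=1}^{n} K_i |s_i|^{\lambda+1} \tag{28}$$

$$\le -\sum_{i=1}^{n} K_{\min} |s_i|^{\lambda+1} \tag{29}$$

$$= -K_{\min}\left(\frac{2}{\bar{m}}\right)^{\mu} \sum_{i=1}^{n}\left(\frac{1}{2}\bar{m}s_i^2\right)^{\mu} \tag{30}$$

$$\le -\eta\left(\sum_{i=1}^{n}\frac{1}{2}\bar{m}s_i^2\right)^{\mu} \tag{31}$$

$$\le -\eta P^{\mu} \tag{32}$$

$$\dot{P} + \eta P^{\mu} \le 0 \tag{33}$$

where $\mu = (1 + \lambda)/2$, $\eta = K_{\min}(2/\bar{m})^{\mu}$, $K_{\min} = \min\{K_i\}$, and $\bar{m}$ is taken to be an upper bound on the eigenvalues of $D(\theta)$, so that $P \le \frac{1}{2}\bar{m}\|s\|^2$. Step (31) uses $0 < \mu < 1$. Since $P$ is positive definite and $\dot{P}$ is negative definite, and (33) has the form required by Lemma 1, the closed-loop system is asymptotically as well as finite time stable.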
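The finite-time claim can be made quantitative in the equality case. Integrating $\dot{P} = -\eta P^{\mu}$ gives $P(t)^{1-\mu} = P(0)^{1-\mu} - \eta(1-\mu)t$, so $P$ reaches zero at $T = P(0)^{1-\mu}/(\eta(1-\mu))$, and any $P$ satisfying the inequality reaches zero no later. This standard bound is not stated in the text; the sketch below compares it with a simple Euler integration, using hypothetical values of $P(0)$, $\eta$ and $\lambda$.

```python
def settling_time_bound(p0, eta, mu):
    """Time at which the solution of P' = -eta * P**mu reaches zero (0 < mu < 1)."""
    return p0 ** (1.0 - mu) / (eta * (1.0 - mu))

def simulate(p0, eta, mu, dt=1e-4, t_max=10.0):
    """Explicit Euler integration of P' = -eta * P**mu, stopped when P hits zero."""
    p, t = p0, 0.0
    while p > 0.0 and t < t_max:
        p = max(p - dt * eta * p ** mu, 0.0)
        t += dt
    return t

# Hypothetical values: lambda = 0.6 gives mu = (1 + lambda) / 2 = 0.8.
t_bound = settling_time_bound(p0=1.0, eta=2.0, mu=0.8)
t_sim = simulate(p0=1.0, eta=2.0, mu=0.8)
```

With an exponent $\mu \ge 1$ the same differential inequality would only give asymptotic decay, which is why the fractional power $\lambda < 1$ in the sliding surface matters.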
5 Simulation Results
To assess the relevance and quality of the design, the control scheme and the theoretical analysis are verified by simulation studies on a 3-DOF Microbot robot manipulator. For detailed expressions of the dynamical model of the Microbot robot manipulator, one can refer to [27]. The simulation was run for 45 s in the MATLAB environment. The trajectory tracking errors, joint angles, and joint velocities obtained with the proposed controller are shown in Figs. 1, 2 and 3. These figures show that the tracking errors converge quickly to zero and that the tracking is stable, which supports the proposed approach.
6 Conclusion
This paper examines a different approach to the problem of trajectory tracking control of rigid robot manipulators, based on terminal sliding mode control. The analysis demonstrates that the proposed scheme yields good performance with only partial model information. The novelty of this paper lies in the nonlinear sliding surface, which results in a robust finite time control scheme whose results reflect finite time convergence. The radial basis function (RBF) neural network compensates for the uncertainties of the system and for the effects of the network reconstruction error, with the help of an adaptive compensator. The complete system is shown to be asymptotically as well as finite time stable. It can be concluded that the proposed scheme is more powerful and more systematic than the other schemes.
Fig. 1 Tracking error performance of RBFNN-based TSMC (three panels, Joints 1–3; tracking error in radians versus time, 0–45 s)
Fig. 2 Joint angle performance of RBFNN-based TSMC (three panels, Joints 1–3; joint angle in radians versus time, 0–45 s)
Fig. 3 Joint angular velocities performance of RBFNN-based TSMC (three panels, Joints 1–3; joint velocity in radian/sec versus time, 0–45 s)
References
1. Peng, L.M., Woo, P.Y.: Neural-fuzzy control system for robotic manipulator. IEEE Control Syst. Mag. 2(1), 53–63 (2002)
2. Song, Z., Yi, J., Zhao, D., Li, X.: A computed torque controller for uncertain robotic manipulator systems. Fuzzy Sets Syst. 154, 208–226 (2005)
3. Lewis, F.L., Dawson, D.M., Abdallah, C.T.: Robot Manipulator and Control. Taylor and Francis (2004)
4. Choi, H.S.: Robust control of robot manipulators with torque saturation using fuzzy logic. Robotica 19(6), 631–639 (2001)
5. Slotine, J.-J.E.: The robust control of robot manipulators. Int. J. Rob. Res. 4(4), 49–64 (1985)
6. Slotine, J.-J.E., Li, W.: On the adaptive control of robot manipulators. Int. J. Robot. Res. 6(3), 49–59 (1987)
7. Bailey, E., Arapostathis, A.: Simple sliding mode control applied to robot manipulators. Int. J. Control 25(4), 1197–1209 (1987)
8. Capisani, L.M., Ferrara, A.: Trajectory planning and second-order sliding mode motion/interaction control for robot manipulators in unknown environments. IEEE Trans. Ind. Electron. 59(8), 3189–3198 (2012)
9. Moldoveanu, F.: Sliding mode controller design for robot manipulators. Bull. Transilvania Univ. Brasov 7(2), 97–104 (2014)
10. Edwards, C., Spurgeon, S.K.: Sliding Mode Control Theory and Applications. Taylor and Francis (1998)
11. Lanzon, A., Richards, R.J.: Trajectory/force control of robot manipulators using sliding mode and adaptive control. Proceedings of the American Control Conference (San) 3, 1940–1944 (1999)
12. Tan, C., Yu, X.H., Man, Z.H.: Terminal sliding mode observers for a class of nonlinear systems. Automatica 46(8), 1401–1404 (2010)
13. Tang, Y.: Terminal sliding mode control for rigid robots. Automatica 34(1), 51–56 (1998)
14. Zhao, D., Li, S., Gao, F.: A new terminal sliding mode control for robotic manipulators. Int. J. Control 82(10), 1804–1813 (2009)
15. Lewis, F.L., Jagannathan, S., Yesildirek, A.: Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor and Francis (1999)
16. Panwar, V., Kumar, N., Sukavanam, N., Borm, J.H.: Adaptive neural controller for cooperative multiple robot manipulator system manipulating a single rigid object. Appl. Soft Comput. 12, 216–227 (2012)
17. Kim, J., Kumar, N., Panwar, V., Borm, J.H., Chai, J.: Adaptive neural controller for visual servoing of robot manipulators with camera-in-hand configuration. J. Mech. Sci. Technol. 26(8), 2313–2323 (2012)
18. Kumar, N., Borm, J.H., Panwar, V., Chai, J.: Tracking control of redundant robot manipulators using RBF neural network and an adaptive bound on disturbances. Int. J. Precis. Eng. Manuf. 13, 1377–1386 (2012)
19. Hoai, H.K., Chen, S.C.: Simulation and implementation of a sliding mode control for a brushless DC motor with RBFNN and disturbance observer. In: International Automatic Control Conference (CACS), pp. 1–6 (2019)
20. Wang, L., Chai, T., Zhai, L.: Neural-network-based terminal sliding-mode control of robotic manipulators including actuator dynamics. IEEE Trans. Ind. Electron. 56(9), 3296–3304 (2009)
21. Tran, M., Kang, H.: Adaptive terminal sliding mode control of uncertain robotic manipulators based on local approximation of a dynamic system. Neurocomputing 228, 231–240 (2017)
22. Panwar, V.: Wavelet neural network-based H∞ trajectory tracking for robot manipulators using fast terminal sliding mode control. Robotica 35(7), 1488–1503 (2017)
23. Bhat, S.P., Bernstein, D.S.: Finite-time stability of continuous autonomous systems. SIAM J. Control Optim. 38(3), 751–766 (2000)
24. Bhat, S.P., Bernstein, D.S.: Finite-time stability of homogeneous systems. Proc. Am. Control Conf. 21(3), 2513–2514 (1997)
25. Park, J., Sandberg, I.W.: Universal approximation using radial-basis function networks. Neural Comput. 3, 246–257 (1991)
26. Alqaisi, W.K., Brahmi, B., Saad, M., Ghommam, J., Nerguizian, V.: Adaptive sliding mode control based on RBF neural network approximation for quadrotor. In: 2019 IEEE International Symposium on Robotic and Sensors Environments (ROSE), pp. 1–7, Ottawa, ON, Canada (2019)
27. Kumar, N., Borm, J.H., Panwar, V., Chai, J.: Enhancing precision performance of trajectory tracking controller for robot manipulators using RBFNN and adaptive bound. Appl. Math. Comput. 231, 320–328 (2014)
An In-Memory Physics Environment as a World Model for Robot Motion Planning Navin K. Ipe and Subarna Chatterjee
Abstract A 2D replica of real-world terrain was created in a physics simulation environment, allowing a robot to “imagine” a simulated version of itself navigating the terrain. The physics of the environment simulates the movement of robot parts and its interaction with objects, thus avoiding the need for explicitly programming various calculations. Since the complexity of motion increases with each degree of freedom of the robot’s joints, the utility of uniform randomness was also investigated as an alternative to computational intelligence algorithms for exploring the fitness landscape. Such techniques potentially simplify the algorithmic complexity of programming multi-jointed robots and could adapt by dynamically adjusting simulation parameters, on encountering environments with varied gravity, viscosity or traction. Keywords Machine learning · Computational intelligence · Differential evolution · Particle swarm optimization · Robotics · Uniform randomness
N. K. Ipe (B) · S. Chatterjee, M S Ramaiah University of Applied Sciences, Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_46

1 Introduction
The animal brain constructs a complex imagined world model within which it simulates actions by combining various memories. Imagination thus offers a powerful method of evaluating various actions and scenarios before executing them. Biological brains appear to model phenomena using specialized cells and vast data storage [1, 2] to record and predict phenomena via an imagination that accounts for context and expected outcomes. Learnings from prior investigations into intelligence [3] led to the conclusion that creating an intelligent machine requires the machine to be an embodied consciousness that can experience the world around it. Only then would the machine be capable of associating its experiences with the experiences of humans, thus helping it understand the meaning of objects, phenomena and words. As a first step toward building such capabilities, this paper investigated the possibility of simulating some aspects of the real world via a 2D physics simulation environment in which a simulated robot navigated various obstacles to cross a finish line. The robot “imagines” itself performing various motor actions and selects the best action to perform in the “real world”. Section 2 presents related work, Sect. 3 presents the design decisions and test environment, and Sects. 4 and 5 present the reasoning behind the trial runs and the inferences derived. The paper concludes with Sect. 6.
2 Related Work The animal brain models spatial perception using specialized mechanisms like proprioception [4], place cells [5], dead reckoning [6], spatial view cells [7], grid cells [8], border cells [9], speed cells [10] and head direction cells [11]. Various researchers have attempted modeling such imagination in the memories of robots [12, 13], but the robots and the environments chosen were simple, and computational intelligence (CI) does not appear to be used for imagination. Techniques that do utilize CI algorithms tend to use it for navigation, path planning, locomotion [14–16] and joint positioning [17]. CI algorithms have also been used for navigation on uneven and varied terrain [18–20], while Gaussian and probabilistic models have been used to decide appropriate footholds on terrain [21]. ANNs have also been used to provide controller action via a multi-objective differential evolution (DE) method [22]. Such architectures lack the capability to handle multi-dimensional scenarios and need to be explicitly programmed for each functionality. Even embodied and context-aware architectures [23, 24], artificial neural networks (ANN) [25] and CI used for robot gaits [26] are yet to achieve success in solving large problem spaces posed by real-world situations. This is especially true of ANNs which depend so heavily on trained weights, that over-fitting, under-fitting and the vanishing gradient problem [27] pose serious limitations. Even concepts like neural network dropouts or long short-term memory (LSTM) afford the algorithm only a limited memory that is insufficient to represent complex phenomena, some of which need more prominent representation than others, as evidenced by human homunculi [28]. Therefore, a simple, memory-efficient method is designed and presented via this paper.
3 Design Decisions
3.1 Robot and Environment
The limbs of a robot need to rotate at various angles to be versatile when tackling uneven terrain. Coupled limbs can even move like a linear piston. The general design of the robot is shown in Fig. 1a. Two motors are attached to the chassis, and two L1 limbs are attached to each motor. A second set of motors is attached to the free ends of the L1 limbs, to which two L2 limbs are attached. The rotation of a motor attached to the chassis moves L1 and, as a consequence, also moves L2. Motor rates range from 0 to ±6 (no unit).
Fig. 1 Design decisions
The robot was initialized in a 2D physics simulation environment named PyMunk, with an acceleration due to gravity of 9 m/s², the chassis weighing 5 kg and the limbs weighing 0.5 kg each. Directly above this environment (called the “real world”), a duplicate environment called the “imaginary world” is initialized, containing the same terrain as the “real world”. Ideally, the “imaginary world” terrain should consist of objects perceived by the robot’s sensors. The imaginary world represents the robot’s perception of the “real world”, and the robot “imagines” performing actions in the imaginary world by simulating multiple replicas of itself. The robot was given four possible directions of motion: up, down, left and right, based on angles of equal proportion. However, during initial tests, it was noted that motions toward the right (which would take the robot closer to the finish line) were composed of up and down motions when navigating complex terrain, so a broader angle was assigned to the rightward motion, as shown in Fig. 1b. The robot was programmed only to deal with static terrain. Four kinds of world were created, as shown in Fig. 2: flat ground; randomly distributed rectangular obstacles of varying sizes (which helped test the possibility of the robot getting stuck within concave blockades, notches in terrain and pits); randomly distributed spheres of varying sizes (which tested the robot’s capability to handle curved surfaces); and alternating gaps (which tested the capability to climb high obstacles and then squeeze through a short tunnel-like gap). All surfaces were assigned a unitless friction of value 20.
The algorithm was run for n trials, and in each trial of each terrain that required random obstacles, the obstacles were randomly generated and saved on disk. Therefore, when the same terrain needed to be run again with another algorithm, it could be loaded from disk and re-used. This ensured that various algorithms were tested with the same terrain, but also ensured that each trial was run with a new random terrain.
Fig. 2 The 2D simulation environments: (a) flat ground, (b) rectangular obstacles, (c) spherical obstacles, (d) alternator obstacles
3.2 Storage and Algorithms
Robots were initialized in the imaginary world with the same angles and position as the real robot R. Each imaginary robot in the population I = I_1, I_2, ..., I_p of size p moved its limbs using various motor rates for one second, and this was repeated for g generations. The motor rates generating the greatest movement in the positive x direction (those of the fittest robot I_F) were used to move the real robot’s limbs for one second. If the robot remained stuck in the same position for five consecutive attempts, the algorithm switched to a state in which it performed a random motion each second, for five seconds. This was found to be very effective in getting the robot unstuck. Once unstuck, the algorithm switched back to the normal motion and continued until the robot crossed the finish line L_x. All random values were generated with uniform random functions based on the Mersenne Twister random number generator. The differential evolution (DE) algorithm presented as Algorithm 2 and the particle swarm optimization (PSO) algorithm presented as Algorithm 3 differ slightly from convention, since the fitness values could be calculated only after all I robots had completed their motion. A uniform random function random(0, 1) generated probability values ranging from 0 to 1. Ten percent (rr = 0.1) of the population was randomly reinitialized to avoid being stuck at local optima. In PSO, the fitness of a robot at time t was I_f(t) and was compared with the fitness calculated in the previous run of PSO, denoted I_f(t − 1). The personal best motor rates, denoted b_p, were also stored.

Algorithm 1 Robot motion
Step 1: Initialize R. Initialize s = 0.
Step 2: Initialize I with angles of I = angles of R.
Step 3: Store (x, y) positions of R as P_R.
Step 4: For gen = 1 to g
Step 5: Assign motor rates to I based on random values or Algorithm 2 or Algorithm 3 and run for one second.
Step 6: Store motor values of I_F and set angles of I = angles of R.
Step 7: Next gen
Step 8: Set angles of R = angles of I_F and run for one second.
Step 9: If P_R variation < 20 pixels, s = s + 1, else s = 0.
Step 10: If s == 5 goto Step 11 else goto Step 12.
Step 11: Set random angles for R. Move motors for one second. Repeat Step 11 five times.
Step 12: If x of P_R > L_x, end program else goto Step 2.
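The imagine-then-act loop of Algorithm 1 can be sketched with uniform-random motor rates as follows. This is a simplified illustration, not the authors' code: the function `toy_displacement` is a hypothetical stand-in for one second of physics simulation (it does not call the physics library), the robot state is reduced to its x position, and the number of motors and all constants other than the ±6 rate range, the 20-pixel threshold, and the five-attempt rule are assumptions.

```python
import random

RATE_LIMIT = 6          # motor rates range over +/-6 in the text
STUCK_THRESHOLD = 20.0  # position change below this counts as "stuck" (pixels)

def toy_displacement(rates, rng):
    """Hypothetical stand-in for one second of physics; not the real simulation."""
    return 10.0 * sum(rates) / len(rates) + rng.uniform(-5.0, 5.0)

def random_rates(num_motors, rng):
    return [rng.uniform(-RATE_LIMIT, RATE_LIMIT) for _ in range(num_motors)]

def imagine_best_rates(num_motors, generations, population, rng):
    """Steps 4-7 with uniform-random rates: keep the fittest imagined robot."""
    best_rates, best_dx = None, float("-inf")
    for _ in range(generations * population):
        rates = random_rates(num_motors, rng)
        dx = toy_displacement(rates, rng)
        if dx > best_dx:
            best_rates, best_dx = rates, dx
    return best_rates

def run_to_finish(finish_x, num_motors=4, generations=4, population=5,
                  seed=0, max_seconds=10_000):
    """Steps 2-12: imagine, act for one second, and move randomly when stuck."""
    rng = random.Random(seed)
    x, stuck, seconds = 0.0, 0, 0
    while x <= finish_x and seconds < max_seconds:
        rates = imagine_best_rates(num_motors, generations, population, rng)
        dx = toy_displacement(rates, rng)  # the "real" robot acts for one second
        x += dx
        seconds += 1
        stuck = stuck + 1 if abs(dx) < STUCK_THRESHOLD else 0
        if stuck == 5:
            for _ in range(5):  # five seconds of random motion
                x += toy_displacement(random_rates(num_motors, rng), rng)
                seconds += 1
            stuck = 0
    return seconds, x

seconds, final_x = run_to_finish(finish_x=500.0)
```

In this sketch the imagined and real outcomes differ only by noise; in the setup described above, the imaginary world is a duplicate of the real terrain, so the selected motor rates are expected to transfer closely.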
Algorithm 2 Modified Differential Evolution
Step 1: Determine I_F.
Step 2: Set rr = 0.1, cr = 0.3, β_v = 2, r_β = 1/40.
Step 3: For each I_p in I
Step 4: If I_p == I_F, goto Step 3 else goto Step 5.
Step 5: Randomly select 3 robots I_1, I_2, I_3 from I, excluding I_F.
Step 6: Get motor rates of selected robots r_p, r_1, r_2, r_3.
Step 7: If random(0, 1) > rr, set random motor rates. Goto Step 10 else goto Step 8.
Step 8: If random(0, 1) ≤ cr, motor rates = r_1 + round(β_v × (r_2 − r_3)).
Step 9: If motor rates out of range of ±6, motor rates = r_p, else r_p = motor rates.
Step 10: If β_v > r_β, β_v = β_v − r_β.
Step 11: Next I_p
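One pass of the modified DE update can be sketched as below. Several choices here are assumptions made for illustration: motor rates are treated as integers (suggested by the use of round), the range check of Step 9 is applied to the whole rate vector, and reinitialization is triggered with probability rr, which follows the prose statement that ten percent of the population is reinitialized rather than the comparison printed in Step 7.

```python
import random

RATE_LIMIT = 6

def de_update(rates, fittest_index, beta_v, rng, rr=0.1, cr=0.3, r_beta=1 / 40):
    """One pass of the modified DE over a population of integer motor-rate lists.

    Assumption: reinitialisation happens with probability rr, matching the prose.
    """
    new_rates = [list(r) for r in rates]
    candidates = [i for i in range(len(rates)) if i != fittest_index]
    for p in candidates:  # the fittest robot is skipped (Step 4)
        if rng.random() < rr:
            new_rates[p] = [rng.randint(-RATE_LIMIT, RATE_LIMIT) for _ in rates[p]]
        elif rng.random() <= cr:
            i1, i2, i3 = rng.sample(candidates, 3)
            trial = [a + round(beta_v * (b - c))
                     for a, b, c in zip(rates[i1], rates[i2], rates[i3])]
            if all(-RATE_LIMIT <= t <= RATE_LIMIT for t in trial):
                new_rates[p] = trial  # out-of-range trials keep the old rates
        if beta_v > r_beta:  # Step 10: shrink the differential weight
            beta_v -= r_beta
    return new_rates, beta_v

rng = random.Random(1)
population = [[rng.randint(-RATE_LIMIT, RATE_LIMIT) for _ in range(4)] for _ in range(5)]
updated, beta_next = de_update(population, fittest_index=0, beta_v=2.0, rng=rng)
```

Fitness is not evaluated inside this function, reflecting the remark above that fitness values only become available after all imagined robots have finished moving.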
4 Trials The objective of the robot was to maximize the distance it moved in the positive x direction in each attempt to reach the finish line. Trial 4.3 was programmed with the added functionality of moving randomly to exit a stuck position. The time taken by each robot is shown in Table 1.
Algorithm 3 Modified Particle Swarm Optimization
Step 1: Determine fittest robot I_F and fitness I_f for all I.
Step 2: Set rr = 0.1, c1 = 1, c2 = 2. Set velocities v and personal best rates b for all robots to 0.
Step 3: For each I_p in I
Step 4: If I_p == I_F, goto Step 3 else goto Step 5.
Step 5: Get motor rates of current robot r_p and fittest robot r_F.
Step 6: If random(0, 1) > rr, set random motor rates. Goto Step 13 else goto Step 7.
Step 7: If I_f(t) > I_f(t − 1), I_f(t − 1) = I_f(t) and b_p = r_p.
Step 8: C = c1 × random(0, 1) × (b_p − r_p).
Step 9: S = c2 × random(0, 1) × (r_F − r_p).
Step 10: v_p = v_p + C + S.
Step 11: r_p = r_p + v_p
Step 12: Clamp values of v_p and r_p.
Step 13: Next I_p.

Table 1 Trials (average time in seconds for the real robot to reach the finish line)
Gen. (g)  Popu. (p)  Terrain     Random   DE      PSO
4         5          Flat        18.43    31.38   24.8
4         5          Rectangles  143.12   –       –
30        5          Flat        13.6     16.77   17.29
30        5          Rectangles  46.34    51.01   62.79
30        5          Spheres     28.99    28.56   27.42
2         30         Flat        22.17    24.33   28.4
2         30         Rectangles  115      90.43   144.13
2         30         Spheres     43.0     38.17   45.73
2         30         Alternator  210.2    248.23  214.9
30        30         Flat        18.08    17.05   17.51
30        30         Rectangles  67.36    82.56   101.47
30        30         Spheres     36.0     41.85   46.95
30        30         Alternator  132.55   146.5   130.71
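The velocity and position updates of Algorithm 3 (Steps 8 to 12) can be sketched as follows. As assumptions for illustration, the cognitive and social terms use the conventional differences (b_p − r_p) and (r_F − r_p), reinitialization is triggered with probability rr as stated in the prose, clamping uses the ±6 rate range for both velocities and rates, and the personal-best update of Step 7 is left to the caller because it depends on fitness values from the simulation.

```python
import random

RATE_LIMIT = 6

def clamp(value, limit=RATE_LIMIT):
    return max(-limit, min(limit, value))

def pso_update(rates, velocities, best_rates, fittest_index, rng,
               rr=0.1, c1=1.0, c2=2.0):
    """One pass over the population; rates and velocities are updated in place."""
    r_f = list(rates[fittest_index])
    for p in range(len(rates)):
        if p == fittest_index:
            continue  # Step 4: the fittest robot keeps its rates
        if rng.random() < rr:  # assumption: reinitialise with probability rr
            rates[p] = [rng.uniform(-RATE_LIMIT, RATE_LIMIT) for _ in rates[p]]
            continue
        u1, u2 = rng.random(), rng.random()
        for j in range(len(rates[p])):
            cognitive = c1 * u1 * (best_rates[p][j] - rates[p][j])
            social = c2 * u2 * (r_f[j] - rates[p][j])
            v = velocities[p][j] + cognitive + social
            r = rates[p][j] + v
            velocities[p][j] = clamp(v)  # Step 12: clamp after both updates
            rates[p][j] = clamp(r)

rng = random.Random(2)
rates = [[0.0, 0.0], [6.0, 6.0], [-3.0, 2.0]]
velocities = [[0.0, 0.0] for _ in rates]
best_rates = [list(r) for r in rates]
pso_update(rates, velocities, best_rates, fittest_index=1, rng=rng)
```

With c2 larger than c1, as in Step 2, the pull toward the current fittest robot dominates the pull toward each robot's own best rates.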
4.1 Trials with g = 4, p = 5 On flat ground, a small population and few generations gave good results, while the uniform random function consistently located better global optima, compared to DE and PSO. However, in the rectangular obstacles terrain, locating optimal solutions was tougher, and it took longer to reach the finish line. As a result, a decision was taken to increase the number of generations to 30.
4.2 Trials with g = 30, p = 5 Running a greater number of generations helped locate better global optima, resulting in the robot reaching the finish line faster in the flat terrain, the rectangles terrain and the spheres terrain. During this trial run, the results of the spheres terrain indicated that running 30 generations may be unnecessary, and fewer generations may suffice if a larger population could explore larger expanses of the fitness landscape. Therefore, the alternator terrain trial was not run, and instead trials with two generations and a population size of 30 were planned.
4.3 Trials with g = 2, p = 30 One of the reasons the rectangles terrain took long to complete, was due to robots being stuck in concave spaces at the beginning of the terrain, before getting unstuck and moving toward the finish line. One of the reasons they got stuck was the suboptimal identification of the best motion, due to fewer generations. Trial 4.2 had 30 × 5 = 150 searches on the fitness landscape per attempt, but the current trial had only 2 × 30 = 60 searches. It was also interesting to note a trend that in all trials until this stage, the uniform random algorithm gave a better result than DE or PSO in a majority of the runs.
4.4 Trials with g = 30, p = 30 In order to improve the search for a better global optimum, the number of generations and population was kept at 30. A higher number would have been more beneficial, but was limited at 30, due to the computational load. The searches per attempt were 30 × 30 = 900, which helped locate better motion at each attempt.
5 Inferences 5.1 Uniform Randomness Is Effective CI algorithms locate a near-optimal solution in a fitness landscape by exploiting areas with good fitness. However, when the fitness landscape is extremely large and computational resources are scarce, it may be simpler and more beneficial to utilize uniformly random functions. Table 1 lists the average time taken for the real robot to reach the finish line. Utilizing individual time of each run (before averaging), a
N. K. Ipe and S. Chatterjee
Fig. 3 Probability density functions of random, DE and PSO
hypothesis test was designed to test whether using CI algorithms gives a significant difference or advantage compared to uniform randomness.
Null Hypothesis H0: In highly multi-dimensional fitness landscapes, uniform randomness can locate local optima that are as good as those found by the CI algorithms, resulting in completion times that are more or less similar.
Alternate Hypothesis H1: CI algorithms produce a significant improvement in results, since they explore the fitness landscape near any local optima while continuing to explore globally, so the completion times of DE and PSO should differ significantly from the uniformly random results.
Inference 1: Figure 3 depicts a positively skewed distribution of the time the real robot took to reach the finish line. The skews were 2.13, 2.16 and 3.77 for random, DE and PSO, respectively. A significance level of 0.05 was used with a one-way Mann–Whitney rank test, resulting in p-value = 0.378 for random versus DE and p-value = 0.241 for random versus PSO. Since both p-values exceed 0.05, H0 was not rejected, which supports (but does not prove) the viability of using uniformly random numbers. This was also observed by other authors [29].
Inference 2: A larger number of generations and a larger population can locate better global optima, resulting in more efficient and effective robot locomotion.
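The rank test used for Inference 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses the normal approximation without tie correction, reports a two-sided p-value, and the completion times below are invented placeholder numbers rather than data from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic of sample a and a two-sided p-value (normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical completion times in seconds (placeholders, not the paper's data).
random_times = [12.1, 15.3, 9.8, 30.2, 11.0, 14.7]
de_times = [13.4, 10.9, 16.8, 28.5, 12.6, 15.1]
u_stat, p_value = mann_whitney_u(random_times, de_times)
print(f"U = {u_stat}, p = {p_value:.3f}; reject H0 at 0.05: {p_value < 0.05}")
```

For small samples an exact test is preferable to the normal approximation used in this sketch.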
An In-Memory Physics Environment as a World Model …
5.2 Physics Simulations Are a Viable Alternative
The fitness landscape for a robot that can move its limbs at any angle is highly multi-dimensional. Each limb could move at one of 130 motor rates, ranging between values of −6 and +6. Each L1 motion caused L2 to move, and from each such L2 position, L2 could perform its own motion. In addition, each motor motion began at one of 360 possible limb angles, and the motion of the robot depended on which limb made contact with the terrain. Motion was affected by the number of contact points, the motor rates and the angles of contact. Given such a vast range of motion and possibilities, it is evident why biological creatures have a limited range of motion, and why certain specific motions are repeated frequently even though they may not be optimal. The success of the simulations shows that this approach could indeed be considered a viable alternative to explicit programming: if the fitness landscape is so vast for such a simple robot, programming the movements of highly multi-jointed robots explicitly would be a much more cumbersome task, especially given the various terrains and the varied effects of gravity, viscosity or buoyancy. Physics simulation algorithms could also be redesigned to calculate one second of physical interactions within a fraction of a second, thus allowing simulations of a large population of robots to be performed multiple times, each in a fraction of a second.
6 Conclusion
This paper demonstrated that an embodied consciousness (a robot) could navigate various kinds of terrain by simulating various possibilities of the action and terrain. Moreover, a simple uniform random number generator was capable of locating global optima, and a larger number of generations was found to give better results. In future work, a full-fledged event-based memory for intelligent machines [30] could augment or even replace the need for utilizing physics simulations to create an in-memory world model.
References
1. Sargolini, F., Fyhn, M., Hafting, T., McNaughton, B.L., Witter, M.P., Moser, M.B., Moser, E.I.: Conjunctive representation of position, direction, and velocity in entorhinal cortex. Science 312(5774), 758–762 (2006) 2. Nikolić, D.: The brain is a context machine. Rev. Psychol. 17(1), 33–38 (2010) 3. Ipe, N.: Facts and anomalies to keep in perspective when designing an artificial intelligence (2020) 4. Winter, J., Allen, T.J., Proske, U.: Muscle spindle signals combine with the sense of effort to indicate limb position. J. Physiol. 568(3), 1035–1046 (2005)
5. O’Keefe, J., Burgess, N., Donnett, J.G., Jeffery, K.J., Maguire, E.A.: Place cells, navigational accuracy, and the human hippocampus. Philos. Trans. R. Soc. Lond. Ser. B: Biol. Sci. 353(1373), 1333–1340 (1998) 6. Whishaw, I.Q., Hines, D.J., Wallace, D.G.: Dead reckoning (path integration) requires the hippocampal formation: evidence from spontaneous exploration and spatial learning tasks in light (allothetic) and dark (idiothetic) tests. Behav. Brain Res. 127(1–2), 49–69 (2001) 7. Rolls, E.T.: Spatial view cells and the representation of place in the primate hippocampus. Hippocampus 9(4), 467–480 (1999) 8. Doeller, C.F., Barry, C., Burgess, N.: Evidence for grid cells in a human memory network. Nature 463(7281), 657–661 (2010) 9. Barry, C., Lever, C., Hayman, R., Hartley, T., Burton, S., O’Keefe, J., Jeffery, K., Burgess, N.: The boundary vector cell model of place cell firing and spatial memory. Rev. Neurosci. 17(1–2), 71 (2006) 10. Kropff, E., Carmichael, J.E., Moser, M.B., Moser, E.I.: Speed cells in the medial entorhinal cortex. Nature 523(7561), 419–424 (2015) 11. Taube, J.S., Muller, R.U., Ranck, J.B.: Head-direction cells recorded from the postsubiculum in freely moving rats. i. description and quantitative analysis. J. Neurosci. 10(2), 420–435 (1990) 12. Blum, C., Winfield, A.F., Hafner, V.V.: Simulation-based internal models for safer robots. Front. Robot. AI 4, 74 (2018) 13. Rockel, S., Klimentjew, D., Zhang, L., Zhang, J.: An hyperreality imagination based reasoning and evaluation system (hires). In: 2014 IEEE International Conference on Robotics and Automation (ICRA). pp. 5705–5711. IEEE (2014) 14. Tang, B., Zhu, Z., Luo, J.: Hybridizing particle swarm optimization and differential evolution for the mobile robot global path planning. Int. J. Adv. Rob. Syst. 13(3), 86 (2016) 15. Patle, B., Pandey, A., Parhi, D., Jagadeesh, A., et al.: A review: on path planning strategies for navigation of mobile robot. Defense Technol. 15(4), 582–606 (2019) 16. 
Yang, Z.Y., Juang, C.F.: Evolutionary locomotion control of a hexapod robot using particle swarm optimized fuzzy controller. In: 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). pp. 3861–3866. IEEE (2014) 17. Rokbani, N., Benbousaada, E., Ammar, B., Alimi, A.M.: Biped robot control using particle swarm optimization. In: 2010 IEEE International Conference on Systems, Man and Cybernetics, pp. 506–512. IEEE (2010) 18. Janrathitikarn, O., Long, L.N.: Gait control of a six-legged robot on unlevel terrain using a cognitive architecture. In: 2008 IEEE Aerospace Conference, pp. 1–9. IEEE (2008) 19. Hauser, K., Bretl, T., Latombe, J.C., Harada, K., Wilcox, B.: Motion planning for legged robots on varied terrain. Int. J. Robot. Res. 27(11–12), 1325–1349 (2008) 20. Nguyen, Q., Agrawal, A., Da, X., Martin, W.C., Geyer, H., Grizzle, J.W., Sreenath, K.: Dynamic walking on randomly-varying discrete terrain with one-step preview. In: Robotics: Science and Systems, vol. 2 (2017) 21. Plagemann, C., Mischke, S., Prentice, S., Kersting, K., Roy, N., Burgard, W.: Learning predictive terrain models for legged robot locomotion. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3545–3552. IEEE (2008) 22. Teo, J., Abbass, H.A.: Coordination and synchronization of locomotion in a virtual robot. In: Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP’02, vol. 4, pp. 1931–1935. IEEE (2002) 23. Kotseruba, I., Gonzalez, O.J.A., Tsotsos, J.K.: A review of 40 years of cognitive architecture research: Focus on perception, attention, learning and applications, pp. 1–74. arXiv preprint arXiv:1610.08602 (2016) 24. Ye, P., Wang, T., Wang, F.Y.: A survey of cognitive architectures in the past 20 years. IEEE trans. Cybernet. 48(12), 3280–3290 (2018) 25. Tchircoff, A.: The mostly complete chart of neural networks, explained. Towards Data Science, pp. 1–29 (2017) 26. 
Rong, C., Wang, Q., Huang, Y., Xie, G., Wang, L.: Autonomous evolution of high-speed quadruped gaits using particle swarm optimization. In: Robot Soccer World Cup, pp. 259–270. Springer (2008)
27. Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 6(02), 107–116 (1998) 28. Penfield, W., Boldrey, E.: Somatic motor and sensory representation in the cerebral cortex of man as studied by electrical stimulation. Brain 60(4), 389–443 (1937) 29. McGerty, S., Moisiadis, F.: Are evolutionary algorithms required to solve sudoku problems? In: Fourth International Conference on Computer Science and Information Technology, pp. 365–377 (2014) 30. Ipe, N.: Context and event-based cognitive memory constructs for embodied intelligence machines (2020)
Motion Model and Filtering Techniques for Scaled Vehicle Localization with Fiducial Marker Detection
Kyle Coble, Akanshu Mahajan, Sharang Kaul, and H. P. Singh
Abstract This paper studies the fusion of control inputs and IMU data for developing a kinematic bicycle motion model for the KTH smart mobility lab small-vehicles-for-autonomy (SVEA) platform. This motion model is filtered with relative pose estimates between a camera and fiducial markers, using both an extended Kalman filter and a particle filter. The developed motion models and filters are implemented on SVEA vehicles and are tested in the smart mobility lab. Pose estimates from the motion model and filters are compared against ground truth, determined by a motion capture system with sub-millimeter accuracy. The results presented provide the necessary base for development of automated vehicle control technologies on the SVEA platform with perception based on the detection of fiducial markers.
Keywords Filtering techniques · Extended Kalman filter · Particle filter · Fiducial markers · Localization · Motion modeling
K. Coble (B) · A. Mahajan · S. Kaul, KTH Royal Institute of Technology, Stockholm, Sweden, e-mail: [email protected]
A. Mahajan, e-mail: [email protected]
S. Kaul, e-mail: [email protected]
H. P. Singh, Cluster Innovation Centre, University of Delhi, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_47
1 Introduction
Detection of fiducial markers using computer vision (CV) is an effective method for estimating the six degree-of-freedom pose of mobile robots in closed environments, including laboratories, warehouses, and test driving tracks. Robotics companies and educational research labs alike utilize fiducial markers as artificial landmarks for robot localization due to the accuracy and minimal hardware required for implementation. For these reasons, the KTH smart mobility lab has begun to use fiducial markers, specifically Aruco markers [1], for localization on their small-vehicles-for-autonomy (SVEA) platform. This paper builds on their prior work, which enabled relative pose estimates between an RGB webcam and a fiducial marker in the camera's field of view. Using this technology, the pose of a mobile robot can be estimated by detecting stationary fiducial markers with known positions. Conversely, the robot's pose can be estimated by placing a single fiducial marker on the mobile robot and detecting the marker with a stationary camera in a known location. Further, the robot's pose can be estimated by a camera on a second, more intelligent, mobile robot with knowledge of its own pose. The detected pose is filtered with control input signals and inertial measurement unit (IMU) data from SVEA vehicles using the extended Kalman filter (EKF) and the particle filter (PF). The goal is to determine the configuration that maximizes localization accuracy and implementation flexibility while minimizing hardware requirements on the SVEA vehicles. The remote localization of a mobile robot from a stationary camera has applications including smart intersections optimizing vehicular and pedestrian traffic, as in [2]. The remote localization of a mobile robot from another mobile robot has applications including autonomous vehicle (AV) platooning, as in [3, 4]. For these reasons, the localization of a remote-controlled SVEA vehicle, both through the lens of a stationary camera and of a camera mounted on a second mobile robot, was pursued in this paper. Successes include:
• An accurate motion model for the SVEA vehicle platform was developed, based on the kinematic bicycle model using PWM control inputs, with optional fusion of IMU data.
• Aruco-based localization packages using both EKF and PF are now readily implementable for various sensor configurations and use cases on SVEA vehicles.
• Relative localization of a mobile robot based on the pose of another mobile robot has been successfully implemented.
This paper investigates whether a mobile robot can be reliably localized using visual detection of fiducial markers as artificial landmarks. It also investigates whether the kinematic bicycle model can be used to accurately predict the motion of small car-like robots.
2 Related Work
A number of methods have been proposed to solve the problem of localization in autonomous systems. In the earlier years, localization was considered a side effect of operating the robot under uncertainty. At that time, localization was treated as a passive phenomenon, until [5] introduced the concept of active localization in mobile robots. This method was further developed in [6], where the concept of probabilistic robotics was revisited. This method treated the inherent
uncertainty in robot perception, thereby necessitating the use of filters in robot localization. Filtering methods involve estimating the robot’s current state (and potentially landmarks) using the set of all measurements up to the current time [7, 8]. When the process model and measurements are linear and the noise is white and Gaussian, Kalman filtering [9] provides an optimal method (minimum mean squared error) for state estimation. The Kalman filter can be further improved through hybridization with a Bayesian approach [10]. But most practical robots have nonlinear system dynamics and many sensor models are also nonlinear. For such cases, an EKF can be utilized, which uses Taylor series expansion in order to linearize the system [11]. However, since the nonlinear system is linearized about the current state estimate, the EKF is, at best, only a locally stable observer. For more complex and higher-order nonlinear systems, non-parametric filters such as a particle filter can be used [12]. In recent years, localization using vision positioning technology based on artificial landmarks has had a significant impact. Compared to natural landmarks, the use of artificial landmarks avoids large calculations and has higher real-time performance [13]. Aruco markers [1] (2014) are modern variants of earlier tags like ARTag [14] (2005) and AprilTag [15] (2011). The utilization of Aruco markers for mobile robots is shown in [16]. Our work extends this idea, and we develop a mobile robot with localization based on Aruco marker detection.
3 Theoretical Background
This section describes and references the prior research that this paper is built upon.
3.1 Kinematic Bicycle Motion Model
The kinematic bicycle model (Fig. 2, Sect. 4.1) was selected over the dynamic bicycle model because it is simpler to implement, requires less computation, and yields more accurate results across a wide range of speeds [17]. The kinematic model only requires system identification of two parameters, far fewer than the dynamic model. Additionally, exclusion of linear and angular acceleration terms in the kinematic bicycle model reduces integration, and thus computation, requirements. Finally, the kinematic model was found to be appropriate for modeling low-speed and zero-speed vehicle operation, the typical realms of operation for SVEA vehicles. For these reasons, the recommendation in [17] to use the kinematic bicycle model was followed for modeling the motion of the SVEA vehicles.
3.2 Fiducial Detection
Fiducial markers provide a relatively simple and low-cost method for landmark-based localization of mobile robots, as shown in [13]. Visual sensors have various limitations that may increase localization error [18]; however, the lower computational expense of localizing from visual detection of fiducial markers, as compared to ranging sensors such as Lidar, motivates their use in this paper. Through geometric analysis of image frames containing fiducial markers, the relative poses between cameras and markers can be calculated. With knowledge of either the marker's or the camera's coordinates in a global or local reference frame, the other's coordinates can be calculated. This paper differs from the recommendations given by Mutka et al. [13] in that square fiducial markers (Fig. 1) were chosen over circular markers due to the smart mobility lab's prior implementation using square markers with the SVEA platform.
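The last step described above, recovering one party's global coordinates from the other's plus the detected relative pose, can be illustrated with planar pose composition. The sketch below is a simplification to 2D poses (x, y, heading) rather than the full six degree-of-freedom case, and the function names and numeric values are illustrative assumptions, not part of the SVEA software.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a, b):
    """Express pose b (given in the frame of pose a) in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            wrap(at + bt))


def invert(p):
    """Inverse of a planar pose, so that compose(p, invert(p)) is the identity."""
    x, y, t = p
    return (-math.cos(t) * x - math.sin(t) * y,
            math.sin(t) * x - math.cos(t) * y,
            wrap(-t))


# Stationary camera with known map pose; marker detected 3 m straight ahead.
camera_in_map = (1.0, 2.0, math.pi / 2)
marker_in_camera = (3.0, 0.0, 0.0)
marker_in_map = compose(camera_in_map, marker_in_camera)

# Conversely, a known marker pose plus the same detection gives the camera pose.
camera_recovered = compose(marker_in_map, invert(marker_in_camera))
print(marker_in_map, camera_recovered)
```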
3.3 Filtering Techniques
Popular choices for nonlinear filters used in mobile robot localization include the extended Kalman filter (EKF), the unscented Kalman filter (UKF), and the particle filter (PF) [19]. All three choices can be used to estimate the state (position) of a mobile robot by predicting the motion of the robot, typically with dead reckoning,
Fig. 1 CV detection of a square fiducial marker mounted on a SVEA
and correcting the error in this prediction with observational measurements of the robot's environment [11]. Both motion prediction and observational measurements are fallible and must be filtered to handle the inherent uncertainty present when determining the position of a mobile robot. The EKF offers a relatively simple and computationally cheaper implementation for filtering nonlinear state estimates given Gaussian noise and an initial pose estimate. The UKF provides more adaptability to high degrees of nonlinearity than the EKF does, but has higher computational requirements. The PF is not affected by nonlinear or multi-modal distributions, which makes it an ideal filtering choice in global localization problems. In the simulations conducted by Konatowski et al. [19], the PF was found to give a lower root-mean-square error than the UKF and EKF. Considering this, it was decided to implement both the EKF, as a simpler proof of concept, and the PF, for its accuracy and its flexibility to expand to global localization.
4 Methodology
4.1 Kinematic Bicycle Motion Model
As mentioned in Sect. 3.1, the kinematic bicycle model was chosen to model the motion of the SVEA vehicles based on the key algorithms presented in [17]. The equations of motion for a front wheel drive vehicle following the kinematic bicycle model are expressed in Fig. 2, where x and y are the coordinates of the vehicle center of mass, v and a are the speed and acceleration of the vehicle, ψ is the inertial heading angle, and β is the angle of the center of mass velocity relative to the longitudinal axis of the car. l_f and l_r represent the front and rear axle distances from the center of mass. The front wheel
Fig. 2 Kinematic bicycle model equations and diagram [17]
steering angle is represented by δ_f, while the rear wheel steering angle, δ_r, is set to zero for the front-steered SVEA vehicles. During our experiments, the SVEA platform reported the speed and steering control inputs as 8-bit integers in the range of [−127, 127]. However, there is a dead zone in the speed control on the range [−20, 20] where the motors do not have enough power to propel the vehicle in either direction. By accounting for maximum speeds and steering angles, the reported 8-bit speed and steering values are converted into relevant linear and angular velocity units through the kinematic bicycle model and formatted as a TwistWithCovarianceStamped ROS message type.
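The conversion and motion model described above can be sketched as follows. The kinematic bicycle equations are the standard ones from [17] with δ_r = 0; the linear mapping from 8-bit inputs to physical units, the steering limit, and the axle distances are assumptions made for this illustration and are not values from the paper (the top speed of roughly 1.5 m/s is the figure quoted in Sect. 6.3).

```python
import math

MAX_SPEED = 1.5                 # m/s, approximate SVEA top speed (Sect. 6.3)
MAX_STEER = math.radians(40.0)  # rad, hypothetical steering limit
DEAD_ZONE = 20                  # speed inputs in [-20, 20] do not move the vehicle
L_F, L_R = 0.16, 0.16           # m, hypothetical axle distances from center of mass


def pwm_to_speed(u):
    """Map a speed input in [-127, 127] to m/s (assumed linear), zero in the dead zone."""
    if abs(u) <= DEAD_ZONE:
        return 0.0
    return MAX_SPEED * u / 127.0


def pwm_to_steering(u):
    """Map a steering input in [-127, 127] to a front wheel angle in rad (assumed linear)."""
    return MAX_STEER * u / 127.0


def slip_angle(delta_f):
    """beta = atan(l_r / (l_f + l_r) * tan(delta_f)) for a front-steered vehicle."""
    return math.atan(L_R / (L_F + L_R) * math.tan(delta_f))


def yaw_rate(v, delta_f):
    """Heading rate psi_dot = (v / l_r) * sin(beta)."""
    return v / L_R * math.sin(slip_angle(delta_f))


def bicycle_step(state, v, delta_f, dt):
    """Advance (x, y, psi) by one forward-Euler step of the kinematic bicycle model."""
    x, y, psi = state
    beta = slip_angle(delta_f)
    return (x + v * math.cos(psi + beta) * dt,
            y + v * math.sin(psi + beta) * dt,
            psi + yaw_rate(v, delta_f) * dt)


state = (0.0, 0.0, 0.0)
for _ in range(30):  # one second of driving at 30 Hz with a constant left turn
    state = bicycle_step(state, pwm_to_speed(60), pwm_to_steering(40), 1.0 / 30.0)
print(state)
```

The speed and yaw rate returned here correspond to the linear and angular velocity that the text describes packaging into a twist message.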
4.2 Filtering Techniques
EKF and PF algorithms are well established and, in this paper, are based on those in [11]. Further explanation of the particle filter can be found in [20]. Illustrations of the implemented filter configurations can be seen in Figs. 3 and 4. All inputs entering the left side of a filter block are configured as process/prediction steps, and all inputs entering the bottom are configured as observation updates. Note that all inputs into the EKF node were used as observation measurements due to limitations of the robot_localization EKF package, despite intuition suggesting that IMU and control inputs should be used in a prediction step. The implementation of the particle filter uses the fused control and IMU data from the EKF, dubbed filtered odometry, to calculate prediction steps.
Fig. 3 Extended Kalman filter flowchart
Fig. 4 Particle filter flowchart
5 Experimental Implementation
5.1 Experimental Structure
Figure 5 demonstrates the logic used to obtain the pose estimates, the true pose, and the estimation error as a SVEA vehicle is driven around the smart mobility lab. The IMU/Control estimate represents the filtered odometry obtained from fusing IMU data with control inputs and can be thought of as the pose estimate from the motion model. The Qualisys motion capture system gives the SVEA 1 true pose with sub-millimeter accuracy, referred to as the true pose. Qualisys also gives the true pose of the stationary camera used in most experiments and of the second SVEA vehicle used in the two vehicle (Mobile Camera) experiments. The detected SVEA 1 pose is the estimate of the first SVEA vehicle's pose based on fiducial marker detection and the camera's or second SVEA's true pose. This pose is filtered with the IMU/Control estimate to give the filtered SVEA 1 pose and is compared against the true pose for the error calculation. This structure, relying on the ROS tf2 package, enables all poses to be visualized and compared in the Global/Map coordinate frame. The implementation discussed in this paper depends on a few open-source ROS packages. These include the KTH smart mobility lab's SVEA_starter, the robot_localization package [21], and the fiducials package [22].
5.2 SVEA Description
The small-vehicles-for-autonomy (SVEA) platform of the KTH smart mobility lab was used to implement and test the motion models and localization filters developed in this paper. One representative configuration of the SVEA vehicles can be seen in Fig. 6. The fiducial marker placed on the side of the vehicle enables stationary cameras, or those mounted on other SVEA vehicles, to estimate the pose of this SVEA vehicle. This marker can also be affixed to the rear of the vehicle, as a license plate would be, for the two vehicle tests. This lends itself to the idea of future
Fig. 5 Experimental setup—coordinate frame diagram
Fig. 6 SVEA vehicle with relevant components labeled
integration with automatic license plate detection [23]. The reflective markers are used by the infrared motion capture system, Qualisys, to determine the pose of the vehicle in the global/map reference frame. This pose is dubbed the True Pose and is the benchmark by which the accuracy of all pose estimates is measured.
5.3 Pose Filters
The robot_localization package contains an EKF node that was used in this paper for all EKF applications. Necessary configurations for the EKF node include sensor types, sensor message formats, covariance values, and more. The particle filter is implemented based on the algorithm in [11] with sensor inputs as shown in Fig. 4. The robot_localization EKF node is used to fuse control inputs, converted using the kinematic bicycle model, and raw IMU data into a Filtered Odometry message. This message is used for linear motion prediction/process updates, with Gaussian process noise, in the particle filter. This was done to take advantage of the fusion of control and IMU data completed while implementing the EKF and to make the particle filter more configurable for other applications. The use of the Odometry ROS message type here enables other process update input sources to be implemented with this particle filter node. Both multinomial and systematic resampling methods were implemented, based on [11], but the multinomial method was used during all experimental data collection for consistency.
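The two resampling schemes mentioned above can be sketched in a few lines. This is a generic illustration in the spirit of the low-variance sampler in [11], not the code of the particle filter node used in the paper; the function names and the example weights are assumptions.

```python
import random


def multinomial_resample(weights, rng=random):
    """Draw len(weights) particle indices independently, in proportion to weight."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


def systematic_resample(weights, rng=random):
    """Draw indices with one random offset and n evenly spaced pointers."""
    n = len(weights)
    step = sum(weights) / n
    offset = rng.random() * step      # single random draw in [0, step)
    indices = []
    i = 0
    cumulative = weights[0]
    for m in range(n):
        pointer = offset + m * step
        # Advance to the particle whose cumulative weight interval holds the pointer.
        while pointer >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


rng = random.Random(0)               # seeded for a repeatable demonstration
weights = [0.1, 0.2, 0.4, 0.3]       # hypothetical normalized particle weights
print(multinomial_resample(weights, rng))
print(systematic_resample(weights, rng))
```

Systematic resampling uses a single random number per resampling step, which generally lowers the sampling variance compared with independent multinomial draws.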
5.4 Covariance Values
Selection of proper covariance values for all sensors, processes, and updates was critical for proper calibration and operation of both filters. Covariance values for raw IMU data came from the manufacturer's data sheet, while values for the control inputs were manually tuned to give approximately equal weight to IMU and control inputs. Observation covariance values were typically set lower than the true accuracy of the fiducial observation pose estimates, so that the observation updates could correct the poor pose estimates that result from long periods of occlusion of the fiducial marker from the camera's field of view. Process covariance values for the particle filter were calculated and tuned based on the measured error of the motion model and adjusted for the 30 Hz frequency of process updates.
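The effect of setting a low observation covariance can be seen in a scalar Kalman measurement update. The sketch below is a one-dimensional illustration with made-up numbers; it is not the robot_localization configuration used in the paper.

```python
def kalman_update(mean, var, z, obs_var):
    """Scalar Kalman measurement update; returns the corrected mean and variance."""
    gain = var / (var + obs_var)          # Kalman gain: how much to trust z
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


# Hypothetical case: after a long occlusion the predicted position has drifted
# to 1.0 m (variance 1.0 m^2), and the marker observation says 0.0 m.
trusting = kalman_update(1.0, 1.0, 0.0, obs_var=0.01)   # low observation covariance
cautious = kalman_update(1.0, 1.0, 0.0, obs_var=1.0)    # high observation covariance
print(f"low obs covariance  -> mean {trusting[0]:.3f} m")
print(f"high obs covariance -> mean {cautious[0]:.3f} m")
```

With the low observation covariance the estimate is pulled almost entirely onto the observation, which is the behavior the tuning described above aims for once the marker is detected again.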
6 Experimental Results
This section details the experimental results obtained with various configurations of physical setups, localization filters, and parameter settings. In all experiments, SVEA vehicles were driven around using remote control, and no knowledge of the True Pose determined by the Qualisys motion capture system was used in the pose estimates. The paths taken by all relevant pose estimates and the True Pose are tracked over time and displayed in X-Y plots, as seen in Figs. 7, 8, 9, and 10. All error measurements in this section are reported as the Euclidean distance (omitting heading angle) from the relevant pose estimate to the True Pose.
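The error metric defined above can be written as a short function. The pose tuples and numbers below are illustrative assumptions; only the definition (planar Euclidean distance, heading ignored) comes from the text.

```python
import math


def position_error(estimate, truth):
    """Euclidean distance between two (x, y, heading) poses, ignoring heading."""
    return math.hypot(estimate[0] - truth[0], estimate[1] - truth[1])


# Hypothetical poses in the map frame: (x in m, y in m, heading in rad).
filtered_pose = (3.0, 4.0, 1.0)
true_pose = (0.0, 0.0, 0.0)
print(position_error(filtered_pose, true_pose))
```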
6.1 Extended Kalman Filter
Figure 7 demonstrates an implementation of the EKF with high process covariance values, on the order of 10× the values used for the well-tuned motion models seen in Fig. 8. The motion model is very inaccurate, reaching error values up to 2.9 m. The observation update from fiducial marker detection corrects the Filtered Pose estimate by 1.5 m to an error of only 0.2 m. These results show a successful integration of the fiducial detection observation updates with the EKF. The 0.2 m Filtered Pose error will be seen to occur regularly and comes from miscalibration of static coordinate transforms, such as the transform representing the mounting location of the fiducial marker on the SVEA.
Fig. 7 Extended Kalman filter localization with high process covariance
Fig. 8 Localization with well-tuned motion models
Fig. 9 Results of interesting test cases
6.2 Motion Model and Particle Filter
Figure 8a demonstrates the accuracy of the well-tuned motion model during long periods of occlusion. As detailed in Sect. 3.1, this model was obtained by fusing IMU data with control inputs, converted through a kinematic bicycle model. After driving the majority of a ∼2.5 m diameter loop over ∼10 s, the Filtered Odometry pose estimate can be seen to have an error of 1.0 m from the True Pose. The particle filter corrects this error when the fiducial marker is detected, resulting in a Filtered Pose estimate error of 0.2 m. Figure 8b shows a similar level of error, 1.1 m, obtained by excluding IMU data from the motion model and using only control signals for the prediction steps. In contrast, the motion model accumulates massive error when the control inputs are excluded and only IMU data are used, as seen in Fig. 9a. This shows that a reasonable motion model can be obtained by using only control inputs with a kinematic bicycle model.
6.3 Interesting Filter Failures
Figure 9 contains the results of two test cases in which the EKF and particle filter failed due to improper configuration. Figure 9a depicts a test case in which the control inputs were excluded from the motion model. Even when stationary, the IMU consistently reports accelerations in the X and Y directions due to sensor bias and probable misalignment of the sensor. Without frequent zero velocity updates from the control signals canceling this error, the motion model assumes the vehicle is moving based on these accelerations, and the pose estimate moves accordingly. Figure 9b depicts a test case in which a large amount of process noise is introduced to the particle filter. Process covariance values of 0.15 m, when multiplied by a 30 Hz update rate, lead to a 4.5 m/s prediction estimate covariance spread. This exceeds the maximum speed of the SVEA vehicle, ∼1.5 m/s, which leads to the pose estimate being determined primarily by noise. Since the Gaussian noise is centered on zero, the filtered pose estimate does not stray very far from the origin.
6.4 Two Vehicle (Mobile Camera)
For the final experiments, the stationary camera detecting the fiducial marker is replaced by a camera mounted on a second SVEA vehicle following behind the first SVEA. The fiducial marker is mounted on the rear of the first SVEA in the spirit of a license plate. The second SVEA estimates the pose of the first SVEA by using knowledge of its own pose, the control and IMU data from the first SVEA, and relative pose calculations when the fiducial marker is detected. Figure 10 demonstrates the results obtained from one such experiment, in which a particle filter with a high particle count of 15,000 particles is used for localization. The Filtered Pose estimate is corrected at two different instants when the fiducial marker is detected. The effectiveness of this implementation can be understood by comparing the final Filtered Odometry pose estimate error of 3.4 m to the final Filtered Pose estimate error of 0.2 m.
7 Experimental Conclusions
In these experiments, the extended Kalman filter and the particle filter were successfully implemented on the KTH smart mobility lab's small-vehicles-for-autonomy (SVEA) platform. In the base case, the prediction/process update was calculated by fusing raw IMU data and control input signals through a kinematic bicycle model. To test the merit of each of these sensory inputs in the motion model, a filtering process was run with each input individually omitted. Considering the approximations made in the conversion of the 8-bit control signals, it was anticipated that the IMU-only model would outperform the control-only model. Surprisingly, the control-only motion model (Fig. 8b) performed as well as the model with both control and IMU inputs fused (Fig. 8a), while the IMU-only model accumulated massive error (Fig. 9a). This is significant, as removal of the IMU further reduces hardware requirements and cost for the ongoing work studying fleets of autonomous mobile robots with minimal hardware. In terms of filtering effectiveness, the EKF and PF obtained equally accurate pose estimates when properly calibrated and with smooth SVEA driving. The experimental results detailed in Sect. 6 are summarized in Table 1. As can be seen in Table 1, the successful filter implementations typically report a filtered pose localization error of 0.2 m. This error comes from miscalibration of static coordinate transforms, such as the transform representing the mounting location of the fiducial marker on the SVEA. Another potential error source is intermittent miscalibration of the camera stemming from the auto-focus feature of the webcam used. These errors could be corrected in future implementations through more accurate static transform determination and by using a camera with a fixed focal length.
Motion Model and Filtering Techniques for Scaled …
Fig. 10 Mobile camera, high particle count filter

Table 1 Localization error comparison

| Experiment | Motion model error (m) | Filtered pose error (m) | Reference Figs. |
|---|---|---|---|
| EKF: CTRL only | 1.1 | – | Fig. 8b |
| EKF: IMU only | ∞ | – | Fig. 9a |
| EKF: high proc Cov | 1.5 | 0.2 | Fig. 7 |
| PF: high proc Cov | 2.4 | 2.3 | Fig. 9b |
| PF: well tuned | 1.0 | 0.2 | Fig. 8a |
| PF: two SVEAs | 3.4 | 0.2 | Fig. 10 |
K. Coble et al.
8 Further Scope

Possible extensions of the framework developed in this paper include vehicle platooning with small- or full-scale vehicles [3, 4] and smart intersections that optimize vehicular and pedestrian traffic [2]. One could also integrate automated license plate detection, as in [23], into the platooning application. Further, the fiducial markers could be replaced with object recognition methods, such as those discussed in [24], and 3D reconstruction of the recognized objects [25]. Finally, this framework could be used to study how various sources of variance and bias affect localization performance, including, but not limited to, camera miscalibration, IMU bias, and poor motion modeling.

Acknowledgements All the equipment and setup used to carry out this work were provided by the Integrated Transport Research Lab (ITRL), KTH Royal Institute of Technology. This paper was completed as part of the EL2320 Applied Estimation course at KTH, with feedback provided by Associate Professor John Folkesson. Additional support and assistance were provided by Frank Jiang and Tobias Bolin from KTH ITRL.
References

1. Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F., Marín-Jiménez, M.: Automatic generation and detection of highly reliable fiducial markers under occlusion. Patt. Recognit. 47, 2280–2292 (2014)
2. Mallika, H., Vishruth, Y.S., Venkat Sai Krishna, T., Biradar, S.: Vision-based automated traffic signaling. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds) Soft Computing: Theories and Applications, pp. 185–195. Springer, Singapore (2020)
3. Winkens, C., Paulus, D.: Long range optical truck tracking. In: ICAART, pp. 330–339 (2017)
4. Winkens, C., Fuchs, C., Neuhaus, F., Paulus, D.: Optical truck tracking for autonomous platooning. In: Azzopardi, G., Petkov, N. (eds) Computer Analysis of Images and Patterns, pp. 38–48. Springer International Publishing, Cham (2015)
5. Burgard, W., Fox, D., Thrun, S.: Active mobile robot localization. In: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Volume 2, IJCAI'97, pp. 1346–1352. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1997)
6. Thrun, S.: Probabilistic algorithms in robotics. AI Mag. 21, 93 (2000)
7. Anderson, B., Moore, J.: Optimal Filtering. Prentice-Hall, Information and System Sciences Series (1979)
8. Grewal, M.S.: Kalman Filtering, pp. 705–708. Springer, Berlin, Heidelberg (2011)
9. Kalman, R.: A new approach to linear filtering and prediction problems. Trans. ASME J, Basic (1960)
10. Taya, N.: Improved parameter estimation of smart grid by hybridization of kalman filter with bayesian approach. In: Pant, M., Sharma, T.K., Verma, O.P., Singla, R., Sikander, A. (eds) Soft Computing: Theories and Applications, pp. 1107–1115. Springer, Singapore (2020)
11. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge, Massachusetts (2006)
12. Gustafsson, F., Gunnarsson, F., Bergman, N., Forssell, U., Jansson, J., Karlsson, R., Nordlund, P.: Particle filters for positioning, navigation, and tracking. IEEE Trans. Sig. Process. 50(2), 425–437 (2002)
13. Mutka, A., Miklic, D., Draganjac, I., Bogdan, S.: A low cost vision based localization system using fiducial markers. In: IFAC Proceedings Volumes, vol. 41, no. 2, pp. 9528–9533 (2008). 17th IFAC World Congress
14. Fiala, M.: ARTag, a fiducial marker system using digital techniques, vol. 2, pp. 590–596 (2005)
15. Olson, E.: AprilTag: a robust and flexible visual fiducial system, pp. 3400–3407 (2011)
16. Bačík, J., Durovsky, F., Fedor, P., Perdukova, D.: Autonomous flying with quadrocopter using fuzzy control and Aruco markers. Intell. Serv. Robot. 10 (2017)
17. Kong, J., Pfeiffer, M., Schildbach, G., Borrelli, F.: Kinematic and dynamic vehicle models for autonomous driving control design. In: 2015 IEEE Intelligent Vehicles Symposium (IV), pp. 1094–1099 (2015)
18. Singh, R., Nagla, K.: Comparative analysis of range sensors for the robust autonomous navigation - a review. Sens. Rev. vol. ahead-of-print (2019)
19. Konatowski, S., Kaniewski, P., Matuszewski, J.: Comparison of estimation accuracy of ekf, ukf and pf filters. Ann. Navig. 23, 12 (2016)
20. Singh, R., Nagla, K.: Improved 2d laser grid mapping by solving mirror reflection uncertainty in slam. Int. J. Intell. Unmanned Syst. 6 (2018)
21. Moore, T., Stouch, D.: A generalized extended kalman filter implementation for the robot operating system. In: Proceedings of the 13th International Conference on Intelligent Autonomous Systems (IAS-13). Springer (2014)
22. Vaughan, J., Agrawal, R.: Simultaneous localization and mapping using fiducial markers. https://github.com/UbiquityRobotics/fiducials, http://wiki.ros.org/fiducials (2020)
23. Shah, S., Rathod, N., Saini, P.K., Patel, V., Rajput, H., Sheth, P.: Automated indian vehicle number plate detection. In: Ray, K., Sharma, T.K., Rawat, S., Saini, R.K., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications, pp. 453–461. Springer, Singapore (2019)
24. Goel, R., Sharma, A., Kapoor, R.: State-of-the-art object recognition techniques: a comparative study. In: Pant, M., Sharma, T.K., Verma, O.P., Singla, R., Sikander, A. (eds) Soft Computing: Theories and Applications, pp. 925–932. Springer, Singapore (2020)
25. Khurana, A., Nagla, K.S., Sharma, R.: 3d scene reconstruction of vision information for mobile robot applications. In: Pant, M., Sharma, T.K., Verma, O.P., Singla, R., Sikander, A. (eds) Soft Computing: Theories and Applications, pp. 127–135. Springer, Singapore (2020)
Analysis of Liver Disorder by Machine Learning Techniques

Sushmit Pahari and Dilip Kumar Choubey
Abstract Classification methods are needed to reduce possible diagnostic errors and to help physicians make suitable decisions quickly. The aim of this paper is to find an efficient classification method for liver disease. The authors apply random forest, support vector machine and AdaBoost methods to the Indian Liver Patient Disease (ILPD) data set, where random forest gives the highest accuracy of 93%. The authors conclude that the proposed classification methods do not improve on, but sustain, the accuracy of existing methods, and that they could also be applied to other medical diseases.

Keywords Classification · Random forest · Support vector machine · AdaBoost · Liver disorder · Machine learning
S. Pahari (corresponding author)
Department of Computer Science and Engineering, Indian Institute of Information Technology Bhagalpur, Bhagalpur, India

D. K. Choubey
Department of Computer Science and Engineering, Indian Institute of Information Technology, Bhagalpur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_48

1 Introduction

The liver is the largest internal organ of the body. It plays a major role in circulating blood through the body and regulates the levels of most chemical substances in the blood. It helps metabolize alcohol and drugs and destroys toxic substances. The liver can be infected by parasites and viruses, which cause inflammation and reduce its function. It can maintain normal function even when a small part of it has been damaged. In any case, it is critical to diagnose liver disorder, since diagnosis can improve the patient's survival rate. Expert doctors are needed for the various tests used to diagnose liver disorder; however, even these do not guarantee correct and accurate results.

Research interest is growing in machine learning and knowledge discovery as a way to explore data in detail. Data are stored in various databases that contain significant hidden information, which can support better decision-making. Supervised classification is one of the primary techniques for extracting information from such databases; the class labels of the training examples are known in advance. Classification is a two-stage process: in the training stage, a classifier algorithm is trained on the data set; in the testing stage, the classifier's performance is evaluated on test samples drawn from the data set. Prediction accuracy is a criterion for evaluating a classifier: it is the percentage of instances that are classified correctly. Classification algorithms include support vector machine (SVM), discriminant analysis, nearest neighbour algorithms and so on, and they are applied to clinical data sets both small and large. Learning from huge data sets is laborious, and some data sets contain a large number of attributes from which an appropriate subset of features must be chosen. In most previous work, various algorithms have been applied to liver data sets to determine which algorithm diagnoses the disease most accurately.
The aim of this research is to apply several classification algorithms to liver data, which depends on pre-processing the raw data. One set of data is used for learning, where the classification algorithms carry out the training. A separate set of data is used to assess the performance of the fully specified classifier. Our primary objective is to obtain the best-performing model; the accuracy measure helps identify this model and indicates how well the selected model will perform in future.

To illustrate, suppose we obtain data from an organization, as well as from the Internet, on a random sample of earlier applications, both those that were labelled 'high potential' (positive examples) and those that were not (negative examples). We aim to find a description that is shared by all the positive examples and by none of the negative examples. Then, when a new application arrives, we can use this description to decide whether it should be considered 'high potential'. A subset of real data is given to the data scientist. The data include enough positive and negative examples to allow any candidate algorithm to learn. The data scientist experiments with various algorithms on these data before choosing those that best fit the training data. It is important to measure the error of any learning algorithm that is considered for deployment. The output is binary (YES or NO, 1 or 0) and indicates whether the algorithm has classified the instance as positive or negative. In the earlier example, the algorithm would report that the application is 'high potential' or that it is not. This is especially useful if no human intervention is expected in the decision-making process.

Researchers Choubey et al. [1–10] and Bala et al. [11, 12] have applied many machine learning, soft computing and data mining techniques to the classification of diabetes and thunderstorms, respectively. The rest of the paper is organized as follows: Sect. 2 presents the motivation, Sect. 3 reviews related work, Sect. 4 describes the data set, Sect. 5 describes the proposed methodology, Sect. 6 presents the experimental results and discussion, and Sect. 7 gives the conclusion and future work.
2 Motivation

The number of patients with liver disorders has been increasing continuously because of heavy alcohol consumption, inhalation of toxic gases, consumption of adulterated food and drug intake. Liver disorder has recently become a serious and complicated problem, as its incidence has risen sharply in many countries. To address this, machine learning algorithms are trained on the Indian Liver Patient Disease (ILPD) data set [13] so that liver disorder in patients can be analysed easily.
3 Related Work

Several existing methods have already been applied to the classification of liver disorder. The work reviewed here either addresses liver disorder specifically or relates to the methods implemented in this paper.

Ramana et al. [14] ran various classification algorithms on the AP data set from the UCI Machine Learning Repository. Pre-processing and construction of a predictive model were carried out using K-NN, random forest and C5.0 algorithms. The receiver operating characteristic (ROC) curve was used to show the relationship between sensitivity and specificity, and an accuracy of 75.19% was achieved. The ROC curve yields several performance measures, and the accuracy varies between operating points.

On the liver disorder data set, particle swarm optimization (PSO) can learn nonlinear as well as complex relationships. Lie et al. [15] stated that, when genetic algorithms and particle swarm optimization are used, the model requires multiple processors whose parallel processing power depends on the model structure; they achieved a highest accuracy of 78.18%.
Veena et al. [16] applied the Naïve Bayes classifier, K-NN and multilayer perceptron and stated that these models are easy to implement, fast to execute and need only a small amount of training data; they achieved 71.59% accuracy.

Anisha et al. [17] worked extensively on an image data set with computer-aided diagnosis (CAD), applying an SVM classifier together with feature extraction. Their results indicated that the support vector machine (SVM) cannot be applied with a locally optimal attribute value, which is sometimes hard to interpret; the accuracy was 81.7%.

Kumar et al. [18] studied an MRI cancer data set using K-means clustering, implemented in MATLAB, which greatly aids decision-making, and achieved an accuracy of 82%.

Haque et al. [19] proposed classification algorithms such as random forest and artificial neural network, noting that these models are very flexible and accurate; they achieved an accuracy of about 75.86%.

Patel et al. [20] used classification algorithms such as neural networks, support vector machine (SVM) and multilayer perceptron to build a two-class classification framework with a three-layer structure, reaching 71.41% accuracy.

Xian [21] introduced fuzzy logic, in which every point in the input space is mapped to a membership value between 0 and 1. Together with a support vector machine (SVM), it was used to classify ultrasound images of the liver with an accuracy of 97%.

Tiwari et al. [22] used approaches different from those above, namely data mining and association rule mining, and achieved an accuracy of 97.94%.

Mala et al. [23] worked on magnetic resonance imaging (MRI) and computed tomography (CT) data sets using backpropagation and probabilistic neural networks, and concluded that these make the use of large databases flexible and ensure good performance, improving the overall system.

Luk et al. [24] examined algorithms such as artificial neural networks and decision trees using MATLAB and concluded that they can deliver results even when some data are lost, with a highest accuracy of 96.67%.

Neshat et al. [25] showed, using MATLAB, that artificial intelligence and fuzzy logic can handle varied information and help solve complicated problems, with an accuracy of 91%.

Lin [26] worked on an ICD-9 code data set, which is used in medical billing and coding to describe liver diseases, injuries and other liver-related symptoms. Classification and regression trees (CART) and case-based reasoning (CBR), in which cases are stored as tuples in a database and used to provide solutions to given problems, were applied. The best accuracy was 92.94%, and rules were built relating predicted values to target values.

Keltich et al. [27] found, using the Waikato Environment for Knowledge Analysis (WEKA), that artificial intelligence and Naïve Bayes algorithms are not difficult to formulate and require little statistical data, with an accuracy of 79%.

Stoean et al. [28] worked on support vector machine (SVM) and genetic algorithms, which help train the tuples flexibly and make decision-making easier, and achieved an accuracy of 87%.

Kant et al. [29] proposed a framework of K-means clustering and the Atkinson index, in which K-means extracts meaningful information from a given data set, with precision and specificity of 88% and 93%, respectively.

Vadali et al. [30] used support vector machine (SVM) and data mining techniques for image feature extraction with MATLAB and reported an accuracy of 86.7%.

Virmani et al. [31] used support vector machine (SVM) and genetic algorithms in a computer-aided diagnosis (CAD) tool to characterize liver lesions, achieving an accuracy of 88.8%.

Pachauri et al. and Verma et al. [32–37] have used soft computing methods to build intelligent systems for the control of fermentation processes, formulations, predictions, etc.
4 Dataset Description

The Indian Liver Patient Disease (ILPD) data set [13], obtained from the UCI Repository, was used to evaluate prediction algorithms with the aim of reducing the burden on doctors. The data set contains 416 liver patient records and 167 non-liver patient records collected from the north-east of Andhra Pradesh, India. A class label column separates the records into patients with and without liver disease. In total there are 583 samples, of which 416 have a liver disorder and 167 do not. The attributes of the ILPD data set are listed in Table 1. Any patient whose age exceeds 89 is listed as being of age 90.
Table 1 Data set description

| S. no. | Attribute | Description | Attribute type |
|---|---|---|---|
| 1 | Age | Age of the patient | Numeric |
| 2 | Gender | Gender of the patient | Nominal |
| 3 | Total bilirubin (TB) | Total quantity of total bilirubin in patient | Numeric |
| 4 | Direct bilirubin (DB) | Total quantity of direct bilirubin in patient | Numeric |
| 5 | ALKPHOS alkaline phosphatase | Total amount of ALP enzymes in patient | Numeric |
| 6 | SGPT alanine aminotransferase | Total amount of SGPT in patient | Numeric |
| 7 | SGOT aspartate aminotransferase | Total amount of SGOT in patient | Numeric |
| 8 | Total proteins (TP) | Total amount of protein present in patient | Numeric |
| 9 | Albumin (ALB) | Total amount of albumin (ALB) in patient | Numeric |
| 10 | Albumin and globulin ratio (A/G) | Ratio of albumin to globulin in patient | Numeric |
5 Proposed Methodology

Classification consists of two stages: training and testing. In the training stage, a classifier is built by a classification algorithm from the training tuples; in the testing stage, the performance of the classifier is assessed on a set of test tuples. In our study, the classification algorithms considered are random forest (RF), support vector machine (SVM) and the AdaBoost classifier.
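The two stages described above presuppose a division of the data into training and test tuples. A minimal sketch of such a division is given below; the helper name `split_train_test`, the 30% test fraction and the fixed seed are assumptions made for illustration, since the split actually used in the experiments is not stated.

```python
import random

def split_train_test(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle indices reproducibly and split rows/labels into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)

# Hypothetical toy data: ten one-feature tuples with alternating class labels.
rows = list(range(10))
labels = [r % 2 for r in rows]
(train_x, train_y), (test_x, test_y) = split_train_test(rows, labels, 0.3, seed=1)
```

The classifier is fitted only on `train_x` and `train_y`; the held-out `test_x` and `test_y` are used solely to measure its performance.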
5.1 Random Forest

Random forest is an ensemble learning method that performs both regression and classification tasks using multiple decision trees, together with a technique called bootstrap aggregation, popularly known as bagging. The main idea is to combine many decision trees to determine the final result instead of relying on a single decision tree.
5.1.1 Algorithm

1. Choose 'K' random data points from the training set.
2. Construct a decision tree associated with these 'K' data points.
3. Select the number 'N' of trees to construct and repeat steps 1 and 2.
4. For a new data point, have each of the 'N' trees predict the value 'Y' for that point, and assign the new data point the average of the predicted 'Y' values.

The working of the proposed approach is shown in Fig. 1.
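The four steps above can be sketched in plain Python. This is a simplified illustration rather than the implementation used in the experiments: each "tree" is a one-level decision stump, the random feature subsampling of a full random forest is omitted, and the final step uses a majority vote, the usual classification counterpart of averaging. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Find the single-feature threshold rule with the fewest errors on (X, y)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab

def fit_forest(X, y, n_trees=25, k=None, seed=0):
    rng = random.Random(seed)
    k = k or len(X)
    forest = []
    for _ in range(n_trees):                             # step 3: repeat N times
        idx = [rng.randrange(len(X)) for _ in range(k)]  # step 1: K random points
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))  # step 2
    return forest

def predict_forest(forest, row):
    """Step 4: combine the per-tree predictions by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: one feature, two well-separated classes.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, seed=0)
```

Because every stump sees a different bootstrap sample, individual stumps may disagree; the vote over many of them is what stabilizes the prediction.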
5.2 Support Vector Machine

The support vector machine (SVM) was first proposed by Vapnik [38] in 1995. SVM is a supervised learning algorithm that analyses data, recognizes patterns and is used for classification as well as regression analysis. SVM constructs a hyperplane, or decision surface, that separates the data with the largest margin; the decision surface that maximizes the margin minimizes the generalization error. A linear SVM classifies linearly separable data. In this paper, the linear SVM algorithm has been used.
5.2.1 Linear SVM

Linear SVM was developed to separate two classes, each lying on one side of the margin of a hyperplane. The labelled training data are given as data points of the form

$$M = \{(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n)\} \qquad (1)$$

where $v_i \in \{+1, -1\}$ denotes the class to which the point $u_i$ belongs and $n$ is the number of data samples. Each $u_i$ is a $p$-dimensional real vector, i.e. a training tuple with associated class label $v_i$. The SVM classifier first maps the input vectors to a decision value and then performs the classification using an appropriate threshold value. The separating hyperplane is defined as

$$w^{T} u + b = 0 \qquad (2)$$

where $w$ is a $p$-dimensional weight vector and $b$ is a scalar. The vector $w$ points perpendicular to the separating hyperplane. The problem of finding the best hyperplane among the many separating hyperplanes is solved by choosing the maximal marginal hyperplane. The offset parameter $b$ allows the margin to be increased. Since the training data are linearly separable, two parallel hyperplanes are constructed and the distance between them is maximized. The distance between the two hyperplanes is $2/\lVert w \rVert$, so we need to minimize $\lVert w \rVert$ while ensuring that, for all $i$, either

$$w^{T} u_i + b \ge 1 \quad \text{or} \quad w^{T} u_i + b \le -1 \qquad (3)$$

Fig. 1 Working of proposed approach
To calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, and are 'pushed up against' the two classes. A good separation is achieved by the hyperplane that has the largest distance to the nearest data points of both classes, since in general the larger the margin, the lower the generalization error of the classifier. The hyperplane is found using the support vectors and margins.
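To make the margin idea concrete, the sketch below evaluates a given hyperplane rather than training one. It assumes the sign convention of Eq. (2), i.e. the decision value is w·u + b; the weight vector, offset and four toy points are hypothetical and chosen so that the arithmetic is easy to follow.

```python
import math

def decision_value(w, b, u):
    """Signed decision value w.u + b of point u."""
    return sum(wi * ui for wi, ui in zip(w, u)) + b

def classify(w, b, u):
    """Threshold the decision value at zero to obtain a class in {+1, -1}."""
    return 1 if decision_value(w, b, u) >= 0 else -1

def margin_width(w):
    """Distance 2 / ||w|| between the two margin hyperplanes."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def satisfies_margin(w, b, points, labels):
    """Check that every point lies on or outside its class's margin hyperplane."""
    return all(v * decision_value(w, b, u) >= 1 for u, v in zip(points, labels))

# Hypothetical hyperplane x1 + x2 - 3 = 0 and four linearly separable points.
w, b = [1.0, 1.0], -3.0
points = [[1.0, 1.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]]
labels = [-1, -1, 1, 1]
feasible = satisfies_margin(w, b, points, labels)
width = margin_width(w)
```

Here the points (1, 1) and (2, 2) have decision values of exactly -1 and +1, so they lie on the margin hyperplanes and play the role of support vectors. Training an SVM amounts to searching for the feasible (w, b) with the largest margin width.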
5.3 AdaBoost Classifier

Boosting is an ensemble learning technique that builds a strong classifier from a sequence of weak models. First, a model is built from the training data. Then a second model is built that attempts to correct the errors of the first. Models are added in this way until the complete training data set is predicted well or a maximum number of models has been added.
5.3.1 Data Learning from AdaBoost Model

AdaBoost is best used to boost the performance of decision trees on binary classification problems, although it can be used to boost the performance of any machine learning classification algorithm. It is most effective with weak learners, i.e. models that achieve accuracy only slightly better than random chance on a classification problem. The most suitable and most commonly used weak learners for AdaBoost are decision trees with one level. Because these trees are so short and contain only a single decision for classification, they are known as decision stumps. Each instance in the training data set is initially weighted as

$$\text{Weight}(x_i) = \frac{1}{n} \qquad (4)$$

where $x_i$ is the $i$th training instance and $n$ is the number of training instances. The misclassification rate of a trained model is calculated as

$$\text{Error} = \frac{N - \text{Correct}}{N} \qquad (5)$$

where Error is the misclassification rate, Correct is the number of training instances predicted correctly and $N$ is the total number of training instances. To take the weights of the training instances into account, this is modified to

$$\text{Error} = \frac{\sum_i W(i)\,\text{terror}(i)}{\sum_i W(i)} \qquad (6)$$

where $W(i)$ is the weight of training instance $i$ and $\text{terror}(i)$ is its prediction error (1 if misclassified, 0 if classified correctly).

After a classifier is trained, AdaBoost assigns a higher weight to every misclassified item so that it appears in the training subset of the next classifier with higher probability. The final classifier is

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) \qquad (7)$$

where $h_t$ is the $t$th weak classifier, $\alpha_t$ is its weight and $T$ is the number of classifiers. After each classifier is trained, it is assigned a weight based on its accuracy; a more accurate classifier receives a higher weight so that it has more influence on the final result. The instance weights are updated as

$$D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t} \qquad (8)$$

where $D_t(i)$ is the weight of instance $i$ in round $t$, $y_i \in \{+1, -1\}$ is its label and $Z_t$ is a normalization factor that makes the weights $D_{t+1}$ sum to one.
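One boosting round following Eqs. (4) and (6)–(8) can be sketched as below. The text does not state how the classifier weight α is computed, so the sketch assumes the standard AdaBoost choice α = ½ ln((1 − error)/error); the function names and the five-instance example are hypothetical.

```python
import math

def adaboost_round(weights, labels, predictions):
    """Return (weighted error, classifier weight alpha, updated instance weights)."""
    total = sum(weights)
    # Weighted misclassification rate, as in Eq. (6)
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h) / total
    error = min(max(error, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    # Assumed standard AdaBoost classifier weight (not given in the text)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Eq. (8): raise the weights of misclassified instances, then normalize by Z
    raw = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, labels, predictions)]
    z = sum(raw)
    return error, alpha, [r / z for r in raw]

def strong_classify(alphas, votes):
    """Eq. (7): sign of the alpha-weighted sum of weak classifier votes."""
    return 1 if sum(a * h for a, h in zip(alphas, votes)) >= 0 else -1

# Hypothetical round: five instances with uniform weights 1/n (Eq. 4);
# the weak classifier gets only the last instance wrong.
n = 5
weights = [1.0 / n] * n
labels = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]
error, alpha, new_weights = adaboost_round(weights, labels, predictions)
```

With this choice of α, the misclassified instance ends up carrying half of the total weight after normalization, which is what forces the next weak learner to concentrate on it.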
6 Experimental Results and Discussion

In this study, we analysed the Indian Liver Patient Disease (ILPD) data set [13]. This section discusses the results obtained from the proposed algorithms, i.e. the AdaBoost classifier, support vector machine (SVM) and random forest (RF). For the experiments, the data set was divided into a training set and a testing set. The experiments were carried out in the Python programming language using the pandas and scikit-learn libraries. The accuracies of random forest, support vector machine and AdaBoost are 93%, 79.14% and 73.42%, respectively, as shown in Table 2. Random forest is therefore the best model for predicting liver disorder correctly. Although the algorithms can be evaluated with parameters such as accuracy, specificity, sensitivity, positive prediction value (PPV) and negative prediction value (NPV), our work focuses on accuracy.

Table 2 Accuracy of the algorithms

| Algorithms | Accuracy (%) | ROC |
|---|---|---|
| Random forest | 93 | 0.92 |
| Support vector machine | 79.14 | 0.79 |
| AdaBoost | 73.42 | 0.74 |

The following quantities are used to determine the accuracy:
True negative (TN): number of healthy cases correctly identified as healthy.
False negative (FN): number of patient cases incorrectly identified as healthy.
True positive (TP): number of patient cases correctly identified as patients.
False positive (FP): number of healthy cases incorrectly identified as patients.

Accuracy: the ability to distinguish patient and healthy cases correctly. It is the proportion of true positives and true negatives among all evaluated cases:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
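The accuracy formula, together with the other measures named in this section (sensitivity, specificity, PPV and NPV), can be computed from the four counts as in the sketch below. The function names and the ten example labels are hypothetical and are not results from the ILPD experiments; label 1 is assumed to mark a patient and 0 a healthy case.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Standard measures derived from the confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical labels: 1 = patient, 0 = healthy.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)
scores = metrics(tp, fp, tn, fn)
```

In this toy example 7 of the 10 cases are classified correctly, giving an accuracy of 0.7, while sensitivity (0.75) and specificity (4/6) show how the errors are distributed between the two classes.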
In Table 2, random forest provides the highest accuracy and ROC value, so the ROC curve of random forest is shown in Fig. 2. The proposed methods are compared with existing methods in Table 3.

Fig. 2 ROC of random forest

Table 3 Comparison of proposed methods with existing methods for Liver disorders data set

| Source | Method | Accuracy (%) |
|---|---|---|
| Author's study | Random forest | 93 |
| | Support vector machine | 79.14 |
| | AdaBoost | 73.42 |
| Comak et al. [39] | LSSVM | 60.00 |
| | LSSVM with fuzzy weighting | 94.29 |
| Luukka [40] | Liver FRPCA1 | 68.25 |
| | Liver FRPCA2 | 67.88 |
| | Liver FRPCA3 | 70.25 |
| | Liver original | 63.09 |
| | Liver original | 66.50 |
| Ozsen and Gunes [41] | AWAIS | 70.17 |
| | GA-AWAIS | 85.21 |
| Polat et al. [42] | AIRS | 81.00 |
| | Fuzzy-AIRS | 83.38 |
| Seera and Lim [43] | FMM | 67.25 |
| | FMM-CART | 92.61 |

The main motivation behind our work is to compare the performance of different machine learning classification algorithms on distinct data sets. As the related work discussed above shows, many researchers have applied a range of machine learning algorithms to distinct data sets. A few researchers have stated that individual models perform best, while a few others have claimed that integrated models perform better than individual models. Our study examined different machine learning models and analysed their accuracy for this kind of disease diagnosis. Random forest (RF) outperformed the other classification algorithms. These results will not necessarily be the same for all data sets; some researchers have reported that an artificial neural network (ANN) gives the best accuracy on their data set. The performance of a classification algorithm depends on how the data set is used for training and testing. The discussion above suggests that the accuracy of a classification model may or may not be improved, since it sometimes depends on basic components of the model that can be reduced.
7 Conclusion and Future Work
Innovation and technological advancement in the medical field have made research work considerably easier. Nevertheless, doctors, specialists and other health workers still face difficulties in diagnosing liver disorders. In this research work, to make the detection of liver disorders more accurate and effective, classification models have been built that indicate whether or not a person is suffering from a liver disorder. The performance of the classification models has been measured by accuracy. From the experimental results, we can conclude that no specific classification algorithm yields high accuracy for all distinct liver data sets. We can also conclude that integrated models do not always perform better than individual models. So, no algorithm, whether integrated or individual, can be called ideal. The performance of a classification algorithm depends on the type of data set, the observation details and the dimensionality. The proposed models may help clinical professionals to analyse liver disease accurately and efficiently. The implemented classifiers may help specialists to recognise the likelihood of disease and to prescribe an appropriate clinical treatment. Numerous specialists are using machine learning strategies of this kind to address such clinical problems and then analyse classification algorithms to decide on the best models based on the data sets and research-centre factors. The methodology used may also be applied to other medical diseases.
References
1. Choubey, D.K., Paul, S.: Classification techniques for diagnosis of diabetes: a review. Int. J. Biomed. Eng. Technol. (IJBET) 21(1), 15–39 (2016)
2. Choubey, D.K., Paul, S., Sandilya, S., Dhandhania, V.K.: Implementation and analysis of classification algorithms for diabetes. Curr. Med. Imag. Rev. 16(4), 340–354 (2020)
3. Choubey, D.K., Paul, S.: GA_MLP NN: a hybrid intelligent system for diabetes disease diagnosis. Int. J. Intell. Syst. Appl. (IJISA) 8(1), 49–59 (2016)
4. Choubey, D.K., Paul, S.: GA_RBF NN: a classification system for diabetes. Int. J. Biomed. Eng. Technol. (IJBET) 23(1), 71–93 (2017)
5. Choubey, D.K., Tripathi, S., Kumar, P., Shukla, V., Dhandhania, V.K.: Classification of diabetes by Kernel based SVM with PSO. Recent Patents Comput. Sci. 12(1), 1–14 (2019)
6. Choubey, D.K., Kumar, M., Shukla, V., Tripathi, S., Dhandhania, V.K.: Comparative analysis of classification methods with PCA and LDA for diabetes. Curr. Diab. Rev. 16(1), 1–18 (2020)
7. Choubey, D.K., Kumar, P., Tripathi, S., Kumar, S.: Performance evaluation of classification methods with PCA and PSO for diabetes. Netw. Modeling Anal. Health Inf. Bioinf. 9(1), 1–30 (2019)
8. Choubey, D.K., Paul, S., Dhandhania, V.K.: Rule based diagnosis system for diabetes. Biomed. Res. 28(12), 5196–5209 (2017)
9. Choubey, D.K., Paul, S.: GA_SVM-A classification system for diagnosis of diabetes. In: Handbook of Research on Nature Inspired Soft Computing and Algorithms, pp. 359–397. IGI Global (2017)
10. Choubey, D.K., Paul, S., Dhandhania, V.K.: GA_NN: an intelligent classification system for diabetes. In: Springer Proceedings AISC Series, 7th International Conference on Soft Computing for Problem Solving-SocProS 2017, Indian Institute of Technology, Bhubaneswar, India, 23–24 Dec (2017). Chapter 2: Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing 817, Vol. 2, pp. 11–23. Springer (2019)
11. Bala, K., Choubey, D.K., Paul, S.: Soft computing and data mining techniques for thunderstorms and lightning prediction: a survey. In: International Conference of Electronics, Communication and Aerospace Technology (ICECA 2017), vol. 1, pp. 42–46. IEEE, RVS Technical Campus, Coimbatore, Tamil Nadu, India, 20–22 Apr 2017
12. Bala, K., Choubey, D.K., Paul, S., Lala, M.G.N.: Classification techniques for thunderstorms and lightning prediction-a survey. In: Soft Computing-Based Nonlinear Control Systems Design, pp. 1–17. IGI Global (2018)
13. Dataset URL. https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset)
14. Ramana, B.V., Boddu, R.S.K.: Performance comparison of classification algorithms on medical datasets, p. 01400145. IEEE (2019)
15. Lin, J.J., Chang, P.-C.: A particle swarm optimization based classifier for liver disorders classification, pp. 63–65. IEEE (2010)
16. Veena, G.S., Sneha, D., Basavaraju, D., Tanvi, T.: Effective analysis and diagnosis of liver disorder, pp. 86–90. IEEE (2018)
17. Anisha, P.R., Reddy, C.K.K., Prasad, L.V.N.: A pragmatic approach for detecting liver cancer using image processing and data mining techniques, pp. 352–357. IEEE (2015)
18. Kumar, S.S., Moni, R.S., Rajeesh, J.: Liver tumor diagnosis by gray level and contourlet coefficients texture analysis, pp. 557–562. IEEE (2012)
19. Haque, R., Islam, M., Sumon Reza, M., Hasan, K.: Performance evaluation of random forests and artificial neural networks for the classification of liver disorder (2018)
20. Patel, O.P., Tiwari, A.: Liver disease diagnosis using quantum-based binary neural network learning algorithm, pp. 425–434. Springer (2015)
21. Xian, G.-M.: An intelligent method of malignant and benign liver tumors for ultrasonography based on GLCM texture features and fuzzy SVM, pp. 6737–6741. Elsevier (2010)
22. Tiwari, M., Chakrabarti, P., Chakrabarti, T.: Performance analysis and error evaluation towards the liver cancer diagnosis using lazy classifiers for ILPD, pp. 161–168. Springer (2018)
23. Mala, K., Sadasivam, V., Alagappan, S.: Neural network-based texture analysis of CT images for fatty and cirrhosis liver classification, pp. 80–86. Elsevier (2015)
24. Luk, J.M., Lam, B.Y., Lee, N.P.Y., Ho, D.W., Sham, P.C., Chan, L., Peng, J., Leng, X., Day, P.J., Fan, S.-T.: Artificial neural networks and decision tree model analysis of liver cancer proteomes, pp. 68–73. Elsevier (2007)
25. Neshat, M., Yaghobi, M., Naghobi, M.B., Esmaelzadeh, A.: Fuzzy expert system design for diagnosis of liver disorders, pp. 252–256. IEEE (2008)
26. Lin, R.-H.: An intelligent model for liver disease diagnosis, pp. 53–62. Elsevier (2009)
27. Keltich, B., Lin, Y., Bayrak, C.: Comparison of AI techniques for prediction of liver fibrosis in hepatitis patients, pp. 1–8. Springer (2014)
28. Stoean, R., Stoean, C., Lupsor, M., Stefanescu, H., Badea, R.: Evolutionary-driven support vector machines for determining the degree of liver fibrosis in chronic hepatitis C, pp. 53–65. Elsevier (2011)
29. Kant, S., Ansari, I.A.: An improved K-means clustering with Atkinson index to classify liver patient dataset, pp. 222–228. Springer (2016)
30. Vadali, S., Deekshitulu, G.V.S.R., Murthy, J.V.R.: Analysis of liver cancer using data mining SVM algorithm in MATLAB, pp. 163–175. Springer (2019)
31. Virmani, J., Kumar, V., Kalra, N., Khandelwal, N.: SVM-based characterization of liver ultrasound images using wavelet packet texture descriptors, pp. 530–543. Springer (2013)
32. Pachauri, N., Singh, V., Rani, A.: Two degree of freedom PID based inferential control of continuous bioreactor for ethanol production. ISA Trans. 68, 235–250 (2017)
33. Pachauri, N., Rani, A., Singh, V.: Bioreactor temperature control using modified fractional order IMC-PID for ethanol production. Chem. Eng. Res. Des. 122, 97–112 (2017)
34. Pachauri, N., Singh, V., Rani, A.: Two degrees-of-freedom fractional-order proportional–integral–derivative-based temperature control of fermentation process. J. Dyn. Syst. Meas. Control 140(7) (2018)
35. Pachauri, N., Yadav, J., Rani, A., Singh, V.: Modified fractional order IMC design based drug scheduling for cancer treatment. Comput. Biol. Med. 109, 121–137 (2019)
36. Verma, O.P., Manik, G., Jain, V.K.: Simulation and control of a complex nonlinear dynamic behavior of multi-stage evaporator using PID and Fuzzy-PID controllers. J. Comput. Sci. 25, 238–251 (2018)
37. Verma, O.P., Mohammed, T.H., Mangal, S., Manik, G.: Minimization of energy consumption in multi-stage evaporator system of Kraft recovery process using interior-point method. Energy 129, 148–157 (2017)
38. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York Inc, New York, NY, USA (1995)
39. Comak, E., Arslan, A., Turkoglu, I.: A decision support system based on support vector machines for diagnosis of the heart valve diseases. Comput. Biol. Med. 37, 21–27 (2007). https://doi.org/10.1016/j.compbiomed.2005.11.002
40. Luukka, P.: Classification based on fuzzy robust PCA algorithms and similarity classifier. Expert Syst. Appl. 36, 7463–7468 (2009). https://doi.org/10.1016/j.eswa.2008.09.015
41. Özşen, S., Güneş, S.: Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems. Expert Syst. Appl. 36, 386–392 (2009). https://doi.org/10.1016/j.eswa.2007.09.063
42. Polat, K., Gunes, S.: An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease. Digital Signal Proc. 17, 702–710 (2007). Elsevier
43. Seera, M., Lim, C.P.: A hybrid intelligent system for medical data classification. Expert Syst. Appl. 41, 2239–2249 (2014). https://doi.org/10.1016/j.eswa.2013.09.022
Various Techniques of Image Segmentation
Reshu Agarwal, Annu Malik, Tanya Gupta, and Shylaja VinayKumar Karatangi
Abstract Image segmentation techniques are essential for any kind of digital image analysis. This paper reviews different methods of image segmentation, such as thresholding, clustering, matching, and edge detection. Additionally, one method is proposed whose goal is to make the features of an image usable for creating segments for efficient subsequent processing, such as recognition or compression. This method is used to find the objects and the edges in an image. Images are segmented, or divided into components, on the basis of similar pixel characteristics. The result of the proposed method is a set of outlines extracted from the image by setting a precise threshold value.
Keywords Image segmentation · Picture · Edge segmentation · Thresholding
R. Agarwal (B)
Amity Institute of Information Technology, Amity University, Noida, India
A. Malik · T. Gupta · S. V. Karatangi
G.L. Bajaj Institute of Technology and Management, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_49

1 Introduction
The partitioning of an image into meaningful components, known as image segmentation, is an integral step in digital image analysis, digital illustration, medical image analysis and other application areas. In this paper, we focus on methods that find the particular pixels that make up an object. Over the history of digital image processing, a great number of segmentation techniques have been proposed, and they need to be categorized so that they can be evaluated properly. However, different techniques can use the same image features in different ways to segment an image differently, so they defy any single categorization. Image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics. The result of this process is either a set of segments that together cover the whole image or a set of outlines extracted from the image. All pixels in a region are similar with respect to some computed characteristic or property, such as color, texture or intensity, whereas adjacent regions differ considerably with respect to the same characteristic. When applied to a stack of images, as is typical in medical imaging, the contours obtained after image segmentation can be used to generate 3D reconstructions with interpolation algorithms such as marching cubes.
The purpose of image segmentation is to group pixels into clusters. Segmentation can be used for computer vision, for estimating occlusion boundaries in motion or stereo systems, and for compression, image processing or information retrieval. We consider bottom-up image segmentation; that is, we ignore (top-down) contributions from object recognition in the segmentation process. As input, we primarily consider image brightness, although similar techniques can be used with color, motion and stereo disparity information. The idea of segmentation is illustrated in Fig. 1, where (a) shows the original image and (b) the segmented one. Different methodologies are used in the field of image segmentation, and different image formats often require different methods. Some of these methods are described below.
Fig. 1 a Normal image, b segmented image
1.1 Thresholding
The simplest method of image segmentation is thresholding. This method turns a grayscale image into a binary image on the basis of a threshold value (or clip level); a variant known as balanced histogram thresholding also exists. The key step is selecting the threshold value (or values, when multiple levels are used). Several methods are popular in practice, including the maximum entropy method, Otsu's method (maximum variance) and K-means clustering. Recently, thresholding methods have been developed for computed tomography (CT) images. The key idea, unlike in Otsu's method, is that the thresholds are derived from the radiographs rather than from the (reconstructed) image. Newer methods suggest the use of multidimensional, fuzzy rule-based nonlinear thresholds. In these works, the decision on each pixel's membership of a segment is based on multidimensional rules derived from fuzzy logic and evolutionary algorithms, taking into account the image lighting environment and the application.
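To make the threshold selection concrete, the following Python sketch implements Otsu's maximum-variance criterion on a small grayscale image stored as a list of rows. It is only an illustration under stated assumptions: the function names `otsu_threshold` and `binarize`, the 8-bit grey-level range and the toy image are introduced here and do not come from the paper.

```python
def otsu_threshold(image, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value < t form the background class, pixels >= t the foreground.
    """
    # Histogram of grey levels
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w_bg = 0    # number of pixels with value < t
    sum_bg = 0  # sum of the grey levels of those pixels
    for t in range(1, levels):
        w_bg += hist[t - 1]
        sum_bg += (t - 1) * hist[t - 1]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, variance undefined
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Proportional to the between-class variance (constant factor dropped)
        score = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(image, t):
    """Turn a grayscale image into a binary image using threshold t."""
    return [[1 if v >= t else 0 for v in row] for row in image]


# Hypothetical 3x3 image with a dark and a bright region
toy = [[10, 12, 200], [11, 205, 210], [9, 198, 202]]
mask = binarize(toy, otsu_threshold(toy))
```

When several thresholds give the same class split, this sketch returns the smallest one; other tie-breaking rules are equally valid.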
1.2 Clustering Methods
The K-means algorithm is an iterative technique that partitions an image into K clusters. The basic algorithm is:
1. Select K cluster centers, either randomly or with a heuristic method such as K-means++.
2. Assign every pixel of the image to the cluster that minimizes the distance between the pixel and the cluster center.
3. Recompute the cluster centers by averaging the pixels in each cluster.
4. Repeat steps 2 and 3 until convergence (i.e., no pixel changes cluster).
In this case, the distance is the squared or absolute difference between a pixel and a cluster center. The difference is usually based on pixel location, intensity, texture and color, or a weighted combination of these factors. K can be selected manually, randomly or heuristically. The algorithm is guaranteed to converge, but it may not return the optimal solution. The quality of the solution depends on the initial set of clusters and on the value of K. An illustration of this technique is shown in Fig. 2.
Fig. 2 a Source image, b after applying K-means having K as 16
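The four K-means steps listed in this section can be sketched in a few lines of Python. The sketch clusters pixel intensities only and uses the squared difference as distance; the function name `kmeans_intensity`, the random initialisation with a fixed seed and the toy data are assumptions made for illustration, not the procedure used to produce Fig. 2.

```python
import random


def kmeans_intensity(pixels, k, seed=0, max_iter=100):
    """Cluster a flat list of pixel intensities into k groups.

    Returns (labels, centers). Requires at least k distinct intensity values.
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct intensities as initial centers (random choice)
    centers = rng.sample(sorted(set(pixels)), k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each pixel to the nearest center (squared difference)
        new_labels = [
            min(range(k), key=lambda c: (p - centers[c]) ** 2) for p in pixels
        ]
        # Step 4: stop when no pixel changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its pixels
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical intensities: three dark and three bright pixels
labels, centers = kmeans_intensity([10, 12, 11, 200, 205, 198], k=2)
```

For a colour image, the same loop applies with each pixel represented as a tuple and the distance taken over all channels (optionally with position added as extra coordinates).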
Fig. 3 a Original image, b image after edge detection
1.3 Edge Detection
Edge detection is a well-developed field of image processing in its own right. Region boundaries and edges are closely related, since there is usually a sharp change in intensity at region boundaries. Edge detection techniques have therefore been used as the basis of other segmentation techniques. The edges identified by edge detection are often disconnected, whereas segmenting an object from an image requires closed region boundaries. The desired outlines are the boundaries between such objects or spatial taxons. A spatial taxon is an information granule consisting of a crisp pixel region, located at an abstraction level within a hierarchical nested scene architecture. Contour detection methods apply to spatial-taxon regions in the same way as they apply to a silhouette. Figure 3a shows an original image of a square puzzle; after applying the edge detection technique, we obtain the outlines of the edges of the original image, shown in Fig. 3b.
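The sharp change in intensity at region boundaries can be measured with a gradient operator. The Python sketch below uses the Sobel operator, a common choice; the text does not name a specific edge detector, so this choice, the function name `sobel_magnitude` and the toy image are assumptions for illustration.

```python
def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    h, w = len(image), len(image[0])
    kernel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal change
    kernel_y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical change
    magnitude = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    pixel = image[y + dy][x + dx]
                    gx += kernel_x[dy + 1][dx + 1] * pixel
                    gy += kernel_y[dy + 1][dx + 1] * pixel
            magnitude[y][x] = (gx * gx + gy * gy) ** 0.5
    return magnitude


# Hypothetical 5x5 image with a vertical step edge between columns 1 and 2
step = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_magnitude(step)
```

Large values in the result mark pixels next to the step, while flat areas stay at zero. Linking these responses into closed boundaries is a separate step that this sketch does not attempt.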
1.4 Dual Clustering Method
This method combines three characteristics of the image: the partition of the image based on histogram analysis is checked by the high compactness of the clusters (objects) and the high gradients of their borders. Two spaces must be introduced for this purpose: the first is the one-dimensional histogram of brightness, H = H(B); the second is the dual three-dimensional space of the original image itself, B = B(x, y). The first space makes it possible to measure how compactly the brightness of the image is distributed by computing a minimal clustering. For each threshold brightness T, the measure MDC = G/(kL) must be calculated, where k is the difference in brightness between the object and the background, L is the length of all borders, and G is the mean gradient on the borders. The maximum of MDC defines the segmentation.
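The measure MDC = G/(kL) can be evaluated for a candidate threshold as in the following Python sketch. Several details are not fixed by the text and are assumptions made here: borders are taken as pairs of 4-adjacent pixels that fall on different sides of the threshold, L is the number of such pairs, G is the mean absolute brightness difference across them, and k is the difference between the mean brightness of the object and of the background. The names `mdc` and `best_threshold` are hypothetical.

```python
def mdc(image, t):
    """Evaluate MDC = G / (k * L) for threshold brightness t.

    Returns 0.0 when the threshold leaves the object or the background empty.
    """
    h, w = len(image), len(image[0])
    obj = [v for row in image for v in row if v >= t]
    bg = [v for row in image for v in row if v < t]
    if not obj or not bg:
        return 0.0
    # k: brightness difference between object and background (class means)
    k = sum(obj) / len(obj) - sum(bg) / len(bg)
    # Collect brightness jumps across all border pixel pairs
    jumps = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right and lower neighbour
                if ny < h and nx < w:
                    a, b = image[y][x], image[ny][nx]
                    if (a >= t) != (b >= t):
                        jumps.append(abs(a - b))
    border_length = len(jumps)            # L
    mean_gradient = sum(jumps) / border_length  # G
    return mean_gradient / (k * border_length)


def best_threshold(image, levels=256):
    """Return the threshold with the largest MDC (smallest one on ties)."""
    return max(range(1, levels), key=lambda t: mdc(image, t))


# Hypothetical image: dark left half, bright right half
halves = [[10, 10, 200, 200], [10, 10, 200, 200]]
t_star = best_threshold(halves)
```

With these definitions, a threshold that produces short, high-contrast borders scores higher than one that produces long, ragged borders, which matches the stated goal of finding objects with good borders.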
2 Literature Survey
Many techniques have been developed for image segmentation [1]. The objective of segmentation is to locate regions with shared attributes, such as intensity, texture, color and shape. Image segmentation is one of the most extensively studied problems in image processing. Many researchers have proposed fully automatic approaches, yet satisfactory results are still not obtained in many cases [2]. Image segmentation is widely used in medical image processing, face detection, pedestrian detection, etc. Segmentation techniques are often used for pattern recognition and image classification in fields such as agriculture and medicine [3]. Poor quality of individual images can lead to false and inaccurate results. Image segmentation divides a digital image into sets of pixels or semantically meaningful regions. It is an important process in all areas, especially in medical imaging, where it has great clinical value: it helps to localize organs or pathologies and improves diagnostic quality. Image registration plays a very important role in a number of practical problems in fields such as remote sensing, medicine and computer vision. In medicine, image segmentation is mainly used in diagnosis, for example on CT and MRI scans [4]. In the case of skin cancer, images of skin lesions are captured with a dermatoscope, and the lesions can then be delineated by segmentation. There are different ways in which segmentation of these dermoscopic images of skin lesions can be achieved [5]. Speed, accuracy and computational efficiency are the main aims of image segmentation. A number of algorithms and techniques have been developed for image segmentation, which is also used for subsequent object matching.
There are many techniques, but edge detection and thresholding are the most appropriate for image segmentation [6]. In the segmentation process, an image is divided into distinct regions for the purpose of extracting objects of interest from the background [7]. Among the many traditional techniques, Otsu and K-means are widely used for threshold and cluster segmentation, respectively. Image segmentation is one of the most broadly studied problems in image processing [8] and is considered one of the most fundamental problems in computer vision. It is basically done to divide an image into areas of interest with coherent properties. The objects in an image of an outdoor scene [9] can be divided into two broad categories: structured and unstructured objects [10, 11]. Objects with a well-defined shape, such as people and vehicles, are structured, whereas the ground, the sky and trees are unstructured. Many methods have been developed for image segmentation [12]. Image segmentation is used to divide images so that objects can be matched between two images. It is done with the objective of dividing the image into regions according to their intensity [13]. If image segmentation is done well, all other stages of image analysis are made simpler. However, success is often only partial when automatic segmentation algorithms are used [14]. Manual intervention can usually overcome this problem, and by this stage the system should already have done most of the work [15, 16]. In edge-based segmentation, many semi-automatic algorithms have been introduced in which rough lines drawn by a scientist are smoothed to improve their agreement with the image. Edge detection can also be made fully automatic, but it is not necessarily fully successful. Thresholding is most successful when there is little overlap between the distributions of the image pixels [17]. Image segmentation is a critical task in image analysis. Its result is a set of parts covering the entire image, from which the characteristics of the image are extracted. As this is a growing field, many researchers are attracted to work on it [18–20].
3 Proposed Work
Image segmentation is the process of subdividing an image into groups of pixels (regions) that are relevant to a particular application. It is based on measurements taken from the image, which may be depth, gray level, brightness, texture or motion. The output of this process is a set of regions that together cover the whole image. All pixels in a region are similar with respect to some property, such as color, brightness or texture, whereas adjoining regions differ with respect to the same property. As discussed above, thresholding is considered the best image segmentation technique; it divides the image according to the threshold values. However, it does not produce an outline of the image. Edge detection, in contrast, produces something like a line drawing of the image. The proposed technique, called the TED technique, is a combination of the threshold and edge detection techniques. In this technique, we first apply edge detection to the image and then divide the image using the threshold technique. The original picture, to which edge detection is applied first, is shown in Fig. 4a. Applying the edge detection technique produces the gray-scaled image shown in Fig. 4b. The threshold technique is then applied to this gray-scaled image with six different threshold values.
Fig. 4 a Actual picture, b gray-scaled picture
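The two stages of the proposed TED technique, edge detection followed by thresholding, can be sketched in Python as follows. The paper does not state which edge detector or which scaling it uses, so this sketch assumes a simple central-difference gradient whose magnitude is normalised to the range [0, 1] before a threshold T is applied; the names `edge_strength` and `ted_segment`, the toy image and the list of trial thresholds are hypothetical.

```python
def edge_strength(image):
    """Stage 1: gray-scaled edge map, normalised so the strongest edge is 1.0."""
    h, w = len(image), len(image[0])
    strength = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal change
            gy = image[y + 1][x] - image[y - 1][x]  # vertical change
            strength[y][x] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in strength)
    if peak == 0:
        return strength  # flat image, no edges to normalise
    return [[s / peak for s in row] for row in strength]


def ted_segment(image, t):
    """Stage 2: keep only pixels whose normalised edge strength is at least t."""
    return [[1 if s >= t else 0 for s in row] for row in edge_strength(image)]


# Hypothetical image with a gradual transition from dark to bright
ramp = [[0, 0, 60, 100, 100] for _ in range(5)]
# Illustrative thresholds only; the six values used in the paper are not listed
outlines = {t: ted_segment(ramp, t) for t in (0.50, 0.60, 0.65, 0.68, 0.70, 0.80)}
```

Raising T keeps only the strongest edge responses, so the outline becomes thinner and may break up, while lowering T admits weaker responses and thickens the outline.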
4 Results
An image is segmented through a series of decisions; there is no universal segmentation strategy for all kinds of images, and the same image can be segmented with different strategies. The results of our proposed method suggest that when the threshold T is smaller than 0.65 or larger than 0.70, the outline is not detected clearly. When T lies between 0.65 and 0.70, the outlines or edges are detected clearly, and the most precise borders are obtained at T = 0.68, as shown in Fig. 5.
Fig. 5 Segmenting pictures by various threshold values for proposed technique
5 Conclusion and Future Scope
This paper discusses the main methods for evaluating and comparing image segmentation. A scheme for classifying the methods is given, and different types of methods have been studied comparatively. Evaluation of segmentation is necessary to improve the performance of existing segmentation algorithms and to develop new, powerful ones. This study attempts to stimulate work in this direction. Some results were obtained with regard to the application of various evaluation techniques. So far, there is no general theory of segmentation, and for assessing the performance of segmentation algorithms, empirical techniques are more appropriate and useful than analytical ones. Among empirical methods, discrepancy methods are better than quality methods. Moreover, a new technique, the TED (threshold edge detection) technique, is introduced. The results obtained with this method indicate that it is more accurate for image segmentation. Future studies on this topic could propose enhanced methods that consider all the characteristics of the images.
References
1. Anghelescu, P., Iliescu, V.G., Mara, C., Gavriloaia, M.: Automatic thresholding method for edge detection algorithms. In: 8th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–4. IEEE, Ploiesti, Romania (2016). https://doi.org/10.1109/ECAI.2016.7861099
2. Ju, Z.-W., Chen, J.-Z., Zhou, J.-L.: Image segmentation based on edge detection using K-means and an improved ant colony optimization. In: International Conference on Machine Learning and Cybernetics, pp. 297–303. IEEE, Tianjin (2013). https://doi.org/10.1109/ICMLC.2013.6890484
3. Jain, N., Lala, A.: Image segmentation: a short survey. In: 4th International Conference on the Next Generation Information Technology Summit, pp. 380–384. IEEE, Noida, India (2013). https://doi.org/10.1049/cp.2013.2345
4. Li, Z., Yang, Z., Wang, W., Cui, J.: An adaptive threshold edge detection method based on the law of gravity. In: 25th Chinese Control and Decision Conference, pp. 897–900. IEEE, Guiyang, China (2013). https://doi.org/10.1109/CCDC.2013.6561050
5. Lei, W., Man, M., Shi, R., Liu, G., Gu, Q.: Target detection based on automatic threshold edge detection and template matching algorithm in GPR. In: IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference, pp. 1406–1410. IEEE, Chongqing, China (2018). https://doi.org/10.1109/IAEAC.2018.8577508
6. Mo, S., Gan, H., Zhang, R., Yan, Y., Liu, X.: A novel edge detection method based on adaptive threshold. In: IEEE 5th Information Technology and Mechatronics Engineering Conference, pp. 1223–1226. IEEE, Chongqing, China (2020). https://doi.org/10.1109/ITOEC49072.2020.9141577
7. Thakkar, M., Shah, H.: Edge detection techniques using fuzzy thresholding. In: World Congress on Information and Communication Technologies, pp. 307–312. IEEE, Mumbai, India (2011). https://doi.org/10.1109/WICT.2011.6141263
8. ElAraby, W.S., Madian, A.H., Ashour, M.A., Farag, I., Nassef, M.: Fractional edge detection based on genetic algorithm. In: 29th International Conference on Microelectronics, pp. 1–4. IEEE, Beirut, Lebanon (2017). https://doi.org/10.1109/ICM.2017.8268860
9. Li, Z., Wang, J.: An adaptive corner detection algorithm based on edge features. In: 10th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 191–194. IEEE, Hangzhou, China (2018). https://doi.org/10.1109/IHMSC.2018.10150
10. Jie, G., Ning, L.: An improved adaptive threshold canny edge detection algorithm. In: International Conference on Computer Science and Electronics Engineering, pp. 164–168. IEEE, Hangzhou, China (2012). https://doi.org/10.1109/ICCSEE.2012.154
11. Liang, Y., Zhang, M., Browne, W.N.: Image segmentation: a survey of methods based on evolutionary computation. In: Dick, G., et al. (eds.) Simulated Evolution and Learning. Lecture Notes in Computer Science, vol. 8886, pp. 847–859. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13563-2_71
12. Chouhan, S.S., Kaul, A., Singh, U.P.: Image segmentation using computational intelligence techniques: review. Arch. Comput. Meth. Eng. 26, 533–596 (2019). https://doi.org/10.1007/s11831-018-9257-4
13. Liu, X., Deng, Z., Yang, Y.: Recent progress in semantic image segmentation. Artif. Intell. Rev. 52, 1089–1106 (2019). https://doi.org/10.1007/s10462-018-9641-3
14. De, S., Bhattacharyya, S., Chakraborty, S., Dutta, P.: Image segmentation: a review. In: Hybrid Soft Computing for Multilevel Image and Data Segmentation. Computational Intelligence Methods and Applications, pp. 29–40. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47524-0_2
15. Suresh, K., Srinivasa Rao, P.: Various image segmentation algorithms: a survey. In: Satapathy, S., Bhateja, V., Das, S. (eds.) Smart Intelligent Computing and Applications. Smart Innovation, Systems and Technologies, vol. 105, pp. 233–239. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-1927-3_24
16. Bilbro, G.L., White, M., Snyder, W.: Image segmentation with neurocomputers. In: Eckmiller, R., v.d. Malsburg, C. (eds.) Neural Computers. Springer Study Edition, vol. 41, pp. 71–79. Springer, Berlin, Heidelberg (1989). https://doi.org/10.1007/978-3-642-83740-1_9
17. Chouhan, S.S., Kaul, A., Singh, U.P.: Soft computing approaches for image segmentation: a survey. Multimedia Tools Appl. 77, 28483–28537 (2018). https://doi.org/10.1007/s11042-018-6005-6
18. Dautaniya, A.K., Sharma, V.: High-performance fuzzy C-means image clustering based on adaptive frequency-domain filtering and morphological reconstruction. In: Pant, M., Sharma, T., Verma, O., Singla, R., Sikander, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053, pp. 1221–1234. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0751-9_112
19. Kalbande, D.R., Khopkar, U., Sharma, A., Daftary, N., Kokate, Y., Dmello, R.: Early stage detection of psoriasis using artificial intelligence and image processing. In: Pant, M., Sharma, T., Verma, O., Singla, R., Sikander, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053, pp. 1199–1208. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0751-9_110
20. Sharma, M.S., Sharma, J., Atre, D., Tomar, R.S., Shrivastava, N.: Image fusion and its separation using SVD-based ICA method. In: Pant, M., Sharma, T., Verma, O., Singla, R., Sikander, A. (eds.) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1053, pp. 933–946. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0751-9_87
Fog–Cloud-Assisted Internet of Things: A Review of Workload Allocation and Latency Management Techniques
Upma Arora and Nipur Singh
Abstract The vast amount of data generated from the various Internet of things (IoT) applications and management of such applications has become a major concern for researchers. Cloud computing can manage these situations but the distance between such applications and cloud data centres create havoc when latency is concerned. For handling such scenarios, where cloud alone cannot handle latency sensitive and realtime data analytics, the role of fog computing comes into the picture. Fog computing works in between the cloud computing and the IoT applications. Working as an intermediary it provides resource management, infrastructure monitoring, and data management. Sensors and actuators provide additional monitoring components for IoT applications like health care, surveillance, etc. This paper discusses the problems with the existing cloud infrastructure as far as IoT application deployment is concerned and how fog computing assists IoT applications for the smooth running. The recent developments specifically related to workload allocation and latency management are the highlight of this paper. Keywords Fog computing · Internet of things (IoT) · Latency management · Cloud computing · Workload allocation
1 Introduction In today’s era, everyone wants things to be connected, which in technological terms is called as Internet of things (IoT). IoT connects things in a way which provides solutions to our day to day problems and makes our fast moving life much easier, but, connecting only things does not serve the whole purpose. So, CISCO came up with the idea of Internet of everything (IoE), which involves connecting not only things but also people, process and data in a more relevant and useful way than ever [1]. With this much coming in the picture, there comes the necessity of managing huge amount of data, workload management and that too within the latency deadlines. U. Arora (B) · N. Singh Department of Computer Science, Gurukul Kangri Vishwavidhyalaya, Haridwar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_50
613
614
U. Arora and N. Singh
These said requirements made the researchers to think of extending the underlying cloud computing architecture, which provides storage, computing and infrastructure services but with the bottleneck of latency. The extended cloud computing infrastructure is known as fog computing, which is distributed in nature and provides services at the edge of the network [1]. The latency sensitive IoT applications like healthcare monitoring, security, emergency response, etc., can now be deployed on the combined fog–cloud infrastructure so as to reduce the delay and congestion caused by depending on cloud alone. To better understand the fog–cloud infrastructure, we have mainly focused on two perspectives.
1.1 Workload Allocation in Combined Fog–Cloud Environment
Workload management and allocation in the combined fog–cloud infrastructure become essential when many devices access the IoT applications deployed there at the same time. A proper strategy is needed to assign the workload so that no device requesting resources waits longer than the previously agreed time limits, while the resources are still properly utilized.
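The allocation goal described above can be illustrated with a minimal sketch. It assumes a hypothetical model in which every request carries a compute demand and an agreed deadline, and every node has a fixed capacity and a fixed round-trip delay; the class names, units and numbers are illustrative assumptions and are not taken from any of the reviewed papers.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity: int       # remaining compute units (hypothetical unit)
    delay_ms: float     # assumed round-trip delay to this node

@dataclass
class Request:
    device: str
    demand: int         # compute units needed
    deadline_ms: float  # previously agreed maximum response time

def allocate(requests, nodes):
    """Greedy sketch: serve the tightest deadlines first, each on the
    lowest-delay node that has capacity and can meet the deadline."""
    placement = {}
    for req in sorted(requests, key=lambda r: r.deadline_ms):
        candidates = [n for n in nodes
                      if n.capacity >= req.demand and n.delay_ms <= req.deadline_ms]
        if not candidates:
            placement[req.device] = None  # no node can honour the agreed limit
            continue
        best = min(candidates, key=lambda n: n.delay_ms)
        best.capacity -= req.demand
        placement[req.device] = best.name
    return placement

nodes = [Node("fog-1", 5, 5.0), Node("cloud", 100, 80.0)]
requests = [
    Request("camera", 3, 20.0),
    Request("logger", 3, 200.0),
    Request("alarm", 2, 10.0),
]
result = allocate(requests, nodes)
```

In this example the two latency-sensitive requests fill the fog node and the delay-tolerant logger falls back to the cloud, which is the behaviour the text asks of an allocation strategy. A real allocator would also have to model queueing and changing network conditions.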
1.2 Latency Management in Combined Fog–Cloud Environment
Latency in the fog–cloud infrastructure is the time elapsed between a request from the IoT user and the response from the fog–cloud data server. Latency management is key to leveraging the capabilities of this combined setup effectively.

The rest of the paper is organized as follows: Sect. 2 points out the problems with the cloud computing infrastructure. Section 3 describes the distinguishing characteristics of fog computing. The need for the cloud at the core is explained in Sect. 4, and the challenges in adopting fog computing infrastructure are listed in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Problems with the Cloud Computing Infrastructure
Cloud infrastructure can provide software, infrastructure and platform as a service for requirements ranging from very small to very large. The problem arises when these requirements become highly latency sensitive and also involve decision making. Running IoT applications in the cloud infrastructure is one such situation. The problem areas of the available cloud infrastructure are as follows.
2.1 Distance Between Users and Cloud Data Centre
With the existing cloud infrastructure, the distance between the users and the cloud data centres causes two main problems:
Latency. For latency-sensitive applications, the large distance between the user and the data centres is a problem when a response is required within a very short time. With such tight latency requirements, a cloud data centre alone cannot function properly [2].
Network Congestion. The greater the distance, the higher the chance of congestion in the network, which leads to a slower data transfer rate and ultimately a poor quality of experience. Network congestion can be reduced by placing smaller, distributed nodes in the network that perform tasks at the module level.
2.2 Problem of Scalability
Cloud infrastructure generally involves very large servers with high storage and computation capacity, along with the service providers that cater to the needs of the users. Once the complete infrastructure is in place, changing it to scale up with the requirements becomes very troublesome.
2.3 Real-Time Analytics and Decision Making
Almost all applications require real-time analytics to some extent so that decisions can be taken in time, which in turn requires fast calculations and quick responses. This is only possible if the cloud nodes are close to the user.
2.4 Big Data Management
With the current cloud computing infrastructure, all the data needs to be sent to the core data centre even for very small calculations and analyses. Managing this huge amount of data therefore requires various data management tools and techniques.
2.5 Load Balancing
When traffic is very high, balancing the load that arrives from many users at once on a centralized data centre becomes a problem. Relying on a single data centre, as in cloud computing, degrades the overall quality of experience.
3 Characteristics that Define the Fog Computing Infrastructure
Fog computing brings devices closer to the edge of the network, thereby shortening the distance for highly latency-sensitive IoT applications. The characteristics of the fog computing infrastructure are as follows:
3.1 Intelligence at the Edge
Fog nodes are equipped with intelligence and computation capacity that is limited but sufficient to serve the needs of the user before the data is sent to the cloud data centre for processing. This lowers the burden on the distant cloud nodes.
3.2 Highly Scalable
Fog nodes are not as large and pricey as cloud nodes, so there is always scope for adding a few more nodes as and when the need arises.
3.3 Fault Tolerant
Fog nodes are distributed across multiple layers to provide services to the users quickly. In such an infrastructure, if a problem occurs with any of the end nodes, the incoming load can be passed on to the available peer nodes of the same layer or to the nodes of the layers above.
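The failover rule in Sect. 3.3 can be sketched as a search that starts in the failed node's own layer and moves upward. The layer layout, node names, loads and capacities below are hypothetical and only serve to show the order in which candidates are tried.

```python
def reroute(failed, layers, load, capacity):
    """Return the node that takes over the load of `failed`: a peer in the
    same layer if one has enough spare capacity, otherwise the first node
    with enough spare capacity in a layer above. Returns None if none fits."""
    start = next(i for i, layer in enumerate(layers) if failed in layer)
    needed = load[failed]
    for layer in layers[start:]:          # same layer first, then upward
        for node in layer:
            if node != failed and capacity[node] - load[node] >= needed:
                return node
    return None

# Layers are listed from the edge towards the cloud (assumed topology).
layers = [["fog-a", "fog-b"], ["fog-agg"], ["cloud"]]
capacity = {"fog-a": 5, "fog-b": 5, "fog-agg": 10, "cloud": 1000}

busy_peer = {"fog-a": 3, "fog-b": 4, "fog-agg": 2, "cloud": 10}
idle_peer = {"fog-a": 3, "fog-b": 1, "fog-agg": 2, "cloud": 10}

takeover_busy = reroute("fog-a", layers, busy_peer, capacity)
takeover_idle = reroute("fog-a", layers, idle_peer, capacity)
```

With a busy peer the load moves up to the aggregation layer; with an idle peer it stays in the same layer, which keeps it closer to the user.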
3.4 Sensors and Actuators
The addition of sensors and actuators is very helpful for IoT applications such as surveillance camera systems, health care, smart homes, smart cities and manufacturing.
3.5 Support for Multi-tenancy with Multiple Applications
Multiple tenants can access the services provided by the fog computing infrastructure for various applications at the same time. The architecture is capable of coping with the requirements agreed upon during SLA negotiation.
4 The Need for the Existence of Cloud at the Core
Although fog computing outperforms cloud computing in providing services to IoT applications, the cloud will always remain in the picture. The heavy load of the few applications that require demanding computations has to be passed to the cloud layer of the combined fog–cloud infrastructure, because fog nodes have limited capacity compared with traditional cloud nodes. Table 1 summarizes recent work on workload allocation and latency management in fog–cloud computing.
5 Challenges in Adopting Fog Computing
Although fog computing offers an appealing extension to traditional cloud computing, various challenges still need to be overcome before it can be fully adopted. These challenges are listed below.
5.1 Configuring Core and Edge Nodes to Fog Nodes
Virtually any end device, such as a camera, router or gateway, can be made to work as a fog node, but these end devices need to be programmed so that they meet the requirements expected of a fog device.
Table 1 Recent work related to workload allocation and latency management models and techniques in fog–cloud computing

| Author | Year | Objective | Input parameters | Output parameters | Contribution | Limitations |
|---|---|---|---|---|---|---|
| Taneja et al. [3] | 2017 | Deploying modules in fog–cloud infrastructure | IoT application modules with requirements | Application latency, energy consumption, network usage | Benchmark for various QoS parameters | Only static characteristics of application DAG have been incorporated |
| Souza et al. [4] | 2016 | To provide a service allocation problem for combined fog cloud (CFC) architecture | Service demands, resources to provide services, set of slots of resources | Delay in service allocation | Delay aware access of fog–cloud nodes by various services | Resources types, service types, maximum allowed delay not considered |
| Rezazadeh et al. [5] | 2018 | To place modules using simulated annealing module placement (SAMP) | Fog devices, application modules, CPU load, modules to shift | Cost, delay, energy consumption | Placement of modules on fog nodes with decreased cost, delay and energy consumption | RAM and BW requirement of modules could also be considered |
| Deng et al. [6] | 2016 | To allocate workload on fog–cloud servers | Traffic arrival rate, system delay constraints, no. of fog devices, no. of cloud servers | Power consumption, delay | Pioneer work for the study of power consumption and delay trade-off | Optimization is performed from centralized point of view |
| Aazam et al. [7] | 2015 | Develop a resource management framework for IoT applications | Resources with CPU, memory, storage and bandwidth requirements, different customer types based on their requirements | Resource estimation for new and old customers, price estimation for new and old customers | Model for management of resource prediction, resource allocation, pricing | Heterogeneous service types and device mobility not considered |
| Mahmud et al. [8] | 2018 | Management of application modules with awareness of modules properties | List of application modules with properties, service delivery deadline, distributed fog nodes with properties | Number of context switching, frequency of host modules, time for forwarding modules to nodes, service delivery latency | Module management considering and other QoS factors | Run time requirements are not focused |
| Aazam et al. [9] | 2016 | To develop a model for resource management that considers the heterogeneous service types | Service-oriented relinquish probability, service price, user characteristics, SLA | Resource estimation for new and old customers, virtual resource value (VRV) for different service types | Efficient utilization of resources, probability of customers, different types of services | Types of cloud service customers not taken into account |
| Aazam et al. [10] | 2016 | Provide a model for resource estimation | Service give up ratio, relinquish rate, NET promoter score of customer | Virtual resource value, resource estimation for new and old customers | QoE-based resource estimation model applied on Amazon EC2 | Restricted to static environment only |
| Gupta et al. [1] | 2017 | To develop a module placement model for IoT applications | Modules with their MIPS requirements, fog–cloud nodes with their MIPS availability | Module to device map | Placement of modules as nearest as possible | Heterogeneous properties of modules not considered |
5.2 Deployment of Distributed Applications
IoT applications are collections of modules that may or may not be interconnected. To deploy a complete application on the fog computing environment, these modules have to be deployed on the distributed fog nodes in such a way that the application runs smoothly without interruption.
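As a rough illustration of the deployment problem, the sketch below places each module on the device closest to the edge that still has enough processing capacity, and produces a module-to-device map. The MIPS figures, module names and device names are hypothetical, and the first-fit rule is a simplification chosen for illustration rather than the placement algorithm of any tool cited in Table 1.

```python
def place_modules(modules, devices):
    """modules: list of (name, mips_required) in application order.
    devices: list of (name, mips_available) ordered from edge to cloud.
    Returns a module -> device map; raises if a module fits nowhere."""
    free = dict(devices)
    order = [name for name, _ in devices]
    mapping = {}
    for module, need in modules:
        for dev in order:                 # try the edge-most device first
            if free[dev] >= need:
                free[dev] -= need
                mapping[module] = dev
                break
        else:
            raise ValueError(f"no device can host {module}")
    return mapping

mapping = place_modules(
    [("sense", 100), ("filter", 400), ("analytics", 3000)],
    [("gateway", 500), ("fog-node", 1000), ("cloud", 50000)],
)
```

Here the two light modules share the gateway, while the heavy analytics module exceeds every fog device and is placed in the cloud, consistent with the role of the cloud described in Sect. 4.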
5.3 SLA Management
As fog computing works as an intermediary between the cloud and the IoT, the service level agreement (SLA) must be managed so that both parties, the service provider and the user, can agree upon it.
5.4 Designing New Policies
New policies for power-aware and priority-aware resource management, failure management, resource scheduling, virtualization, latency management, workload allocation and application module scheduling need to be developed for the fog computing environment.
5.5 Security Aspects
Because fog nodes are very close to the IoT devices, users must be authenticated before they access the services so that the privacy of data is not compromised by unauthorized access. Also, when the number of requests is very high, a denial of service (DoS) attack can be devastating.
6 Conclusion
Fog computing, together with the available cloud computing techniques for workload allocation and latency management, can greatly help users develop and adopt IoT applications. The latency that used to degrade the QoE when working with the cloud data centre alone can now be handled with the additional layers of fog nodes. This paper has summarized various aspects of cloud and fog computing, particularly those related to workload allocation and latency management.
References
1. Gupta, H., Dastjerdi, A.V., Ghosh, S.K., Buyya, R.: iFogSim: a toolkit for modeling and simulation of resource management techniques in the internet of things, edge and fog computing environments. Softw. Pract. Experience 47(9), 1275–1296 (2017)
2. Mahmud, R., Kotagiri, R., Buyya, R.: Fog computing: a taxonomy, survey and future directions. In: Internet of Everything, pp. 103–130. Springer, Singapore (2018)
3. Taneja, M., Davy, A.: Resource aware placement of IoT application modules in fog-cloud computing paradigm. In: 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 1222–1228. IEEE (2017)
4. Souza, V.B.C., Ramírez, W., Masip-Bruin, X., Marín-Tordera, E., Ren, G., Tashakor, G.: Handling service allocation in combined fog-cloud scenarios. In: 2016 IEEE International Conference on Communications (ICC), pp. 1–5. IEEE (2016)
5. Rezazadeh, Z., Rahbari, D., Nickray, M.: Optimized module placement in IoT applications based on fog computing. In: Iranian Conference on Electrical Engineering (ICEE), pp. 1553–1558. IEEE (2018)
6. Deng, R., Lu, R., Lai, C., Luan, T.H., Liang, H.: Optimal workload allocation in fog-cloud computing toward balanced delay and power consumption. IEEE Internet Things J. 3(6), 1171–1181 (2016)
7. Aazam, M., Huh, E.-N.: Fog computing micro datacenter based dynamic resource estimation and pricing model for IoT. In: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications, pp. 687–694. IEEE (2015)
8. Mahmud, R., Ramamohanarao, K., Buyya, R.: Latency-aware application module management for fog computing environments. ACM Trans. Internet Technol. (TOIT) 19(1), 1–21 (2018)
9. Aazam, M., St-Hilaire, M., Lung, C.H., Lambadaris, I.: PRE-Fog: IoT trace based probabilistic resource estimation at fog. In: 2016 13th IEEE Annual Consumer Communications and Networking Conference (CCNC), pp. 12–17. IEEE (2016)
10. Aazam, M., St-Hilaire, M., Lung, C.H., Lambadaris, I.: MeFoRE: QoE based resource estimation at fog to enhance QoS in IoT. In: 2016 23rd International Conference on Telecommunications (ICT), pp. 1–5. IEEE (2016)
Artificial Neural Network, Convolutional Neural Network Visualization, and Image Security

Ankur Seem, Arpit Kumar Chauhan, and Rijwan Khan

A. Seem (B) · A. K. Chauhan · R. Khan, Department of Computer Science and Engineering, ABES Institute of Technology, Ghaziabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_51

Abstract Images are among the most vulnerable forms of data in terms of security. Around 1.8 billion images are added to the Internet daily, and most of the data on the Internet is public. It is therefore desirable to make sure that images are secure before they are transmitted through an insecure network like the Internet. Various techniques, such as steganography, watermarking, and encryption, are used to protect these digital images and to achieve security goals like confidentiality, integrity, and availability. Individually, none of these methods can achieve all security goals, so this paper presents a blended approach that combines them. The approach uses artificial neural networks (ANN), with a particular focus on convolutional neural networks (CNN). Because a CNN works on the pixels of the image and handles tasks like object detection and face recognition, it is expected to enhance the current methods in terms of efficiency, accuracy, and computational power and to improve the security of the image. The method presented in this paper is intended to be efficient and secure against attacks and risks related to data.

Keywords Image security · Encryption · ANN · CNN · Watermarking · Steganography

1 Introduction
After the invention of the Internet, data became the most important asset of individuals and organizations. Every day, 2.5 quintillion bytes of data are added to the Internet, and a large portion of this data is in the form of images. Some images contain very important and confidential data, such as medical reports, bank account details, and personal information. Maintaining the confidentiality and integrity of these images is a major security problem. Confidentiality means that no unauthorized person should be able to access the images. Integrity means that no unauthorized person should be able to modify the image in any way while it is transferred through an insecure channel [1]. Many security methods have been used to prevent such attacks, but none of them fulfils all the security goals, so we examine whether a blended approach that combines the present techniques with the help of ANN and CNN can be used. An ANN is inspired by the human brain, particularly by its neurons, and is used in many applications, such as pattern recognition and data classification, through a learning process [2]. This paper discusses the application of ANN and CNN in the field of image security.
1.1 Encryption
In encryption, an algorithm converts data into ciphertext using a key. Once the image is encrypted, it is sent to the destination through a medium, where it is converted back to its original form with the help of a key. Encryption is mainly classified into two categories, symmetric and asymmetric. Symmetric encryption uses a single key for both encryption and decryption, whereas asymmetric encryption uses different keys for encryption and decryption. The strength of encryption depends on the key: if the algorithm produces a good key, the encryption is said to be good. Good encryption remains secure even if the encryption algorithm is known [3]. Encrypting an image is considerably more difficult because of the amount of data the image carries, so it helps to apply a technology like CNN to increase efficiency (Fig. 1).
Fig. 1 Encryption of image
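The symmetric case, in which one key both encrypts and decrypts, can be illustrated with a toy sketch. It assumes the image is available as raw pixel bytes and derives a keystream by hashing the key with a counter; this construction is only an illustration of the idea and is not a vetted cipher, nor the CNN-based scheme discussed later.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes by hashing key || counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(pixels: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: the same call encrypts and decrypts."""
    ks = keystream(key, len(pixels))
    return bytes(p ^ k for p, k in zip(pixels, ks))

image = bytes([0, 64, 128, 255, 10, 20, 30, 40])  # stand-in for raw pixel data
key = b"shared secret"                            # hypothetical shared key

cipher = xor_cipher(image, key)     # sender side
restored = xor_cipher(cipher, key)  # receiver side, same key
```

The algorithm is fully public here; only the key is secret, which matches the requirement that encryption stay secure when the algorithm is known. Production systems should rely on an established cryptographic library instead of a hand-written construction like this one.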
Fig. 2 Steganography of image
1.2 Steganography
Image steganography is a technique in which secret data or a secret image is hidden within an ordinary image [4]. At the sender side, a secret key is used for the steganography process, and the same key is used to extract the hidden image from the stego image at the receiver side [5]. Steganography can be applied to images or text, and it differs from encryption. For images, the secret image is hidden using another image as a cover image. This increases the size of the image drastically, so a good algorithm tries to keep the image as small as possible. A good steganography technique should have a high data hiding capacity and keep the secret data undetectable [6] (Fig. 2).
1.3 Watermarking
Watermarking is a form of digital signature that shows a person's or organization's ownership of a document or image. It is classified as visible or invisible, and the two types serve different purposes. Visible watermarking shows direct ownership or identity (as in currency notes, company logos on documents, and copyright protection), whereas invisible watermarking provides secrecy of the message (authentication). The existing watermarking techniques are based on either the spatial or the frequency domain [7] (Fig. 3).
Fig. 3 Watermarking of image
1.4 Artificial Neural Network (ANN)
Modern computers are far better than humans at numeric computation, but humans effortlessly solve perceptual problems (object detection, face recognition) thanks to their biological neural network [8]. An ANN is inspired by a biological neural network: it is a large network of simple processing units, called artificial neurons, which are interconnected with each other. Neurons are arranged in layers, and the neurons of a particular layer are connected with the next and previous layers. The first layer is called the input layer, the last layer is known as the output layer, and all the layers between them are called hidden layers. The input layer takes the input, the hidden layers process it, and the output layer gives the output. Together they form a large network that works like a human brain (Fig. 4). These neurons are used for processing, analyzing, and understanding. In a neuron, inputs come from many different neurons; a summation (Σ) takes place in the first half, and in the second half a certain function (f) is applied to the result of the summation. The calculated value is then passed to the next layer, where the same computation happens, until the final answer reaches the output layer. This process is called forward propagation. During the training phase, the output at the output layer is compared with the real value, and changes are made in the whole network on the basis of this comparison. Once a model is adequately trained, it is used in the real world (Fig. 5).
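The summation-then-function behaviour of a single neuron, and its repetition layer by layer, can be sketched in a few lines. The sigmoid activation, the bias term and all weights below are assumptions made for illustration; the text only specifies a summation followed by some function f.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, f=sigmoid):
    """First half: weighted summation. Second half: apply the function f."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return f(total)

def forward(inputs, layers):
    """Forward propagation. `layers` is a list of layers, and each layer
    is a list of (weights, bias) pairs, one pair per neuron."""
    activations = inputs
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations

# Hypothetical 2-input network: one hidden layer (2 neurons), one output neuron.
network = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],  # hidden layer
    [([1.0, -1.0], 0.0)],                      # output layer
]
output = forward([1.0, 1.0], network)
```

Training, which the text describes as comparing the output with the real value and adjusting the network, is deliberately left out of this sketch.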
1.5 Convolution Neural Network (CNN)
A CNN is a deep neural network used for analyzing and processing images. CNNs are also known as space-invariant or shift-invariant neural networks. In a CNN, the image is converted into a matrix of size n × m, and smaller matrices (filters) of different sizes are used to decrease the size of the image and extract details from it. The same process is repeated with different filters to extract different properties from the image. These extracted values are used to obtain the desired result (Fig. 6).
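How a filter both shrinks the image matrix and extracts a property from it can be seen in a minimal sketch of the sliding-window operation. The 3 × 3 image and the 2 × 2 diagonal-difference filter are made-up values, and no padding or stride is used.

```python
def convolve2d(image, kernel):
    """Valid (no padding) sliding-window product, the operation CNN layers
    apply; strictly speaking this is cross-correlation, since the kernel
    is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
diagonal_difference = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, diagonal_difference)
```

The 3 × 3 input becomes a 2 × 2 feature map, and every entry is -4 because the image changes by the same amount along each diagonal. Applying further filters to the same image would produce further feature maps, one per extracted property.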
Fig. 4 Artificial neural network
Fig. 5 Single neuron
Fig. 6 a Image after simple gaussian filter, b image after random filter

2 Related Work
2.1 CNN in Encryption
The authors of [9] presented a security system for encrypting images with the help of a CNN. The proposed procedure uses a CNN-chaotic map to improve the performance of image encryption. The model takes an image as input and produces a matrix of the same dimensions as the image, filled with random values. This matrix is then trained in the CNN. The outcome is fed into forward and backward propagation to generate the secret key, which is then used to encrypt the image. The authors use an autoencoder to reduce noise and improve training performance, which yields an elite (high-quality) secret key for the encryption [9]. Because the method generates such a key, attacking or manipulating the encrypted image becomes very difficult.
2.2 CNN in Steganography
The authors of [4] propose a highly efficient network for steganography analysis (steganalysis). The proposed network has five layers: the first layer represents the original image, and the next four represent the features of the original image reduced to 1/2, 1/16, 1/64, and 1/164, respectively. In computer vision, the deeper part of the network generally gets more attention, but this model has more convolutional layers in the lower part of the network: in steganalysis, the embedded steganographic signals are at the pixel level, and since the lower part of a CNN deals with pixel-level analysis, it is more effective than the deeper part. After the output of all the layers is obtained, the final result is sent to a global average pooling layer, which reduces each feature map to 1 × 1, and finally the values from the global average pooling layer are passed to the softmax function.
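The last two stages mentioned above, global average pooling followed by softmax, can be sketched as follows. The two 2 × 2 feature maps are invented values, and feeding the pooled values straight into softmax is a simplification made here; it is not a description of the exact architecture in [4].

```python
import math

def global_average_pool(feature_maps):
    """Reduce each H x W feature map to a single value (a 1 x 1 map)."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(values)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

feature_maps = [
    [[1.0, 3.0], [5.0, 7.0]],   # hypothetical map for class "stego"
    [[2.0, 2.0], [2.0, 2.0]],   # hypothetical map for class "cover"
]
pooled = global_average_pool(feature_maps)
probabilities = softmax(pooled)
```

Each feature map collapses to its mean, so the pooled vector has one entry per map, and softmax converts those entries into class probabilities.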
2.3 CNN in Watermarking
The authors of [10] presented an auto-CNN for digital image watermarking using an autoencoder. Two independent images similar to the original image are generated with the CNN to obtain two sets of codebooks. The two generated image vectors are used for watermarking according to the level of the binary watermark. The proposed method gives better results than many existing techniques because of its capability to learn and remember the structure of the input image with high accuracy.
3 Proposed Model
The proposed model takes a blended approach that considers the importance of confidentiality, integrity, and availability of the image. First, a simple CNN filter is applied to the image as the first layer of security; the choice of filter is left to the user. The filtered image is then encrypted with an elite key generated by the CNN-based encryption model. Next, it is passed through the CNN-based steganographic model and covered by another image. Finally, depending on the individual's needs, visible, invisible, or both kinds of watermark are applied to ensure ownership, using the CNN-based robust digital image watermarking system.
4 Algorithm
1. Take the image and pass it through the CNN filter.
2. Encrypt the resulting image.
3. Embed the encrypted image in the least significant bits of the cover image to perform steganography.
4. Watermark the stego image.
5. Send and receive the image.
6. De-watermark the image and confirm the ownership.
7. Remove the cover image.
8. Remove the encryption from the image.
9. Remove the simple filter.
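The least-significant-bit step of the algorithm, together with its reversal at the receiver, can be sketched on a flat list of grayscale values. This is the classical LSB technique with one hidden bit per cover pixel; the cover values and the two secret bytes are made up, and the CNN-based model of the proposed method is not reproduced here.

```python
def to_bits(data: bytes):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def from_bits(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    return bytes(
        sum(b << (7 - k) for k, b in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )

def embed(cover, secret_bits):
    """Write one secret bit into the least significant bit of each pixel."""
    if len(secret_bits) > len(cover):
        raise ValueError("cover image too small for the secret")
    stego = list(cover)
    for i, bit in enumerate(secret_bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract(stego, n_bits):
    """Read the hidden bits back from the first n_bits pixels."""
    return [p & 1 for p in stego[:n_bits]]

cover = list(range(100, 116))    # 16 hypothetical grayscale pixels
secret = b"\xa5\x3c"             # e.g. two bytes of the encrypted image

stego = embed(cover, to_bits(secret))
recovered = from_bits(extract(stego, len(secret) * 8))
```

Each pixel changes by at most one gray level, which is why the stego image is hard to tell apart from the cover. The scheme needs eight cover pixels per hidden byte, which relates to the size increase noted in Sect. 1.2.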
5 Flowchart See Fig. 7.
Fig. 7 Flowchart of the proposed method
Fig. 8 a Main image, b final image, c main image histogram, and d final image histogram
6 Result Analysis
The result of the proposed method can be analyzed by calculating the difference between the original and final images. The two most common evaluation metrics are the peak signal-to-noise ratio (PSNR) and the mean squared error (MSE). The difference between the two images is very small and cannot be distinguished by the human eye (Fig. 8). The histogram of an image shows the relative frequency of occurrence of the various gray levels in the image. It can be used to assess the similarity of two images.
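The two metrics and the histogram can be computed as in the sketch below, which uses the standard definitions MSE = mean of squared pixel differences and PSNR = 10 · log10(MAX² / MSE) with MAX = 255 for 8-bit images. The tiny 2 × 2 images are invented values, not results from the paper.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) / n

def psnr(a, b, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)

def histogram(img, levels=256):
    """Relative frequency of each gray level in the image."""
    counts = [0] * levels
    for row in img:
        for p in row:
            counts[p] += 1
    total = len(img) * len(img[0])
    return [c / total for c in counts]

original = [[100, 100], [100, 100]]
final = [[100, 101], [100, 99]]

error = mse(original, final)
quality_db = psnr(original, final)
```

A lower MSE and a higher PSNR both indicate that the final image is closer to the original; comparing the two histograms bin by bin gives a complementary, distribution-level view.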
7 Conclusion and Future Work
As the number of images in the public domain grows, the need for image security increases day by day. The proposed model takes a mixed approach, with the help of a network that can analyze, learn, and remember the images. CNN increases the efficiency of methods like encryption, steganography, and watermarking. Applying a simple, user-chosen CNN filter to hide the data of the image at the first layer of the proposed model enhances its efficiency, and applying the above methods in a particular order strengthens the security of the image further. The proposed model is designed to be effective and secure. The proposed method can also be used for text security. With the help of a CNN and the softmax function, the image can also be converted into a single string of values. Converting an image to text may result in the loss of some data, but it will increase image security considerably. We conclude that CNN has considerable future scope in image security.
8 Limitation
The proposed method has some disadvantages: it may change the quality of an image, and it may distort the message or image. It increases the size of the image because one image is hidden inside another. Very minute mistakes may make proper recovery of the image very difficult. We hope to overcome these limitations soon.
References
1. Ulutas, M., Ulutas, G., Nabiyev, V.V.: Medical image security and EPR hiding using Shamir's secret sharing scheme. J. Syst. Softw. 84(3), 341–353 (2011)
2. Schmidt, T.: A Review of Applications of Artificial Neural Networks in Cryptosystems. Department of Computer Science, Ryerson University, Canada
3. Petkovic, M., Jonker, W.: Preface, special issue on secure data management. J. Comput. Secur. 17(1), 1–3 (2009)
4. Xiang, Z., et al.: A new convolutional neural network-based steganalysis method for content-adaptive image steganography in the spatial domain. IEEE Access 8, 47013–47020 (2020)
5. Razzaq, M.A., et al.: Digital image security: fusion of encryption, steganography and watermarking. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 8(5) (2017)
6. Saini, J.K., Verma, H.K.: A hybrid approach for image security by combining encryption and steganography. In: 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013). IEEE (2013)
7. Nezhadarya, E., Wang, Z.J., Ward, R.K.: A robust image watermarking based on multiscale gradient direction quantization. IEEE Trans. Inf. Forensics Secur. 6(4), 1200–1213 (2011)
8. Jain, A.K., Mao, J., Mohiuddin, K.M.: Artificial neural networks: a tutorial. Computer 29(3), 31–44 (1996)
9. Maniyath, S.R., Thanikaiselvan, V.: An efficient image encryption using deep neural network and chaotic map. Microprocess. Microsyst. 103134 (2020)
10. Haribabu, K., Subrahmanyam, G.R.K.S., Mishra, D.: A robust digital image watermarking technique using autoencoder based convolutional neural networks. In: 2015 IEEE Workshop on Computational Intelligence: Theories, Applications, and Future Directions (WCI). IEEE (2015)
A Study on RPL Protocol with Respect to DODAG Formation Using Objective Function
Sakshi Garg, Deepti Mehrotra, and Sujata Pandey
Abstract The Routing Protocol for Low power and Lossy networks (RPL) uses the Objective Function (OF) to form a Destination-Oriented Directed Acyclic Graph (DODAG) from a set of metrics. The key role of the OF is to determine the best parent and the ideal path to reach the destination. However, designing an efficient OF for Low Power and Lossy Networks (LLNs) remains a considerable challenge. This paper presents a survey of existing DODAG formation strategies in LLNs. We highlight the advantages and disadvantages of the listed approaches. Further, the paper presents the classification and justification of the considered metrics. Then, a comparison of the DODAG-forming strategies across the reviewed papers is presented. Finally, the findings are summarized by emphasizing the challenges that LLN researchers can explore in future work. This article will also help researchers gain better insight into the RPL protocol, the OF and DODAG formation.
Keywords Internet of things · Low power · Lossy networks · RPL · DODAG · Objective function · Node metrics · Link metrics
S. Garg (B) · D. Mehrotra · S. Pandey
Amity University, Noida, Uttar Pradesh, India
S. Pandey e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_52

1 Introduction
The Internet of Things (IoT) [1] is a revolution of the current era, spanning applications that connect computing devices, digital and mechanical machines, objects and people, each with a unique identifier. With the aid of advances in mobile and internet connectivity, IoT has transformed agriculture [2], automation, transportation and healthcare, making our lives easier and more secure [3]. The sensors/motes used in such deployments have restricted resources and low power, and are lossy in nature. This creates a substantial need for a routing protocol for LLNs. Such a protocol, called RPL [4], was introduced by the IETF ROLL working group. RPL provides a flexible working environment to manage the network, induce topology changes and simulate real scenarios.

In this article, we primarily focus on DODAG formation strategies that use a set of metrics. The selection of the optimal path to build a DODAG has attracted many researchers, but the lack of a survey of DODAG-forming strategies in the literature motivated us to write this article. We aim to provide deep insight into the RPL protocol, the OF and DODAG building approaches for the LLN literature and for future researchers in this area. This article gives a systematic analysis of the state of the art for RPL. To keep the survey organized, it restricts its scope to previous works that focus on the selection of metrics.

The main contributions are as follows. First, this paper presents a comprehensive survey of the RPL protocol with respect to the DODAG approaches considered. Second, it discusses the classification of metrics and the combinations of metrics that researchers have explored in the literature. Third, it presents the standard DODAG formation and the techniques proposed in previous works, together with a comprehensive comparative analysis. Last but not least, it highlights the challenges in RPL for future research and new possibilities.

Section 2 gives an overview of related literature. Section 3 discusses the standard RPL DODAG formation and the different proposed approaches, and presents a comparative analysis of these approaches. Section 4 lists the metrics used to select the optimal path and explains their selection criteria. Section 5 discusses the major challenges of RPL. The findings of this study are presented in Sect. 6. Lastly, Sect. 7 concludes this study.
2 Literature Work
2.1 Related Work
RPL gained popularity in the research community after the IETF ROLL working group introduced it as a solution to LLN issues. Since then, many researchers have proposed and analyzed optimizations of RPL. However, to date only a few works have focused on the core of RPL, namely Objective Function optimization. In survey [5], the authors discussed the key features of RPL, but the article did not list ways to overcome the limitations of the protocol. Another survey [6] primarily discussed the security aspects of RPL. Security is a major concern, but the root of the problem lies in defining the optimal OF to form the DODAG, which that survey does not address. In [7], the authors compared the standard OFs (OF0 and MRHOF) in different network settings. Their findings were evaluated in a low-density network, whereas RPL performance tends to degrade more in high-density networks, which limits the applicability of the work.
Surveys such as [8–10] gave an overview, highlighted the need to improve RPL and stated its limitations, but did not discuss the efforts made to overcome these shortcomings. The authors of [11] proposed a new OF using combined metrics to improve RPL performance in terms of PDR and overhead. Article [12] listed ways to overcome the challenges of RPL stated in the literature and recommended improving the OF. The authors of [13] assessed the performance of RPL in a dense and heavily loaded network. Despite the improvement in network lifetime, network stability and the high number of control messages remain open issues. Another survey [14] presented an overview of the different techniques used to compute the OF. The author of thesis [15] studied RPL enhancements in three different scenarios and compiled the findings. In [10], the authors presented a comprehensive survey focusing on the OF until 2019, but did not emphasize the DODAG formation strategies that are discussed in this paper. Our survey primarily discusses the OF and the metrics used. Compared with other existing surveys, this paper gives deeper insight into DODAG formation techniques for best parent selection. This article is the first to compare the standard DODAG formation techniques with the proposed ones, with the aim of exposing the weaknesses and limitations of RPL. Table 1 summarizes the survey papers discussed here.
3 DODAG Approaches and Comparative Analysis
RPL is a Distance Vector Routing Protocol (DVRP) that can support different topologies. Compared with other ad hoc protocols, RPL is the only protocol that supports LLNs, owing to its flexible OF. Routing is based on Destination-Oriented Directed Acyclic Graphs (DODAGs). In a DODAG, all the nodes are directed toward a single destination, i.e., the root node. The DODAG is formed using ICMPv6 control messages: DODAG Information Object (DIO), DODAG Information Solicitation (DIS), Destination Advertisement Object (DAO) and DAO Acknowledgement (DAO-ACK) [16]. Figure 1 shows the DODAG control message structure.
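To make the formation process concrete, the following is a minimal sketch of how DIO propagation from the root could build a DODAG. It is an illustration under simplifying assumptions, not the RPL specification: rank is taken to be the plain hop count from the root, links are symmetric and lossless, and the function name `build_dodag` and the example topology are hypothetical.

```python
from collections import deque


def build_dodag(links, root):
    """Return {node: (rank, preferred_parent)} for a hop-count DODAG.

    Simplified model (an assumption of this sketch): the root starts with
    rank 0, each DIO advertises the sender's rank, and a receiver adopts
    the sender as parent when that gives it a strictly lower rank.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    dodag = {root: (0, None)}
    pending = deque([root])  # nodes that still have a DIO to broadcast
    while pending:
        sender = pending.popleft()
        sender_rank = dodag[sender][0]
        for node in sorted(neighbours.get(sender, ())):
            offered_rank = sender_rank + 1
            # Join on the first DIO heard, or switch to a strictly better rank.
            if node not in dodag or offered_rank < dodag[node][0]:
                dodag[node] = (offered_rank, sender)
                pending.append(node)  # the node re-advertises its new rank
    return dodag


# Hypothetical five-node topology with root "R".
links = [("R", "A"), ("R", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
dodag = build_dodag(links, "R")
for node in sorted(dodag):
    print(node, dodag[node])
```

In this sketch every node ends up with exactly one preferred parent pointing toward the root, which is the property the OF is responsible for deciding in a real deployment.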
Table 1 Related literature study

| Studies related to RPL | Concerns | Objectives | Year |
|---|---|---|---|
| Gaddour and Koubaa [5] | Message update, control messages; Metrics and constraints; Traffic flow and topology | RPL features, RPL challenges | 2012 |
| Pongle and Chavan [6] | Denial of service and spoofing attacks; Selective forwarding attack; Hello flooding and Blackhole attacks | RPL security | 2015 |
| Qasem et al. [7] | Packet loss; Packet delivery ratio; RX; Topology | RPL performance | 2015 |
| Lamaazi and Benamar [11] | Objective function; PDR; Overhead; Composite metrics | RPL performance | 2017 |
| Ghaleb et al. [12] | Routing maintenance; Route optimization; Downward routing; RPL modes and memory limitations | Solutions proposed in literature to overcome RPL challenges | 2018 |
| Taghizadeh et al. [13] | Network lifetime; Power depletion; Packet loss | RPL performance in high density network | 2018 |
| Kechiche et al. [14] | OF computation techniques; Parent selection; Single and composite metrics | RPL overview | 2018 |
| Ghaleb [15] | Downward routing; Route optimization; Topology | RPL enhancement | 2019 |
| Lamaazi and Benamar [10] | Routing modes; Objective function assessment techniques; Single/composite metrics | RPL review | 2020 |

3.1 Standard DODAG_OF Approaches
Objective Function Zero (OF0). The concept of the OF was first standardized in 2012 in Request for Comments (RFC) 6552 [17]. This OF uses a single node metric, hop count (HC), to select the shortest path to the root. Since hop count is the only deciding factor in route selection, some failed routes get selected repeatedly simply because they are the shortest, which increases network latency.

Minimum Rank with Hysteresis Objective Function (MRHOF). This OF was specified in 2012 in RFC 6719 [18]. Since poor link quality is an issue with OF0, MRHOF was introduced to use a dynamic link metric such as ETX. MRHOF evaluates the cost of the paths to the root using a hysteresis mechanism and selects the path with the lowest cost. The advantage of MRHOF is that it can support additive metrics. MRHOF is most commonly used with one of two metrics, namely ETX and energy. However, although MRHOF with ETX or energy is an enhancement of MRHOF, it incurs more delay and high packet loss in the network.
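The hysteresis idea can be sketched as follows: a node keeps its current preferred parent unless another candidate offers a path cost that is better by at least a switching threshold. This is a minimal illustration, not the RFC 6719 procedure in full; the function name `select_parent`, the use of plain ETX units for path cost and the threshold value of 1.5 are assumptions chosen for the example.

```python
def select_parent(current_parent, path_costs, switch_threshold=1.5):
    """Pick a preferred parent with hysteresis.

    path_costs maps each candidate parent to the cost of the path to the
    root through it (for example cumulative ETX). The threshold value is
    illustrative only.
    """
    best = min(path_costs, key=path_costs.get)
    if current_parent is None or current_parent not in path_costs:
        return best  # nothing to be loyal to, take the cheapest path
    improvement = path_costs[current_parent] - path_costs[best]
    # Switch only when the gain is large enough to justify the churn.
    if improvement >= switch_threshold:
        return best
    return current_parent


# "B" is cheaper than "A", but not by enough to trigger a switch.
print(select_parent("A", {"A": 4.0, "B": 3.0}))
# Here the gain (2.0) exceeds the threshold, so the node moves to "B".
print(select_parent("A", {"A": 5.0, "B": 3.0}))
```

The design choice illustrated here is stability over optimality: small fluctuations in a dynamic link metric do not cause repeated parent changes.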
Fig. 1 DODAG control message structure
3.2 Some Proposed DODAG_OF Approaches
In [19], the authors used a static single-node metric with random topology. They considered 40, 80, 100 and 200 senders with 1 sink, chose PDR, overhead and power consumption as parameters, and used the OF0 and MRHOF objective functions for their study. Their results showed MRHOF to be better in all test cases. In [20], the authors used random topology in both static and mobile environments with 25, 49 and 51 nodes. They considered PDR, hop count, power consumption and ETX as parameters. Their results showed that MRHOF is more reliable, while OF0 has better PDR and power consumption. Likewise, the authors of [21] performed the same experiment and found the same results; the only difference is that they considered 50, 65, 75 and 85 nodes.

Paper [22] measured PDR, latency and power consumption as metrics, with MRHOF_ETX, MRHOF_Energy and OF0 as objective functions; 20, 40, 60, 80, 100, 120, 40 nodes were considered. The results showed OF0 to be better in a high-density network, while MRHOF_ETX and MRHOF_Energy performed badly. In [23], with the same static and random topology setup and the same parameters (PDR and power consumption), similar results were found with 11, 16, 21, 26, 31, 36, 41 and 46 nodes. The objective functions used were OF0 and MRHOF_ETX: OF0 was better in energy consumption and independent of network size, while MRHOF_ETX was better in PDR. Paper [24] used multiple sinks (1, 5, 10) and 35 senders in a static environment with random topology. The authors considered PDR, energy consumption and throughput as metrics, took ETX and HC as objective functions, and found ETX to be better than HC in all three parameters.

The authors of [25] used 26, 50 and 82 nodes in both static and mobile environments, with PDR, hop count and power consumption as parameters. They compared MRHOF and OF0 and found similar results, namely that OF0 used less power and had high PDR. Taking the same PDR and power consumption as metrics, they also found MRHOF to be better in the static setup and OF0 to be better in the mobile environment. Paper [26] considered PDR, hop count and energy consumption as parameters and proposed OF-EC as the objective function, which is compared with OF-Fuzzy, ENTOT and ETX. The authors showed OF-EC to be better in PDR and power consumption. However, a comparison with OF0, which is not affected by network size, is missing. Likewise, the authors of [27] used the same parameters and added delay. They compared their proposed objective function OF-ECF with OF0, MRHOF and OF-FUZZY. Their results showed that MRHOF is better in convergence time and OF-ECF in stability and network lifetime, while OF0 consumed less energy.
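Since PDR and ETX recur in almost every study above, a short sketch of how they are commonly computed may help. The formulas used are the usual definitions (PDR as received over sent packets, ETX as the inverse of the product of forward and reverse delivery ratios); the function names and the sample numbers are assumptions for illustration and are not taken from the reviewed papers.

```python
def packet_delivery_ratio(received, sent):
    """Fraction of sent packets that reached the sink."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return received / sent


def etx(forward_delivery, reverse_delivery):
    """Expected transmission count of a link.

    forward_delivery is the probability that a data packet arrives and
    reverse_delivery the probability that its acknowledgement arrives.
    """
    if not (0 < forward_delivery <= 1 and 0 < reverse_delivery <= 1):
        raise ValueError("delivery ratios must be in (0, 1]")
    return 1.0 / (forward_delivery * reverse_delivery)


# Illustrative numbers only.
print(packet_delivery_ratio(received=90, sent=100))
print(etx(forward_delivery=0.5, reverse_delivery=1.0))
```

A perfect link has an ETX of 1, and the value grows as either direction of the link becomes lossier, which is why a path metric built from ETX favours reliable links over merely short routes.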
3.3 Comparative Analysis
From Sect. 3.2, it can be seen that more than 60% of the researchers used random topology, while around 28% used grid topology. A static environment is also used more commonly than a mobile one, largely because obtaining clear simulation results in a dynamic environment is difficult. The number of nodes considered varied from study to study. In our analysis, most researchers reported that the best results are obtained with 40–60 nodes, whereas beyond 70 nodes packet loss, delay and degraded ETX become pronounced because of delayed network convergence. It is also interesting to note that only 1% of the papers considered multiple sinks, whereas the rest of the simulations were run with a single sink.

Further, the comparison shows that OF0 consumed less power than MRHOF and is independent of network size. This is a major reason why OF0 may be preferred as the OF. On the other hand, since OF0 uses a single metric (hop count), it gave poor PDR: 60% of the studies report high PDR with MRHOF and 40% with OF0. It should also be noted that MRHOF performed better in static and low-density setups, whereas its results degraded in mobile and high-density setups. Efficient energy consumption in OF0, compared with MRHOF, MRHOF_ETX and MRHOF_Energy, is supported by 60% of the literature, as is high PDR in MRHOF. This creates a clear need for an effective OF that can provide both at the same time and optimize the results. Such an OF could greatly improve the rate of convergence in the network and provide high QoS (Quality of Service).

The comparative performance analysis of the above studies is presented in Table 2. The table shows the comparison made by the researchers for the objective functions in terms of PDR and energy consumption. Only these two parameters are considered because they are the metrics most often considered by the researchers. Figure 2 gives an idea of the parameters most frequently considered by researchers in the past, in comparison with other parameters.
4 Classification of Metrics
The process of building a DODAG requires assigning a rank to each node based on the OF. The OF, in turn, is defined in terms of a metric. Metrics can be broadly classified into two types: link metrics and node metrics. Table 3 shows the classification of the metrics used by researchers in the literature into link and node metrics, and presents the assessment related to each OF. A single metric is when one parameter is selected for the OF, whereas selecting more than two parameters at a time for the OF forms a composite metric. From the table, it can be seen that almost 40% of the papers used a single metric, while 60% used a composite metric.
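The difference between single and composite metrics can be illustrated with a small sketch in which a candidate parent is scored either by hop count alone or by a weighted sum of one link metric and two node metrics. The weights, the metric values and the function name `composite_cost` are hypothetical, and the weighted sum is only one of several possible ways to combine metrics (the surveyed works also use, for instance, fuzzy logic). The values are not normalized here, which a real design would need to address.

```python
def composite_cost(metrics, weights):
    """Weighted sum of metrics; lower is better for every metric used here."""
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical candidate parents with one link metric (etx) and two node
# metrics (hop_count, consumed_energy).
candidates = {
    "A": {"etx": 1.2, "hop_count": 3, "consumed_energy": 0.4},
    "B": {"etx": 2.5, "hop_count": 2, "consumed_energy": 0.1},
}
weights = {"etx": 0.5, "hop_count": 0.3, "consumed_energy": 0.2}

# Single metric: hop count only, as in OF0.
single_best = min(candidates, key=lambda n: candidates[n]["hop_count"])

# Composite metric: link quality and energy are also taken into account.
composite_best = min(candidates, key=lambda n: composite_cost(candidates[n], weights))

print(single_best, composite_best)
```

With these illustrative numbers the two approaches choose different parents: hop count alone prefers the shorter route through "B", while the composite score prefers "A" because of its better link quality.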
Table 2 Comparative performance analysis of DODAG_OF based on PDR and energy consumption (EC). Ratings are listed in the column order OF0, MRHOF, MRHOF_ETX, MRHOF_Energy, for the objective functions evaluated in each paper.

| Paper considered | Parameter | Ratings |
|---|---|---|
| Sharma and Jayavignesh [19] | PDR | Bad, Good |
| | EC | Bad, Good |
| Pradeska et al. [20] | PDR | Good, Bad |
| | EC | Good, Bad |
| Abuein et al. [21] | PDR | Bad, Good |
| | EC | Bad, Good |
| Kechiche et al. [22] | PDR | Same, Same, Bad |
| | EC | Same, Same, Bad |
| Alayed et al. [23] | PDR | Bad, Good |
| | EC | Good, Bad |
| Zaatouri et al. [24] | PDR | Good |
| | EC | Good |
| Mardini et al. [25] | PDR | Same, Same, Bad |
| | EC | Good |
| Lamaazi and Benamar [26] | PDR | Bad, Good |
| | EC | Good, Bad |
| Lamaazi et al. [27] | PDR | Bad, Good |
| | EC | Good, Bad |

Fig. 2 Representation of usually considered parameters by researchers (bar chart; x-axis: used metric parameters, namely received packets, lost packets, hop count, ETX, PDR and throughput; y-axis: number of research papers considered, in percent, from 0% to 80%)
Table 3 Classification of metrics and assessment of objective function

| No. | OFs | Link metrics | Node metrics | Achievement | Gaps | Simulator | Year |
|---|---|---|---|---|---|---|---|
| 1 | RSSI-based IPv6 [28] | RSSI, payload length | - | High transmission rate | Packet loss increased with RSSI; not compared with standard solutions | Contiki OS | 2014 |
| 2 | OF0 [29] | - | Hop count | High PDR, end-to-end delay | Compared only with standard ETX | Contiki OS, Cooja | 2014 |
| 3 | OF-ENERGY [30] | - | Hop count, remaining energy | Less packet loss and delay, increase in network lifetime | Not compared to the standard OF | Contiki OS | 2015 |
| 4 | BF-ETX [31] | ETX | - | Low latency and less power consumption | Only ETX taken as parameter for evaluation | Contiki OS, Cooja | 2018 |
| 5 | SCAOF [32] | Reliability | Energy | High reliability, increase in network lifetime | Tested only in low density network | Contiki OS | 2015 |
| 6 | FM-OF [33] | ETX, RSSI | Hop count | High PDR, less delay | Compared only to standard OF | Contiki OS, Cooja | 2017 |
| 7 | DQCA-OF [34] | ETX | Hop count, consumed energy | Better QoS, increase in network lifetime and less end-to-end delay | Tested only in low density network | Contiki OS, Cooja | 2018 |
| 8 | FLEA-RPL [35] | ETX | Traffic load, residual energy | High PDR, increase in network lifetime and less end-to-end delay | Less efficient in terms of load balancing | Contiki OS, Cooja | 2018 |
| 9 | OF-EC [26] | ETX | Hop count, energy | High PDR, energy efficiency irrespective of network topology | Frequent change in parent makes the link unstable | Contiki OS, Cooja | 2018 |
| 10 | OF-ECF [27] | ETX | Hop count, consumed energy, forwarding delay | Increase in network lifetime and reliability | Takes more convergence time | Contiki OS, Cooja | 2019 |
| 11 | EB-OF [36] | ETX | Energy depletion | Increase in network lifetime, energy balance in intermediate nodes | Energy balance could be achieved at the cost of link reliability | Contiki OS, Cooja | 2020 |

5 Challenges
This section gives future directions to researchers who aim to improve and optimize RPL. The issues faced during the simulation and testing of RPL optimizations constitute the current challenges. Throughout our study, we found many approaches that improve RPL in terms of PDR or delay, but only in a restricted environment. For instance, high PDR and low delay are attainable with optimizations in a low-density network, while poor results are observed in high-density networks. In real scenarios, the aim is to achieve high PDR, low delay and increased network lifetime in both low- and high-density environments. Likewise, power consumption in high-density environments is still a major concern, a fact that our study also supports. RPL performance evaluation is also contentious, because actual data is rarely available for security reasons [37], and simulated data resembles the real scenario without being the real one. The results obtained may therefore vary slightly in real scenarios. Network topology and the stabilization of network links to increase network stability are of paramount importance to ensure a flawless deployment of the protocol [38] for LLNs. Implementing RPL in a mobile environment is a tedious task in terms of securing the network. Thus, RPL security and privacy is altogether a new domain to be taken up as future work.
6 Results and Discussions
We present a statistical assessment of the papers reviewed above. Our analysis showed that most researchers (more than 60%) prefer composite metrics to single metrics. The reason is that, in single-metric optimization, the parameter chosen for the OF gets optimized while the other parameters degrade, eventually resulting in an unbalanced parameter distribution. We also traced the rise of RPL research to date. Almost 12% of the researchers were discussing RPL in 2012–2014; this increased to 42% in 2015–2017. Researchers also began to emphasize the OF rather than only overviews.
From 2018 to date, RPL research has grown manifold. Power consumption (30%) and PDR (23%) have been the most targeted areas for the majority of researchers. In our analysis, we also found that more than 95% of the reviewed papers used Contiki OS for their simulation and assessment of RPL. The remaining 3–5% of researchers worked with MATLAB, OMNeT++, NS2/3, etc. The use of the Cooja simulator has risen markedly since 2013 owing to its flexibility. However, long delay, dynamic topology, link instability, network density, load balancing and RPL security for LLNs remain open challenges.
7 Conclusion
Through this article, we tried to give deep insight into the RPL protocol, focusing mainly on DODAG formation techniques. We began with the related studies in this area. Then, we classified the metrics that are used to define the OF and to build the DODAG. Specifically, we presented a comparative analysis of the standard approaches and the proposed ones in terms of network performance. During this study, we also found that only 18% of researchers use the standard metrics to define the OF, while 82% modify the metrics to achieve better results. In particular, we found that improving the DODAG formation approach can improve network performance. This statement, however, rests on simulation results rather than experimental observations. We also noticed that the Contiki OS/Cooja simulator is widely used to simulate the RPL work environment. Despite the available research work, we conclude that various other combinations of metrics to build the DODAG, and different implementation modifications, can still be exploited to achieve better network performance and energy efficiency. The limited availability of real data, due to network privacy and security, is a major limitation of this study; academicians therefore generate their own data using tools and simulators. Cloud computing of big data, energy depletion of nodes and multiple parent selection are some of the other limitations of this study. Nevertheless, this article will be of interest to researchers working on RPL or LLNs and can provide future directions.
References
1. Minerva, R., Biru, A., Rotondi, D.: Towards a definition of the internet of things (IoT). IEEE Internet Initiative 1(1), 1–86 (2015)
2. Faridi, M., Verma, S., Mukherjee, S.: Integration of GIS, spatial data mining, and fuzzy logic for agricultural intelligence. In: Soft Computing: Theories and Applications, pp. 171–183 (2018)
3. Movassaghi, S., Abolhasan, M., Lipman, J., Smith, D., Jamalipour, A.: Wireless body area networks: a survey. IEEE Commun. Surv. Tutorials 16(3), 1658–1686 (2014)
4. Krishna, G.G., Krishna, G., Bhalaji, N.: Analysis of routing protocol for low-power and lossy networks in IoT real time applications. Procedia Comput. Sci. 87, 270–274 (2016)
5. Gaddour, O., Koubâa, A.: RPL in a nutshell: a survey. Comput. Netw. 56(14), 3163–3178 (2012)
6. Pongle, P., Chavan, G.: A survey: attacks on RPL and 6LoWPAN in IoT. In: 2015 International Conference on Pervasive Computing (ICPC), pp. 1–6 (2015)
7. Qasem, M., Altawssi, H., Yassien, M.B., Al-Dubai, A.: Performance evaluation of RPL objective functions. In: 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1606–1613 (2015)
8. Paliwal, G., Taterh, S.: Impact of dense network in MANET routing protocols AODV and DSDV comparative analysis through NS3. In: Soft Computing: Theories and Applications, pp. 327–335 (2018)
9. Dass, A., Srivastava, S.: On comparing performance of conventional fuzzy system with recurrent fuzzy system. In: Soft Computing: Theories and Applications, pp. 389–403 (2018)
10. Lamaazi, H., Benamar, N.: A comprehensive survey on enhancements and limitations of the RPL protocol: a focus on the objective function. Ad Hoc Netw. 96, 102001 (2020)
11. Lamaazi, H., Benamar, N.: RPL enhancement using a new objective function based on combined metrics. In: 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1459–1464 (2017)
12. Ghaleb, B., Al-Dubai, A.Y., Ekonomou, E., Alsarhan, A., Nasser, Y., Mackenzie, L.M., Boukerche, A.: A survey of limitations and enhancements of the IPv6 routing protocol for low-power and lossy networks: a focus on core operations. IEEE Commun. Surv. Tutorials 21(2), 1607–1635 (2018)
13. Taghizadeh, S., Bobarshad, H., Elbiaze, H.: CLRPL: context-aware and load balancing RPL for IoT networks under heavy and highly dynamic load. IEEE Access 6, 23277–23291 (2018)
14. Kechiche, I., Bousnina, I., Samet, A.: An overview on RPL objective function enhancement approaches. In: 2018 Seventh International Conference on Communications and Networking (ComNet), pp. 1–4 (2018)
15. Ghaleb, B.: Efficient Routing Primitives for Low-power and Lossy Networks in Internet of Things. Doctoral Dissertation. Edinburgh Napier University (2019)
16. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M.: Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutorials 17(4), 2347–2376 (2015)
17. Thubert, P.: Objective function zero for the routing protocol for low-power and lossy networks (RPL) (2012)
18. Gnawali, O., Levis, P.: The minimum rank with hysteresis objective function. In: RFC 6719 (2012)
19. Sharma, R., Jayavignesh, T.: Quantitative analysis and evaluation of RPL with various objective functions for 6LoWPAN. Indian J. Sci. Technol. 8(19), 1 (2015)
20. Pradeska, N., Najib, W., Kusumawardani, S.S.: Performance analysis of objective function MRHOF and OF0 in routing protocol RPL IPV6 over low power wireless personal area networks (6LoWPAN). In: 2016 8th International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 1–6 (2016)
21. Abuein, Q.Q., Yassein, M.B., Shatnawi, M.Q., Bani-Yaseen, L., Al-Omari, O., Mehdawi, M., Altawssi, H.: Performance evaluation of routing protocol (RPL) for internet of Things. Perform. Eval. 7(7), (2016)
22. Kechiche, I., Bousnina, I., Samet, A.: A comparative study of RPL objective functions. In: 2017 Sixth International Conference on Communications and Networking (ComNet), pp. 1–6 (2017)
23. Alayed, W., Mackenzie, L., Pezaros, D.: Evaluation of RPL’s single metric objective functions. In: 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 619–624 (2017)
24. Zaatouri, I., Alyaoui, N., Guiloufi, A.B., Kachouri, A.: Performance evaluation of RPL objective functions for multi-sink. In: 2017 18th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), pp. 661–665 (2017)
25. Mardini, W., Ebrahim, M., Al-Rudaini, M.: Comprehensive performance analysis of RPL objective functions in IoT networks. Int. J. Commun. Netw. Inf. Secur. 9(3), 323–332 (2017)
26. Lamaazi, H., Benamar, N.: OF-EC: A novel energy consumption aware objective function for RPL based on fuzzy logic. J. Netw. Comput. Appl. 117, 42–58 (2018)
27. Lamaazi, H., El Ahmadi, A., Benamar, N., Jara, A.J.: OF-ECF: a new optimization of the objective function for parent selection in RPL. In: 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 27–32 (2019)
28. Lee, T.H., Xie, X.S., Chang, L.H.: RSSI-based IPv6 routing metrics for RPL in low-power and lossy networks. In: 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1714–1719 (2014)
29. Parnian, A.R., Kharazmi, M.R., Javidan, R.: RPL Routing Protocol in Smart Grid Communication 1 (2014)
30. Todolí-Ferrandis, D., Santonja-Climent, S., Sempere-Payá, V., Silvestre-Blanes, J.: RPL routing in a real life scenario with an energy efficient objective function. In: 2015 23rd Telecommunications Forum Telfor (TELFOR), pp. 285–288 (2015)
31. Sanmartin, P., Jabba, D., Sierra, R., Martinez, E.: Objective function BF-ETX for RPL routing protocol. IEEE Latin Am. Trans. 16(8), 2275–2281 (2018)
32. Chen, Y., Chanet, J.P., Hou, K.M., Shi, H., De Sousa, G.: A scalable context-aware objective function (SCAOF) of routing protocol for agricultural low-power and lossy networks (RPAL). Sensors 15(8), 19507–19540 (2015)
33. Urama, I.H., Fotouhi, H., Abdellatif, M.M.: Optimizing RPL objective function for mobile low-power wireless networks. In: 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 678–683 (2017)
34. Araújo, H.D.S., Rodrigues, J.J., Rabelo, R.D.A., Sousa, N.D.C., Sobral, J.V.: A proposal for IoT dynamic routes selection based on contextual information. Sensors 18(2), 353 (2018)
35. Sankar, S., Srinivasan, P.: Fuzzy logic based energy aware routing protocol for internet of things. Int. J. Intell. Syst. Appl. 10(10), 11 (2018)
36. Rana, P.J., Bhandari, K.S., Zhang, K., Cho, G.: EBOF: A new load balancing objective function for low-power and lossy networks. IEIE Trans. Smart Process. Comput. 9(3), 244–251 (2020)
37. Aldubai, A.F., Humbe, V.T., Chowhan, S.S.: Analytical study of intruder detection system in big data environment. In: Soft Computing: Theories and Applications, pp. 405–416 (2018)
38. Singh, V., Saini, G.L.: DTN-enabled routing protocols and their potential influence on vehicular ad hoc networks. In: Soft Computing: Theories and Applications, pp. 367–375 (2018)
An Ensemble Learning Approach for Brain Tumor Classification Using MRI
Ranjeet Kaur, Amit Doegar, and Gaurav Kumar Upadhyaya
Abstract Digital image processing is a prominent tool used by radiologists to diagnose complicated tumors. Magnetic resonance imaging (MRI), CT scans, X-rays, etc., are examined and analyzed by extracting meaningful and accurate information from them. Diagnosing a brain tumor accurately is a most critical task. The survival of affected patients can be increased if the tumor is detected early. In this research paper, an ensemble approach is proposed to classify benign and malignant MRI of the brain. A total of 150 image slices from the Harvard Brain Atlas dataset are used in a 60:40 ratio for training and testing the proposed method. Otsu’s segmentation is applied to segment the tumor from the skull. Then, hybrid features including shape, intensity, color and textural features of the MRI are extracted. Decision tree, k-nearest neighbor and support vector machine classifiers are applied separately to the feature set. Then, a stacking model is applied to combine the predictions of the classifiers and give the final result. The proposed methodology is validated on an open dataset and achieves 97.91% average accuracy, 88.89% precision and 94.44% sensitivity. Compared with other existing methodologies, this approach achieves better accuracy.
Keywords Brain tumor · Brain tumor detection · Ensemble learning · Image processing · Stacking model
R. Kaur (B) · A. Doegar National Institute of Technical Teachers Training & Research (NITTTR), Chandigarh, India e-mail: [email protected] A. Doegar e-mail: [email protected] G. K. Upadhyaya All India Institute of Medical Sciences (AIIMS), Raebareli, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_53
1 Introduction
Detecting and diagnosing a brain tumor from magnetic resonance imaging (MRI) is a crucial task for decreasing the mortality rate, and it is made difficult by the complex structure of the brain. Although there are many approaches to detecting brain tumors, an efficient and robust methodology for detection and classification remains very challenging. A brain tumor can occur in any region of the brain due to the abnormal growth of cells. In some patients’ MRI the tumor grows gradually, while in others the cells multiply quickly (within days). Tumors can be detected based on their shape, appearance and intensity levels. Image processing provides efficient tools to detect the tumor region at early stages. Figure 1 shows brain tumor images from the dataset used in this research work. The main contributions of this research are the following:
• An ensemble approach is proposed for the detection and classification of brain tumors. This methodology consists of various steps, i.e., segmentation, extraction of features, classification, etc.
• The Otsu technique is used to segment the tumor. Then, a hybrid feature set (intensity, texture, shape and color features) is extracted from the dataset.
• Each of three classifiers, i.e., support vector machine (SVM), k-nearest neighbor (kNN) and decision trees, is used to predict the class label of the brain tumors. Then, an ensemble approach, i.e., stacking, is applied to combine the predictions of all three classifiers and predict the final class label of the tumor via the majority-voting concept.
This article is organized as follows: Sect. 2 describes the related work, Sect. 3 explains the concept of the ensemble learning approach, Sect. 4 presents the proposed methodology, and Sect. 5 discusses the experiments and the results on the basis of performance parameters.
Fig. 1 Brain tumor [1]
An Ensemble Learning Approach for Brain Tumor …
2 Related Work
Researchers have proposed various techniques for classifying brain tumors; some of them are discussed below. Trigui et al. used SVM and random forest to classify a tumor dataset into three classes [2]. Three variants of SVM, i.e., linear kernel, radial basis function and cubic kernel, were applied for the detection and classification of brain tumors [3]. A back-propagation neural network combined with a genetic algorithm (GA) was used to improve tumor classification and feature extraction [4]. Morphological operations were applied to the images to detect the shape and size of the tumor [5]. The self-organizing maps (SOM) technique was applied to a dataset of 52 images [6]. Local as well as global contextual features were extracted using a convolutional neural network (CNN) approach, resulting in faster and accurate classification [7]. GA was combined with an artificial neural network (ANN) and SVM to enhance the accuracy of brain tumor classification, with 71 features extracted [8]. Pereira et al. proposed an automatic technique in which data augmentation is used with a CNN to classify the images [9]. This deeper architecture not only gives accurate classification but also provides the metadata of the classified data. To discriminate between tumorous and non-tumorous images, Gabor wavelet (GW) features were tested using several classifiers [10]. Kadam et al. used KSVM with the gray-level co-occurrence matrix (GLCM) for brain tumor detection [11].
3 Ensemble Learning Approach
In this approach, conceptually different (or similar) machine learning classifiers are combined for classification via majority voting. The final class label is the one predicted most frequently by the classification models. This is also known as hard voting. Mathematically, the class label is predicted via majority voting over the classifiers $C_j$ as follows:
$$\hat{y} = \operatorname{mode}\{C_1(x), C_2(x), \ldots, C_m(x)\}$$
For example, suppose three classifiers are combined, and classifier 1, classifier 2 and classifier 3 predict class 0, class 0 and class 1, respectively. The final class label is then predicted as "Class 0" via majority voting:
$$\hat{y} = \operatorname{mode}\{0, 0, 1\} = 0$$
Figure 2 represents the basic concept of the voting approach [12].
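The hard-voting rule can be illustrated with a few lines of Python. This is only a minimal sketch of the mode operation; the function name `majority_vote` is hypothetical and is not part of the authors' MATLAB implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted most often by the classifiers."""
    counts = Counter(predictions)
    # most_common(1) yields a one-element list: [(label, count)]
    return counts.most_common(1)[0][0]


# The three-classifier case from the text: predictions 0, 0 and 1
print(majority_vote([0, 0, 1]))
```

With an odd number of binary classifiers a tie cannot occur, which is one practical reason to combine three models.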
Fig. 2 Ensemble learning approach
4 Proposed Methodology
An efficient approach is proposed in this paper to classify brain tumors using MRI. Owing to the high contrast and spatial resolution of MRI, tumors are easier to detect in it [3]. The proposed methodology comprises three main steps, i.e., segmentation, extraction/selection of features and ensemble-based classification. In the segmentation step, Otsu's segmentation [13] is applied to the MRI images. Then, the hybrid feature set (shape, intensity, texture and color features) is extracted for each brain lesion image. Three classifiers (kNN, decision tree and SVM) are applied to the feature set for tumor classification. The predictions of the classifiers are then combined via the ensemble learning approach to get the final prediction. Figure 3 represents the proposed methodology.
4.1 Image Acquisition
The Harvard Whole Brain Atlas dataset [1] is used in this research work. A total of 150 image slices are used, of which 90 are tumorous and 60 are non-tumorous. These are available in coronal, sagittal and axial orientations, as shown in Fig. 4. Both astrocytoma and meningioma tumor images are included in the dataset. First, the dataset is divided in a 60:40 ratio for training and testing the proposed method, respectively, with the testing portion serving as the validation data used in the AutoModel. This model splits the 40% of the data into 7 or 5 subsets for further sub-testing, which also provides the advantage of cross-validation. The dataset is further divided into two categories, i.e., benign and malignant, for both the training and testing processes (Fig. 5).
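A 60:40 split of this kind could be sketched as follows. The sketch assumes the split is stratified by class (the paper does not state this) and uses placeholder labels instead of real images; `stratified_split` is a hypothetical helper, not the AutoModel procedure.

```python
import random


def stratified_split(labels, train_fraction=0.6, seed=0):
    """Split sample indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # randomize within each class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Placeholder labels matching the slice counts reported in the text
labels = ["tumorous"] * 90 + ["non-tumorous"] * 60
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Stratifying keeps the 90:60 class ratio in both portions, which matters for a dataset this small.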
Fig. 3 Proposed methodology
Fig. 4 MRI orientations a axial, b coronal, c sagittal
Fig. 5 MRI dataset: split into training (60%) and testing (40%) portions, each containing benign and malignant images
4.2 Image Segmentation
In image processing, segmentation is the step in which an image is divided into non-overlapping regions. Different techniques are used for segmentation in practical applications, e.g., thresholding, edge detection-based techniques, clustering-based techniques, watershed-based techniques and region-based techniques. In our methodology, segmentation is done using Otsu's thresholding method. It is an automatic region-based segmentation approach that chooses the threshold minimizing the intra-class variance of the black and white pixels separated by the thresholding operator. Figure 6 shows the result for a segmented image.
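The threshold search can be sketched in plain Python. This is an illustrative version, not the authors' MATLAB code: it assumes 8-bit gray levels supplied as a flat list of integers, and it maximizes the between-class variance, which is equivalent to minimizing the intra-class variance.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best separates pixels <= t from pixels > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0        # number of pixels in the dark class
    sum0 = 0      # sum of gray levels in the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # dark class still empty
        if w1 == 0:
            break             # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Toy "image": 50 dark background pixels and 50 bright pixels
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]   # 1 marks the bright region
```

On a real slice, the resulting binary mask would be the starting point for the shape features described in the next subsection.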
4.3 Feature Extraction
A hybrid feature set that includes shape, intensity, color and texture features is used in this methodology. Principal component analysis (PCA) is used to reduce the dimensionality of the feature space. The GLCM method is used to examine the textures in an image. It calculates how often a pixel with intensity value i occurs in a specific spatial relationship to a pixel with value j. In MATLAB, the GLCM is created by the graycomatrix function, and statistical measures are then extracted from this matrix. A total of 23 features have been selected for further analysis, i.e., area, centroid, circularity ratio, contrast, convexity, correlation, correlograms, diameter, energy, entropy, Euler2D, inverse difference moment (IDM), histogram, homogeneity, kurtosis, mean, moments, perimeter, root mean square (RMS), skewness, smoothness, solidity, standard deviation and variance. Each of the features has its own significance in brain tumor detection [14].
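To make the co-occurrence idea concrete, the sketch below builds a GLCM for one offset (each pixel paired with its right-hand neighbor) and derives three of the listed texture measures using their usual definitions. It is a simplified pure-Python illustration, not a reimplementation of graycomatrix; the function names and the tiny 4-level image are assumptions made for the example.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dy, dx)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def texture_features(matrix):
    """Contrast, energy and homogeneity of a normalized GLCM."""
    total = sum(sum(row) for row in matrix)
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            p = count / total                 # joint probability of (i, j)
            contrast += p * (i - j) ** 2
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Hypothetical 4x4 image with gray levels 0..3
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
m = glcm(image, levels=4)
features = texture_features(m)
```

In practice several offsets and directions are usually combined, and the image is quantized to a small number of gray levels before counting.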
Fig. 6 Segmentation using Otsu’s thresholding
4.4 Classification
After the feature set has been selected, a classification strategy must be chosen for further analysis. In machine learning, a model can be trained using supervised learning, unsupervised learning or reinforcement learning. The supervised learning technique is used in this research work. In this technique, the model is constructed using the training data (labeled data) and evaluated on the test data, and then the performance of the algorithm is measured. In practical scenarios, a number of classifiers are used to classify the data. Three classifiers, i.e., SVM, kNN and decision trees, have been used in this work. Their working principles are illustrated in Fig. 7a–c.
Fig. 7 a Working principle of SVM [15], b working principle of kNN [16], c working principle of DT [17]
Support Vector Machine (SVM)
The SVM classifier is widely used in the statistical learning domain. It is a supervised learning model that analyzes and classifies high-dimensional data. The main objective of this classifier is to find the hyperplane with the maximum margin between two classes [18]. Each data item is represented as a point in n-dimensional space, where n is the number of features, and the value of each coordinate is the value of the corresponding feature.
k-Nearest Neighbor (kNN)
kNN is a simple classification technique based on the proximity rule. The Euclidean distance from the new data point to every training point is calculated, and feature similarity is used to predict the class of the new point. It is a lazy learner: all of the training data is used during the testing phase.
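The proximity rule behind kNN fits in a short sketch. The two-dimensional feature vectors and the choice k = 3 below are made up for illustration; the paper does not report the value of k it used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D feature vectors for six training samples
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["benign"] * 3 + ["malignant"] * 3

print(knn_predict(train_X, train_y, (2, 2)))
print(knn_predict(train_X, train_y, (7, 9)))
```

Because every prediction scans the whole training set, the cost of kNN is paid at test time rather than at training time.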
Decision Trees (DT)
DTs are predictive modeling tools that predict the output by learning decision rules inferred from the features of the data. A treelike structure is formed consisting of internal nodes and leaf nodes, which represent the features of the dataset and the final decisions, respectively [19]. Classification/decision rules are represented by the paths from the root to the leaves. DTs divide the feature space into axis-parallel rectangles (hyper-rectangles). They can give good results with good accuracy.
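How a tree picks one axis-parallel split can be sketched as a search over features and thresholds. The sketch assumes Gini impurity as the split criterion, as in CART-style trees; the paper does not say which criterion its decision tree used, and the data below are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best split."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue          # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical samples: (area, contrast) with labels 0 = benign, 1 = malignant
X = [(20, 0.9), (25, 0.2), (30, 0.7), (60, 0.8), (70, 0.1), (80, 0.5)]
y = [0, 0, 0, 1, 1, 1]
feature, threshold, impurity = best_split(X, y)
```

A full tree applies the same search recursively to the left and right subsets until the nodes are pure or a depth limit is reached.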
4.5 Stacking Model
Stacking is an ensemble learning technique that combines the predictions of multiple learning models on the same dataset. The primary goal of the ensemble learning approach is to improve the efficiency of the model used to solve a particular problem [20]. Unlike the bagging approach [21], the models are different in nature and work on the same training dataset. A single model is trained to produce the best outcome by combining the predictions from the other participating models [22]. The architecture of the stacking model contains two types of models, i.e., Level 0 (base models) and Level 1 (meta model). The base models are trained on the dataset, and the predictions they make are then combined and used to train the meta model. The base model outputs can be numeric values (0 or 1) or class labels (benign or malignant) [23]. SVM, kNN and decision trees are used in this methodology. Each classifier works on the same dataset, and then the stacking technique is applied to get the best and final prediction for the tumorous and non-tumorous images. Figure 8 represents the stacking model used in this research work. The final prediction comes in the form of class labels, i.e., benign and malignant tumor.
Fig. 8 Stacking model
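The two-level idea can be shown with a deliberately small sketch. The three base models below are hypothetical hand-written rules standing in for trained SVM, kNN and decision tree classifiers, and the meta model is just a lookup table learned from held-out samples; it illustrates the Level 0 / Level 1 structure, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Level 0: hypothetical stand-ins for three already-trained base classifiers.
# Each takes a feature tuple (area, contrast) and returns 0 or 1.
base_models = [
    lambda x: int(x[0] > 50),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + 100 * x[1] > 90),
]


def fit_meta(models, X, y):
    """Level 1: map each tuple of base predictions to its most common true label."""
    table = defaultdict(Counter)
    for x, label in zip(X, y):
        key = tuple(m(x) for m in models)
        table[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}


def predict_stacked(models, meta, x):
    """Combine the base predictions through the learned meta model."""
    key = tuple(m(x) for m in models)
    if key in meta:
        return meta[key]
    # Combination never seen while fitting: fall back to a majority vote
    return Counter(key).most_common(1)[0][0]


# Held-out samples used only to fit the meta model (made-up values)
X_hold = [(30, 0.2), (70, 0.8), (60, 0.1), (40, 0.9), (80, 0.3), (20, 0.6)]
y_hold = [0, 1, 0, 1, 1, 0]
meta = fit_meta(base_models, X_hold, y_hold)

print(predict_stacked(base_models, meta, (75, 0.2)))
```

Fitting the meta model on samples the base models were not trained on is the usual precaution in stacking, since it keeps the meta model from simply trusting overfitted base predictions.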
5 Results and Discussion
The proposed methodology is implemented in MATLAB 2016 (licensed version at NITTTR, Chandigarh) with the 150 image slices. Average accuracy, precision, recall (sensitivity) and execution time are used to evaluate the suggested methodology. The ensemble approach (stacking model) gives better results in terms of segmentation and classification, with good accuracy and the least execution time. Equations (1)–(3) are used to calculate the performance parameters in MATLAB 2016.
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1}$$
$$\text{Recall (Sensitivity)} = \frac{TP}{FN + TP} \tag{2}$$
$$\text{Precision} = \frac{TP}{FP + TP} \tag{3}$$
where TP, FN, FP and TN denote the true positive, false negative, false positive and true negative counts, respectively. The confusion matrix used to evaluate the performance of this model is shown in Table 1. Amin et al. [3] proposed an automatic approach to detect and classify brain tumors and achieved an average accuracy of 97.1%, 91.9% sensitivity, 0.98 area under curve and 98.0% specificity. Sudharani et al. [24] achieved 89.20% accuracy, 88.90% sensitivity and 90% specificity. The proposed methodology is very effective in predicting the class of the tumor owing to the ensemble approach, in which the hybrid feature set is fed to different classifiers and the final output is generated on the basis of majority voting. It gives an average accuracy of 97.91%, 88.89% precision and 94.44% sensitivity with the least execution time. These outcomes show that the proposed methodology outperforms the other methods, as given in Table 2.
Table 1 Confusion matrix
| Actual class | Predicted: Positive | Predicted: Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
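Equations (1)–(3) translate directly into code once the four confusion-matrix counts are known. The counts in this sketch are invented for illustration and are not the paper's results.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall (sensitivity) and precision as in Eqs. (1)-(3)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (fn + tp)
    precision = tp / (fp + tp)
    return accuracy, recall, precision


# Hypothetical counts for 100 test images
accuracy, recall, precision = classification_metrics(tp=45, fn=5, fp=15, tn=35)
print(f"accuracy={accuracy:.2%} recall={recall:.2%} precision={precision:.2%}")
```

Specificity, which appears in Table 2 but has no equation in the text, follows the same pattern as TN / (TN + FP).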
Table 2 Comparison of proposed method with the existing methods
| Method | Results |
|---|---|
| Amin et al. [3] | Average accuracy 97.1%, sensitivity 91.9%, specificity 98.0% |
| Sudharani et al. [24] | Accuracy 89.20%, sensitivity 88.90%, specificity 90% |
| Nabizadeh et al. [25] | Accuracy 91.50% |
| Subashini et al. [26] | Accuracy 91% |
| Eman et al. [26] | Accuracy 95.06% |
| Proposed method | Accuracy 97.91%, sensitivity 94.44%, specificity 98.90%, precision 88.89% |
6 Conclusion
In this research work, an ensemble approach is implemented for the classification of tumor images (benign and malignant) using MRI. The model is first trained using 60% of the images, and testing is then performed on the remaining 40% of the dataset. SVM, kNN and decision tree classifiers are used to predict the tumor class, and the final outcome is generated on the basis of majority voting. Accuracy, precision and recall are used to evaluate this methodology on the local dataset. This method can help radiologists in classifying tumors. The work can be extended to sub-classify malignant tumors with more precision using deep learning techniques.
References
1. Summers, D.: Harvard whole brain atlas. www.med.harvard.edu/AANLIB/home.html. Last accessed 12-04-2020
2. Trigui, R., Mitéran, J., Walker, P.M., Sellami, L., Hamida, A.B.: Automatic classification and localization of prostate cancer using multi-parametric MRI/MRS. Biomed. Signal Process Control 31, 189–198 (2017)
3. Amin, J., Sharif, M., Yasmin, M., Fernandes, S.L.: A distinctive approach in brain tumor detection and classification using MRI. Pattern Recogn. Lett. (2017)
4. Hemanth, D.J., Anitha, J.: Modified genetic algorithm approaches for classification of abnormal magnetic resonance brain tumour images. Appl. Soft Comput. 75, 21–28 (2019)
5. Hunnur, S., Raut, Kulkarni, S.: Implementation of image processing for detection of brain tumors. In: International Conference on Intelligent Computing and Control Systems (ICICCS) (2017)
6. Chaplot, S., Patnaik, L., Jagannathan, N.: Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed. Signal Process Control 1(1), 86–92 (2006)
7. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y.: Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31 (2017)
8. Sachdeva, J., Kumar, V., Gupta, I., Khandelwal, N., Ahuja, C.K.: A package-SFERCB-segmentation, feature extraction, reduction and classification analysis by both SVM and ANN for brain tumors. Appl. Soft Comput. 47, 151–167 (2017)
9. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
10. Nabizadeh, N., Kubat, M.: Brain tumors detection and segmentation in MR images: Gabor wavelet versus statistical features. Comput. Electr. Eng., pp. 286–301 (2015)
11. Kadam, M., Dhole, A.: Brain tumor detection using GLCM with the help of KSVM. Int. J. Eng. Tech. Res. (IJETR) 7(2) (2017)
12. https://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#ensemblevoteclassifier. Last accessed 28-04-2020
13. Huang, M., Yu, W., Zhu, D.: An improved image segmentation algorithm based on the Otsu method. In: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 135–139 (2012)
14. Kaur, R., Doegar, A.: Localization and classification of brain tumor using machine learning & deep learning techniques. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(9S) (2019)
15. https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm. Last accessed 01-09-2020
16. https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikitlearn. Last accessed 01-09-2020
17. https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm. Last accessed 10-04-2020
18. https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/. Last accessed 26-07-2020
19. https://towardsdatascience.com/decision-tree-in-machine-learning-e380942a4c96. Last accessed 12-04-2020
20. https://www.scholarpedia.org/article/Ensemble_learning. Last accessed 26-08-2020
21. Lv, Y., et al.: A classifier using online bagging ensemble method for big data stream learning. Tsinghua Sci. Technol. 24(4), 379–388 (2019)
22. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst., Man, Cybern., Part C (Appl. Rev.) 42(4), 463–484 (2012)
23. https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/. Last accessed 26-08-2020
24. Sudharani, K., Sarma, T.C., Prasad, K.S.: Advanced morphological technique for automatic brain tumor detection and evaluation of statistical parameters. Procedia Technol., pp. 1374–1387 (2016)
25. Nabizadeh, N., John, N., Wright, C.: Histogram-based gravitational optimization algorithm on single MR modality for automatic brain lesion detection and segmentation. Expert Syst. Appl., pp. 7820–7836 (2014)
26. Subashini, M.M., Sahoo, S.K., Sunil, V., Easwaran, S.: A non-invasive methodology for the grade identification of astrocytoma using image processing and artificial intelligence techniques. Expert Syst. Appl., pp. 186–196 (2014)
Multimodal Emotion Recognition System Using Machine Learning and Psychological Signals: A Review Rishu, Jaiteg Singh, and Rupali Gill
Abstract In recent years, the study of emotion has grown with the rise of human–machine interaction, as it helps to interpret human actions and to improve the relationship between humans and machines by enabling software that can understand human states and act accordingly. This paper presents a preliminary study of emotion recognition using various psychological signals. Researchers have investigated various parameters, including facial expression, eye gaze, pupil size variation and eye movements, together with EEG and deep learning techniques, to extract the emotional features of humans. Various researchers have proposed methods for detecting emotions using different psychological signals and achieved reliable accuracy. After a thorough analysis, it has been observed that the best accuracy achieved on individual emotion detection was 90%. However, this experiment does not help to classify the specific emotion. For classifying the specific emotion, the best accuracy achieved was 79.63%, which is a comparable accuracy.
Keywords Affective computing · Neural network (NN) · Machine learning (ML) · Deep learning (DL) · Pupillary diameter (PD) · Electroencephalography (EEG) · Electromyogram (EMG) · Support vector machine (SVM)
Rishu (B) · J. Singh · R. Gill Chitkara Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_54
1 Introduction
To improve the interaction between users and computers, much research has been carried out to enhance computer abilities such as natural language processing, emotion recognition and action recognition [1, 2]. Emotion recognition has a vast range of possible applications at the social, personal and professional levels. Emotions are psychological conditions that affect human behavior, relationships and the results of actions. They are present in every psychological process, and every expression of human activity is accompanied by emotional experiences. In the past few years, researchers have made considerable efforts to recognize emotions, since this plays a vital role in human–machine interaction systems. Earlier researchers worked on non-psychological signals like speech, text and facial expression [3], but recent research is based on psychological signals like pupillary diameter (PD), electroencephalography (EEG) and electromyogram (EMG), which are more reliable and efficient. Among these techniques, EEG tends to be more proficient at recording brain activity, which can provide informative characteristics for judging emotional states [4]. Numerous studies have recognized emotions using EEG [5–7]. Moreover, earlier research [8, 9] has shown that pupil size varies during and after different types of emotional stimuli, which shows that the variation in pupil size can be considered a functional input signal. A user-independent method for emotion recognition using gaze distance, EEG and pupillary response is presented in [10]; it achieved an accuracy of 68.5% for valence (three labels) and 76% for arousal (three labels). Some researchers have also tried to extract emotional features from facial images using deep learning [11]. Emotions can also be reflected in eye movements, which represent external features of the user; with this in mind, researchers have analyzed eye movements in a variety of experiments and tried to find correlations between emotional state and eye movements [12]. The emotional states of different subjects have been studied with the help of gaze position and pupil size behavior. The main goal in [13] is to generate a mapping between gaze and pupil behavior and the emotional states provoked by visual stimuli. Gaze position and pupil dilation are considered as input signals, following some standard protocols.
For the measurement of emotional states, an objective assessment was used that was developed by the Center for the Study of Emotion and Attention under the name International Affective Picture System (IAPS) [14]. Automatic assessment of a person's emotions using psychological signals has gained significant importance in areas involving human–machine interaction. However, classifying human emotions manually or through an expert invigilator is a difficult process [15]. In this effort, an attempt has been made to differentiate and separate emotions from the human responses associated with electroencephalography (EEG). Human emotion recognition is a subfield of artificial intelligence that understands, preprocesses and classifies human emotions automatically [16]. Various machine learning/deep learning techniques have been used to classify human emotions through psychological signals; hence, the extracted features are very important for classification. Training and testing data are important for any ML/DL technique used in emotion classification [17]. The data are collected either from a real device or from primary and secondary sources. Performance parameters are used to evaluate the ML/DL technique for emotion classification [18]. The key idea behind this paper is to study and compare the different machine learning/computer vision techniques that can be used for extracting emotional features, so that anyone who wants to research this domain can get a better idea of which parameters to start with and which research can be enhanced further.
Multimodal Emotion Recognition System Using Machine Learning …
This paper is organized into the following sections. Section 2 focuses on the motivation for this study. Section 3 describes the related studies on emotion recognition using different parameters and psychological signals. Section 4 considers the methodologies used for recognizing emotions. The conclusion is given in Sect. 5.
2 Motivation
Emotion recognition is an interdisciplinary research area involving several fields, including computer science, cognitive science and psychology. A lot of work has been done in the area of emotion recognition using combinations of parameters like eye gaze, facial expression, pupil dilation, eye blink and eye movements. According to various researchers [13, 19, 20], pupil dilation, facial expression, eye tracking and eye-gaze information are efficient cues for determining emotions, but these studies are limited to small datasets; applying the proposed methods to larger datasets can help to obtain more accurate results.
3 Literature Review
When humans communicate with each other, they sometimes use nonverbal cues such as facial expressions, eye gaze, body postures and hand gestures. Affective computing [21] has changed the way humans interact with computers by recognizing, interpreting and simulating human affects. The authors of [22] found that facial expressions alone are not enough for recognizing emotions, so they used eye-gaze information together with facial expressions. They first analyzed eye gaze, categorizing it as direct or averted, and used a knowledge-based method for detecting eye location. Analyzing the work in [23] on facial expression, they found that most errors occur due to confusion between certain emotions, like anger and sadness; to avoid this confusion, they categorized facial expressions into two sets. To extract features from the eye, [24] used eye gaze and pupil size variation. That work mainly focuses on how eye gaze patterns and pupil size variation relate to emotional stimulation, using various images from the International Affective Picture System (IAPS). The latest work [25] reported that pupil size was larger after highly positive and highly negative stimulation than after neutral stimulation. To use pupil size measurement and pupil tracking for detecting emotional responses, one must understand the relationship between emotions and eye features. It has been suggested in [26] that eye tracking is the most popular way of measuring affective and cognitive information, because eye movements give a detailed and rapid estimate of what information an individual is considering. The method used in that research is HATCAM, a head-mounted system in which [27] the eye-tracking system is placed on the head of the person along with a camera to capture the visual image. The main advantage of this system is that the user can move freely during the stimulation. The main goal of that research is to differentiate between arousal and neutral states using features from eye tracking and pupil size variation. In a similar task, four emotions were classified in [28] by comparing different classifiers with 32-channel EEG sensors, achieving an accuracy of 90%; the authors also compared their proposed technique with an existing technique, namely the Noldus face reader, for classifying a particular emotion and achieved 53% accuracy. Various researchers have also tried to extract emotional features from facial images using deep learning. To examine the potential of eye-tracking glasses for multimodal emotion recognition, the author of [29] collected eye images together with eye movements and EEG for emotion classification, comparing four combinations with two kinds of fusion method and three types of data, and achieved better accuracy. To achieve this, they mainly used a convolutional neural network [30] and a long short-term memory (LSTM) network [31] to extract features from eye images; five-class emotion classification was then evaluated based on two fusion features, and feasibility was explored based on eye images and eye movements. The eye images were recorded to form a subset of SEED V [32].
4 Methodology
4.1 Extraction of Features
To recognize emotions, researchers have extracted features from different parameters, including eye movements, eye gaze, facial expression, pupillometry, EEG and ECG. In [1], the authors extracted differential entropy (DE) features using a 256-point short-time Fourier transform (STFT) with a non-overlapping Hamming window [33]. A similar method is also used in [34] to extract thirty-three dimensions, pupil diameter, and so on. To extract the psychological state of a human (i.e., neutral, positive or negative), an experiment was conducted in [35] in which 5 persons (3 males and 2 females) took part in emotion recognition. Different researchers have used various tools and technologies to extract features from various parameters. The methodology used for extracting the features is illustrated in Fig. 1, in which the features of interest are acquired from an image. The parameters and tools used by the researchers to extract the features are shown in Table 1.
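As a rough illustration of a DE feature, the sketch below computes the differential entropy of each non-overlapping 256-sample window under the common assumption that the samples in a window are Gaussian, in which case DE = 0.5 ln(2πeσ²). It deliberately omits the STFT, Hamming weighting and frequency-band decomposition used in [1], and the function names are hypothetical.

```python
import math


def differential_entropy(segment):
    """DE of a segment, assuming Gaussian samples: 0.5 * ln(2*pi*e*variance)."""
    n = len(segment)
    mean = sum(segment) / n
    variance = sum((v - mean) ** 2 for v in segment) / n
    # A constant segment has zero variance and no finite DE (log of zero)
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def windowed_de(signal, window=256):
    """One DE value per non-overlapping window of the signal."""
    return [
        differential_entropy(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, window)
    ]


# Synthetic unit-variance signal standing in for one EEG channel
signal = [1.0, -1.0] * 256
de_values = windowed_de(signal)
```

In an EEG pipeline the same computation is typically repeated per channel and per frequency band, giving one feature vector per time window.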
Fig. 1 Methodology used for extracting the eye features
4.2 Classification
After the eye features have been extracted from the various parameters along with the EEG signals, the data are first preprocessed before any classification, as shown in Fig. 2. Before classification, the data are divided into two parts, i.e., training data and testing data. After preprocessing, the preprocessed data are passed to the classifiers, which classify them into the negative, positive or neutral state. Researchers have used different classifiers to obtain optimal results and better accuracy, as shown in Table 1.
4.3 Performance Parameters After classifying the various emotions, it has been observed that different researchers used different parameters and technique to classify emotions and good accuracy was achieved by each researcher as shown in Fig. 3. Year-wise accuracy by different researchers using different parameters has been achieved that shows every parameter can be an efficient cue to recognize the emotions. It has been observed from Fig. 3 that the highest accuracy was achieved by [22] which classifies the six different emotions with the combination of gaze analysis and recognition of facial expressions. This experiment helps to clear the difference between the emotions that are fear and angry, etc. However, every researcher got comparable accuracy in their domain (using
662
Rishu et al.
Table 1 Different parameters for extracting features
Reference No. | Data source | Merits | Demerits | Classifier | Accuracy (%)
[16] | Facial expression, eye gaze | Achieved better results with eye gaze and facial expression instead of using facial expression alone | Detection rate of few facial expressions can be improved with the proposed method | SVM | 90
[18] | Eye gaze, pupil size variation | Proves that eye gaze along with pupil size can be an efficient cue to differentiate different emotional states | No use of psychological signals; however, it can enhance the achieved results | Non-parametric K-nearest neighbor (KNN) | 80
[29] | EEG, eye tracking | Build an emotion recognition model by employing two fusion strategies which help to detect emotion more accurately | Less number of participants participated in experiment | SVM | 73.59
[30] | Eye movements, pupil dilation, voice recording | Comparable results with the activities of eyes only | Research work limited to eye tracking and pupil invisibility only | Hybrid classifier (Gaussian mixture model, SVM) | 66
[11] | Pupil size, gaze position | For emotion detection, two measures used which are less complex and cheaper for measuring gaze position and size of pupil | Less input features to predictors | Decision tree, neural networks (NN) | 53.6, 50.1
[13] | Facial expressions, EEG | Compared the obtained result with the existing approach of emotion detection and acquired better results | A low-end sensor was used to make experiment affordable; however, results can be enhanced with combination of different psychological sensors | SVM | 53
[31] | Facial expression, eye gaze | Shows that for the detection of emotion information, EEG sensor is useful and can give good accuracy | Less features have been extracted from the different domains of EEG recordings | SVM, random forest | 77.2
[32] | Eye gaze, pupil size variation | Build a system for detecting boredom as less work has been done to detect boredom as it is negative emotion | Limited number of participants participated in the experiment | SVM, K-nearest neighbors (KNN) | 79.63
Fig. 2 Classification of features [35]
Fig. 3 Accuracy achieved with different parameters and classifiers in respective year
different parameters) as compared to previous research on emotion recognition. For the detection of a specific emotion, a comparable accuracy of 79.63% was achieved by [1], which combines data on eye movements and eye images collected from eye-tracking glasses.
5 Conclusion
In this article, we analyzed different techniques for recognizing emotions, since emotion recognition helps to improve the relationship between human and machine. Various researchers have worked on emotion recognition using parameters such as eye gaze, eye movements, pupil dilation, and facial expressions to extract features. Different eye-tracking tools have been used to capture eye movements, and different machine learning techniques, both supervised and unsupervised, have been used to classify the features extracted from these parameters. This study shows that most researchers used a supervised machine learning method, namely SVM, and achieved good accuracy. As observed, the pupil response is strongly connected to emotional stimulation and autonomic activation, which is helpful in recognizing emotions. After analyzing the techniques in the surveyed articles, we recommend combining parameters such as eye gaze, pupil diameter, and facial expressions with psychological signals; good accuracy can then be achieved by applying different learning tools to a larger number of participants.
References
1. Guo, J.J., Zhou, R., Zhao, L.M., Lu, B.L.: Multimodal emotion recognition from eye image, eye movement and EEG using deep neural networks. In: Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society EMBS, pp. 3071–3074 (2019). https://doi.org/10.1109/EMBC.2019.8856563
2. Liu, J., Meng, H., Li, M., Zhang, F., Qin, R., Nandi, A.K.: Emotion detection from EEG recordings based on supervised and unsupervised dimension reduction. Concurr. Comput. 30(23), 1–13 (2018). https://doi.org/10.1002/cpe.4446
3. Calvo, R.A., D’Mello, S.: Affect detection: an interdisciplinary review of models, methods, and their applications. IEEE Trans. Affect. Comput. 1(1), 18–37 (2010). https://doi.org/10.1109/T-AFFC.2010.1
4. Soleymani, M., Pantic, M., Pun, T.: Multimodal emotion recognition in response to videos (extended abstract). In: 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015, 3(2), 491–497 (2015). https://doi.org/10.1109/ACII.2015.7344615
5. Zheng, W.L., Zhu, J.Y., Peng, Y., Lu, B.L.: EEG-based emotion classification using deep belief networks. In: Proceedings of the IEEE International Conference on Multimedia & Expo (2014). https://doi.org/10.1109/ICME.2014.6890166
6. Nie, D., Wang, X.W., Shi, L.C., Lu, B.L.: EEG-based emotion recognition during watching movies. In: 2011 5th International IEEE/EMBS Conference on Neural Engineering NER 2011, pp. 667–670 (2011). https://doi.org/10.1109/NER.2011.5910636
7. Lin, Y., Wang, C., Wu, T., Jeng, S., Chen, J.: EEG-based emotion recognition in music listening: a comparison of schemes for multiclass support vector machine. IEEE, pp. 489–492 (2009)
8. Granholm, E., Steinhauer, S.R.: Pupillometric measures of cognitive and emotional processes. Int. J. Psychophysiol. 52(1), 1–6 (2004). https://doi.org/10.1016/j.ijpsycho.2003.12.001
9. Partala, T., Jokiniemi, M., Surakka, V.: Pupillary responses to emotionally provocative stimuli. Proc. Eye Track. Res. Appl. Symp. 2000, 123–129 (2000). https://doi.org/10.1145/355017.355042
10. Soleymani, M., Asghari-Esfeden, S., Fu, Y., Pantic, M.: Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Trans. Affect. Comput. 7(1), 17–28 (2016). https://doi.org/10.1109/TAFFC.2015.2436926
11. Liu, P., Han, S., Meng, Z., Tong, Y.: Facial expression recognition via a boosted deep belief network. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1805–1812 (2014). https://doi.org/10.1109/CVPR.2014.233
12. Nummenmaa, L., Hyönä, J., Calvo, M.G.: Eye movement assessment of selective attentional capture by emotional pictures. Emotion 6(2), 257–268 (2006). https://doi.org/10.1037/1528-3542.6.2.257
13. Aracena, C., Basterrech, S., Snasel, V., Velasquez, J.: Neural networks for emotion recognition based on eye tracking data. In: Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics SMC 2015, pp. 2632–2637 (2016). https://doi.org/10.1109/SMC.2015.460
14. Lang, P.J.: International affective picture system (IAPS): affective ratings of pictures and instruction manual. Tech. Rep. (2005)
15. Mahajan, R.: Emotion recognition via EEG using neural network classifier. Adv. Intell. Syst. Comput. 583, 429–438 (2018). https://doi.org/10.1007/978-981-10-5687-1_38
16. Srivastava, M., Saini, S., Thakur, A.: Analysis and parameter estimation of microstrip circular patch antennas using artificial neural networks. Adv. Intell. Syst. Comput. 583, 285–292 (2018). https://doi.org/10.1007/978-981-10-5687-1_26
17. Sheth, S., Ajmera, A., Sharma, A., Patel, S., Kathrecha, C.: Design and development of intelligent AGV using computer vision and artificial intelligence. Adv. Intell. Syst. Comput. 583, 337–349 (2018). https://doi.org/10.1007/978-981-10-5687-1_31
18. Lalwani, S., Sharma, H., Satapathy, S.C., Deep, K., Bansal, J.C.: A survey on parallel particle swarm optimization algorithms. Arab. J. Sci. Eng. 44(4), 2899–2923 (2019). https://doi.org/10.1007/s13369-018-03713-6
19. Matlovic, T., Gaspar, P., Moro, R., Simko, J., Bielikova, M.: Emotions detection using facial expressions recognition and EEG. In: Proceedings of the 11th International Workshop on Semantic and Social Media Adaptation and Personalization SMAP 2016, pp. 18–23 (2016). https://doi.org/10.1109/SMAP.2016.7753378
20. Lian, Z., Li, Y., Tao, J.H., Huang, J., Niu, M.Y.: Expression analysis based on face regions in real-world conditions. Int. J. Autom. Comput. 17(1), 96–107 (2020). https://doi.org/10.1007/s11633-019-1176-9
21. Picard, R.: Affective Computing. MIT Press, MA (1995)
22. Zhao, Y., Wang, X., Petriu, E.M.: Facial expression analysis using eye gaze information. IEEE Int. Conf. Comput. Intell. Meas. Syst. Appl. Proc., pp. 7–10 (2011). https://doi.org/10.1109/CIMSA.2011.6059936
23. Zhao, Y., Shen, X., Georganas, N.D.: Facial expression recognition by applying multi-step integral projection and SVMs. In: 2009 IEEE Instrumentation and Measurement Technology Conference I2MTC 2009, pp. 686–691 (2009). https://doi.org/10.1109/IMTC.2009.5168537
24. Lanatà, A., Armato, A., Valenza, G., Scilingo, E.P.: Eye tracking and pupil size variation as response to affective stimuli: a preliminary study. In: 2011 5th International Conference on Pervasive Computing Technologies for Healthcare and Workshops PervasiveHealth 2011, pp. 78–84 (2011). https://doi.org/10.4108/icst.pervasivehealth.2011.246056
25. Partala, T., Surakka, V.: Pupil size variation as an indication of affective processing. Int. J. Hum. Comput. Stud. 59(1–2), 185–198 (2003). https://doi.org/10.1016/S1071-5819(03)00017-X
26. Lohse, G.L., Johnson, E.J.: A comparison of two process tracing methods for choice tasks. Proc. Annu. Hawaii Int. Conf. Syst. Sci. 4(1), 86–97 (1996). https://doi.org/10.1109/HICSS.1996.495316
27. Lanatà, A., Valenza, G., Scilingo, E.P.: Eye gaze patterns in emotional pictures. J. Ambient Intell. Humaniz. Comput. 4(6), 705–715 (2013). https://doi.org/10.1007/s12652-012-0147-6
28. Giri, J.P., Giri, P.J., Chadge, R.: Neural network-based prediction of productivity parameters. Adv. Intell. Syst. Comput. 583, 83–95 (2018). https://doi.org/10.1007/978-981-10-5687-1_8
29. Ramirez, R., Vamvakousis, Z.: Detecting emotion from EEG signals using the emotive Epoc device. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), 7670 LNAI, 175–184 (2012). https://doi.org/10.1007/978-3-642-35139-6_17
30. Hinton, G.E., Krizhevsky, A., Sutskever, I.: Imagenet classification with deep convolutional neural networks (2012)
31. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005). https://doi.org/10.1016/j.neunet.2005.06.042
32. Li, T.H., Liu, W., Zheng, W.L., Lu, B.L.: Classification of five emotions from EEG and eye movement signals: discrimination ability and stability over time. Int. IEEE/EMBS Conf. Neural Eng. NER, vol. 2019-March, pp. 607–610 (2019). https://doi.org/10.1109/NER.2019.8716943
33. Duan, R.N., Zhu, J.Y., Lu, B.L.: Differential entropy feature for EEG-based emotion classification. Int. IEEE/EMBS Conf. Neural Eng. NER, pp. 81–84 (2013). https://doi.org/10.1109/NER.2013.6695876
34. Lu, B.-L., Lu, Y., Zheng, W.-L., Li, B.: Combining eye movements and EEG to enhance emotion recognition (2015)
35. Zheng, W.L., Dong, B.N., Lu, B.L.: Multimodal emotion recognition using EEG and eye tracking data. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society EMBC 2014, pp. 5040–5043 (2014). https://doi.org/10.1109/EMBC.2014.6944757
Drowsiness Image Detection Using Computer Vision
Udbhav Bhatia, Tshering, Jitendra Kumar, and Dilip Kumar Choubey
Abstract Drowsiness of the driver is a significant cause of road accidents. The need of the hour is to come up with measures to control it, and our prototype helps to achieve that. The main objective of this research study is to come up with a solution to curb road accidents due to fatigue. Drowsiness can be detected in various ways, but we mainly focus on facial detection using computer vision. In this prototype, the driver’s face is captured by our program for analysis. We apply facial landmark points with the help of a facial detection algorithm to extract the location of the driver’s eyes. Subsequently, the eye movement is recorded frame by frame; if the driver closes his/her eyes more frequently or for longer than a specified time, he/she is declared drowsy, which triggers the alarm. With the advancement of technology, self-driving cars are emerging at a fast rate, but they still need someone’s supervision, so the above-mentioned technology can be used in those cars to check whether the driver is sleepy. If he/she is sleepy, the car can slow down and gradually stop on its own.
Keywords Facial detection · Drowsiness · Computer vision · Facial landmark point · Image processing
U. Bhatia · Tshering · J. Kumar
School of Advanced Sciences, Vellore Institute of Technology, Vellore, India
D. K. Choubey (B)
Department of Computer Science & Engineering, Indian Institute of Information Technology Bhagalpur, Bhagalpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_55

1 Introduction
Road accidents are a very common phenomenon across the world, and in India they have been common since motorization. The rate of road accidents in India is increasing substantially. A recent study showed that approximately 1214 road accidents happen daily. The majority of accidents are due to human error rather than mechanical error. The causes include drinking and driving, underage driving, wrong-lane driving, inattention on the road, overspeeding, use of mobile phones, etc. One of the main reasons behind these accidents is that drivers fall asleep behind the wheel and lose control. Physical fatigue reduces the concentration, alertness, activeness and vigilance of the driver, which leads to slow and delayed actions because the reflex time needed to respond to a situation increases. Drowsiness affects mental alertness and reduces the driver’s ability to remain focused, which increases the chance of human error. Unlike causes such as underage driving and drunk driving, drowsiness is the only factor that cannot be addressed by government intervention through stringent traffic rules, so the need of the hour is to develop a technique that can measure the driver’s tendency toward drowsiness. The present prototype helps to determine whether a person is feeling drowsy or not. It is implemented with an eye blinking technique based on image processing, which measures the driver’s eye blinking rate and blink duration. If the eyes remain closed for longer than a specified time, the driver is declared drowsy and a sound alert is triggered. To achieve the desired outcome, we use OpenCV with libraries such as SciPy, Dlib, PlaySound, NumPy and Imutils. When the program is executed, it captures video of the driver using a webcam and detects the driver’s face using the Dlib library. After face detection, the facial landmark points are laid down using the facial_landmarks_predictor library. Eye landmarks are extracted, and the distances between the eye landmarks are calculated using the distance formula. The number of consecutive frames in which the eyes are closed is counted, and an alarm is triggered if this count exceeds the specified number of frames. The words drowsiness, physical fatigue and asleep are used interchangeably in our study.
The rest of the article is organized as follows: drowsiness detection techniques are discussed in Sect. 1.1, the background is elaborated in Sect. 2, validation of the regression model is discussed in Sect. 3, related work is presented in Sect. 4, the experimental setup is introduced in Sect. 5, the methodology is stated in Sect. 6, results and discussion are described in Sect. 7, and conclusions are given in Sect. 8.
1.1 Drowsiness Detection Techniques
Drowsiness detection can be implemented in various ways, using artificial neural networks, vehicular-based measures and image processing, but we mainly focus on image processing techniques and computer vision, as shown in Fig. 1. Vehicular-based detection can be achieved by analyzing steering wheel movement and the standard deviation of lane position. Image processing-based techniques include the yawning technique, eye blinking technique and template matching technique.
Fig. 1 Drowsiness detection techniques [1]
2 Background
We collected road accident data for the whole of India for the years 2005–2015 from the open-access government data portal [2]. The data was thoroughly studied, filtered and arranged to suit the needs of our study. This refined data was further analyzed using an Excel regression model for inference (Fig. 2; Table 1).
Fig. 2 Time series analysis for road accidents
Table 1 Total accident data from 2005 to 2015
Years | Total accidents
2005 | 439,255
2006 | 460,920
2007 | 479,216
2008 | 484,704
2009 | 486,384
2010 | 499,628
2011 | 497,686
2012 | 490,383
2013 | 486,476
2014 | 489,400
2015 | 501,423
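The upward trend in Table 1 can be checked with an ordinary least-squares line through the yearly totals. The sketch below is only an illustration: the authors report using Excel's time series function, so the least-squares fit and the names `ACCIDENTS` and `linear_trend` are assumptions introduced here, not their procedure.

```python
# Total accidents per year, copied from Table 1.
ACCIDENTS = {
    2005: 439255, 2006: 460920, 2007: 479216, 2008: 484704,
    2009: 486384, 2010: 499628, 2011: 497686, 2012: 490383,
    2013: 486476, 2014: 489400, 2015: 501423,
}


def linear_trend(series):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    xs = list(series.keys())
    ys = list(series.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_trend(ACCIDENTS)
print(f"Average change per year: {slope:.1f} accidents")
```

A positive slope agrees with the reading that total accidents rose over the period, although the year-to-year values are not monotonic (for example, 2011 to 2013 show a decline).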
We used a time series function and graph to analyze the trend. From the above table and graph, we can conclude that road accidents increased over the years. We found that drowsiness and drunk driving play a major role in causing accidents. For further study, we focused on state-wise data on the effect of drowsiness and alcohol for the three years from 2014 to 2016 and drew inferences from it. We use a multiple linear regression model, as stated below:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$$

where
$Y$ = total number of accidents that occurred,
$\beta_0$ = fixed initial arbitrary constant that denotes fixed initial investment in any form (time, money, effort, etc.),
$\beta_1$ = coefficient associated with the alcohol variable $X_1$,
$\beta_2$ = coefficient associated with the asleep (drowsiness) variable $X_2$,
$\epsilon$ = irregular component that accounts for factors other than alcohol and drowsiness affecting total accidents in the study.

As per the assumption of the regression model, $E[\epsilon] = 0$ (Table 2). The fitted regression line obtained from the sample observations is

$$y = 2149.4035 + 4.6051 x_1 + 77.3725 x_2$$

This fitted regression line is used to predict the number of accidents by substituting estimated values of $x_1$ and $x_2$ into the model. It is also used for the error analysis.
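As a minimal sketch of how the fitted line could be used for prediction, the function below evaluates it for given values of the two predictors. The coefficients are the ones quoted in the text; the function name and the example inputs are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the fitted regression line in the text.
INTERCEPT = 2149.4035
COEF_ALCOHOL = 4.6051   # coefficient of x1 (alcohol)
COEF_ASLEEP = 77.3725   # coefficient of x2 (asleep / drowsiness)


def predict_total_accidents(alcohol_cases: float, asleep_cases: float) -> float:
    """Evaluate y = b0 + b1 * x1 + b2 * x2 for the fitted model."""
    return INTERCEPT + COEF_ALCOHOL * alcohol_cases + COEF_ASLEEP * asleep_cases


# Hypothetical inputs, not taken from the study's data.
example = predict_total_accidents(alcohol_cases=100.0, asleep_cases=50.0)
print(round(example, 4))
```

Because the asleep coefficient is much larger than the alcohol coefficient, a unit change in the asleep variable moves the prediction far more than a unit change in the alcohol variable.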
Table 2 Coefficient of regression model
 | Coefficients | Standard error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 2149.403577 | 2153.54892 | 0.998075111 | 0.325505607 | −2232.024645 | 6530.8318 | −2232.024645 | 6530.8318
Average Alcohol | 4.605188401 | 4.605188401 | 1.140137612 | 0.262436717 | −3.612526855 | 12.82290366 | −3.612526855 | 12.82290366
Average Asleep | 77.37257663 | 15.276556032 | 5.064790437 | 1.52406E−05 | 46.29218096 | 108.4529723 | 46.29218096 | 108.4529723
3 Validation of the Regression Model
Table 3 summarizes the regression statistics. Since the adjusted R square is 0.6128, we can infer that the factors under study explain up to 61.28% of the total variation in accident cases, and the remaining 38.72% is explained by the irregular component $\epsilon$ in the model, for example, potholes in the road, overspeeding, careless driving by underage, unlicensed or unskilled drivers, using a cell phone while driving, bad weather conditions, etc.
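The adjusted R square reported in Table 3 can be reproduced from the R square, the number of observations and the number of predictors with the standard adjustment formula. The short sketch below assumes two predictors (alcohol and asleep), as in the regression model above; the function name is introduced here for illustration.

```python
def adjusted_r_squared(r_squared: float, n_observations: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)


# R square and observation count as reported in Table 3; two predictors assumed.
adjusted = adjusted_r_squared(r_squared=0.634970708, n_observations=36, n_predictors=2)
print(round(adjusted, 4))
```

The result agrees with the tabulated adjusted R square, which is the figure behind the 61.28% quoted in the text.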
Table 3 Regression statistics
Multiple R | 0.796850493
R square | 0.634970708
Adjusted R square | 0.61284772
Standard error | 10,175.19707
Observations | 36

3.1 Fitting of Line Plot
The calculated regression values are fitted into line plots for a better understanding of accidents caused by the asleep and alcohol effects, as shown below (Figs. 3 and 4).
Fig. 3 Average asleep line fit plot
Fig. 4 Average alcohol line fit plot
By analyzing the graphs, we can conclude that falling asleep and driving after consuming alcohol are positively correlated with the number of accidents. In other words, an increase in cases of drunk driving and of asleep or drowsy driving leads to a larger number of accidents.
4 Related Work
Kaur [3] reviewed emerging technologies for detecting driver drowsiness. The author highlighted three kinds of detection, based on vehicle, behavioral and physiological measures, and discussed detection techniques such as ECG and EEG, local binary patterns, steering wheel movement and optical detection. Sharma and Banga [4] described various methods for detecting driver fatigue by analyzing video and images of the driver. They also discussed the use of fuzzy logic and neural networks for detecting movements of various parts of the body. Singh and Kaur [5] worked on an image processing technique in MATLAB for measuring eye blink rates and triggering a warning alarm for drivers falling asleep while driving. Musale and Pansambal [6] worked to overcome narcolepsy and micro-sleep. Their work was based on detection and extraction of the face and eye regions, and used a Raspberry Pi to capture and monitor video and to trigger an alarm when the driver is detected falling asleep.
Choudhary et al. [7] surveyed various techniques that can be used for detecting drowsiness: image processing, artificial neural network-based methods, ECG-based methods and vehicular-based methods. Al-Anizy et al. [8] used an image processing technique and the Haar face detection algorithm to detect driver drowsiness, with an SVM classifying the state of the eyes, that is, open or closed for a certain period of time. Akrout and Mahdi [9] worked on image processing and SVM for detecting and confirming the drowsy state of drivers. Alshaqaqi et al. [10] described a procedure for detecting the drowsy state of drivers. They focused on detecting the eye location, identifying the eye state and calculating PERCLOS to confirm drowsiness. Assari and Rahmati [11] used facial expressions and infrared light to detect the fatigue state of drivers with a hardware system. Fuletra and Bosamiya [12] surveyed techniques for detecting the drowsy state of drivers, such as image processing methods, neural network methods and ECG methods. Danisman et al. (2010) [13] detected drowsiness by calculating the eye blinking rate and reported 94% accuracy with a 1% false positive rate. Devi and Bajaj [14] used eye tracking and blinking rate calculation for detecting drowsiness. Saini and Saini [15] reviewed tools such as optical detection, yawn-based methods, eye blinking methods and steering wheel movement for detecting the drowsiness state of drivers. Flores et al. [16] used yawn frequency, eye blinking frequency, eye gaze movement, head movement and facial expressions for detecting driver drowsiness. Garcia et al. [17] followed the PERCLOS (PERcentage of eye CLOsure) method to detect driver drowsiness. Churiwala et al. [18] used image processing to detect parameters such as duration of eye closure, frequency of eye blinks, yawning and head rotation to confirm the fatigue state of drivers. Kumar et al. [19, 20] applied a limited-layer neural network for image denoising and heuristic cat swarm optimization to train the image filter network. Choubey et al. [21–30] used many machine learning, soft computing and data mining techniques for the classification of diabetes.
5 Experimental Setup
The basic requirements for our prototype are as follows:
1. Excel
2. Python
3. OpenCV
4. Dependency libraries
5.1 Microsoft Excel
Microsoft Excel is a spreadsheet application developed for Windows for computing numerical data, charts, graphs and statistics. We use MS Excel and its data solver add-on for cleaning, arranging and finally analyzing the data for our project to infer the causes of road accidents. We computed a regression model with the drowsiness factor and drunk driving as independent variables and total accidents as the dependent variable.
5.2 Python
Python is a general-purpose, high-level, interpreted and dynamically typed programming language created by Guido van Rossum. We use Python as the main programming language because of its versatility and capabilities.
5.3 OpenCV
Open-source computer vision (OpenCV) is indispensable software in our work and is the heart of our proposed prototype. OpenCV is an open-source computer vision library with many powerful modules that are freely available to use, reuse and adapt to our requirements, and it aims to provide real-time computer vision. Being free of cost further increases its usefulness.
5.3.1 OpenCV–Python
OpenCV–Python is the connector between OpenCV and Python for solving computer vision problems. Without this package, OpenCV cannot communicate with Python programs. The language must be specified because OpenCV is compatible with C++, Java and Python.
5.4 Dependencies Library
5.4.1 NumPy
NumPy stands for numerical Python; it is a Python library that is fundamental for array computing. It also acts as an efficient container for multidimensional generic data.
5.4.2 SciPy
SciPy is a Python library that works along with NumPy for numerical computation such as integration and optimization. We use SciPy to calculate the vertical and horizontal lengths of the eyes to confirm the duration and state of the open and closed eye; in other words, it calculates the Euclidean distances between the facial landmark points of the eyes that are used in the eye aspect ratio.
5.4.3 Dlib
Dlib is a machine learning C++ library, the most important one in our work, which is used to detect the driver’s face and to locate the facial landmarks in the region of interest. It also helps in extracting the eye regions for calculating the EAR, from which the drowsiness of the driver is determined. In addition, we have to download the pre-trained model file shape_predictor_68_face_landmarks.dat [31].
5.4.4 PlaySound
PlaySound is used to play an audio file from the program, like any other audio player. We use this tool to trigger an alarm when the eye aspect ratio stays below the threshold beyond the specified time frame. This serves as an alert for drivers to bring their attention back to the road before an accident occurs.
5.4.5 Imutils
Imutils is a series of convenience functions for resizing, transformation, translation, rotation and display of images. It is a very useful tool for image processing.
6 Methodology
The proposed prototype is built using the following method. Fig. 5 explains the system life cycle, and Fig. 6 shows the proposed system.
6.1 Video Capture
The initial and most important step is to record the live actions of the driver, which is done by invoking the webcam. Video capture is essential because face detection depends on it. We can refer to the webcam as the eyes of our prototype.
6.2 Face Detection
Dlib scans each video frame, analyzes the objects captured by the webcam and tries to detect a face among them. Once the face is detected, the shape_predictor_landmark applies the facial landmarks so that it becomes easy to extract the eye regions (Fig. 7).
6.3 Region of Eyes Extraction
Once facial landmarks are applied, it becomes easy to extract the region of interest, because the landmarks mark the parts of the face with corresponding numbers. The left eyebrow can be detected from numbers 15 to 18, and similarly the right eyebrow corresponds to numbers 19 to 22. The eyes can be extracted using numbers 23 to 27 and 28 to 32.
Fig. 5 Life cycle
Fig. 6 Proposed system ER diagram
Fig. 7 Facial landmark plot
Fig. 8 Eye aspect ratio
6.4 Blink of Eyes Calculation
Eye blinks can be detected by referencing significant facial landmarks. The program uses a facial training set to understand the facial structure and then uses this prior to estimate the probable distances between key points. For eye blinks, we need to pay attention to eye points 23 to 27 and 28 to 32. The eye aspect ratio (EAR) is an estimate of the eye-opening state. It is generally constant, at about 0.25, when the eye is open, but rapidly falls toward 0 when the eye is closed. Each eye is represented by six (x, y)-coordinates, starting at the left corner of the eye (as if you were looking at the person) and then working clockwise around the eye. The program checks 20 consecutive frames, and if the eye aspect ratio stays below 0.25, an alert is generated (Fig. 8). The formula for the eye aspect ratio is:

$$\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2 \lVert p_1 - p_4 \rVert}$$
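The eye aspect ratio and the consecutive-frame check described above can be sketched in plain Python. This is a minimal illustration rather than the authors' program: the function names are hypothetical, the landmark coordinates in the usage example are made up, and firing the alert once the run of low-EAR frames reaches 20 is an assumed reading of "checks 20 consecutive frames".

```python
import math
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float]

EAR_THRESHOLD = 0.25      # threshold quoted in the text
CONSECUTIVE_FRAMES = 20   # number of consecutive frames quoted in the text


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """Compute EAR from six eye landmarks ordered p1..p6 as in the formula."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)   # two vertical eye openings
    horizontal = math.dist(p1, p4)                     # eye width, corner to corner
    return vertical / (2.0 * horizontal)


def first_alert_frame(
    ear_values: Iterable[float],
    threshold: float = EAR_THRESHOLD,
    min_frames: int = CONSECUTIVE_FRAMES,
) -> Optional[int]:
    """Return the index of the frame at which the alert would fire, or None."""
    closed_run = 0
    for index, ear in enumerate(ear_values):
        if ear < threshold:
            closed_run += 1
            if closed_run >= min_frames:
                return index
        else:
            closed_run = 0   # an open-eye frame resets the run
    return None


# Made-up landmark coordinates for a wide-open eye (illustration only).
open_eye = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0), (4.0, 0.0), (3.0, -1.0), (1.0, -1.0)]
print(eye_aspect_ratio(open_eye))
```

Resetting the counter on any open-eye frame means that ordinary blinks, which last only a few frames, do not accumulate toward the alert; only a sustained closure does.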
7 Result and Discussion
The first signs of drowsiness are drooping eyelids, repeated yawning or rubbing of the eyes, blurry vision and a nodding head. Drifting from the lane, tailgating, and feeling restless and irritable are further indications of drowsiness. Our prototype tries to detect drooping eyelids using the eye aspect ratio, which is calculated using the given formula. We have tested our prototype on various people with and
Fig. 9 Two people eye detection
Fig. 10 Close eye detection alert with spectacle (Image 2)
without spectacles, and found that it detects about 80% of drowsy states. It also has some limitations, such as not being able to detect the eye region when the face is tilted sideways (Figs. 9, 10, 11 and 12).
8 Conclusion
Motor accidents are rampant these days, and researchers across the world have worked on a plethora of cases. As per the study, the majority of accidents are due to human error rather than mechanical error. We often hear of drunk driving, overspeeding, not wearing a seat belt, poor road conditions, bad weather and
Fig. 11 Close eye detection (Image 3)
Fig. 12 Close eye detection (Image 4)
mechanical failures. But one of the major and yet often unrecognized human errors is driving in a state of drowsiness. The proposed prototype works by analyzing real-time video of the driver. It warns the driver if he/she is getting drowsy, thereby bringing him/her back to full consciousness and preventing deadly accidents. The lives of drivers and passengers could be saved from unforeseen disasters with the help of this small system built into vehicles.
References 1. Ramzan, M., Khan, H.U., Awan, S.M., Ismail, A., Ilyas, M., Mahmood, A.: A survey on state-of-the-art drowsiness detection techniques. IEEE Access 7, 61904–61919 (2019) 2. https://data.gov.in/dataset-group-name/road-accidents. 3. Kaur, H.: Driver drowsiness detection system using image processing. Int. J. Adv. Comput. Theory Eng. (IJACTE) 4(5), 2319–2526 (2015) 4. Sharma, N., Banga, V.K.: Drowsiness warning system using artificial intelligence. Int. J. Soc., Behav., Educ., Econ., Bus. and Ind. Eng. 4(7), 1771–1773 (2010) 5. Singh, M., Kaur, G.: Drowsy detection on eye blink duration using algorithm. Int. J. Emerg. Technol. Adv. Eng. 2(4), 2250–2459 (2012) 6. Musale, Tejasweeni., Pansambal, B.H.: Real time driver drowsiness detection system using image processing. Int. J. Res. Eng. Appl. Manag. (IJREAM), 2(8) (2016), 2494-9150 7. Choudhary, P., Sharma, R., Singh, G., Das, S., Dhengre, S.G.: A survey paper on drowsiness detection & alarm system for drivers. Int. Res. J. Eng. Technol. (IRJET) 3(12) (2016). 23950072 8. Al-Anizy, G.J., Razooq, M., Nordin, M.:. Automatic driver drowsiness detection using Haar algorithm and support vector machine techniques. Asian J. Appl. Sci. 8(2), 149–157 (2015) 9. Akrout, B., Mahdi, W.: A visual based approach for drowsiness detection. In: IEEE Intelligent Vehicles Symposium, IEEE, pp. 1324–1329 (2013) 10. Alshaqaqi, B., Baquhaizel, A.S., Ouis M.E.A., Boumehed, M., Ouamri, A., Keche, M.: Driver drowsiness detection system. In: International Workshop on Systems Signal Processing and their Applications, pp. 151–155, IEEE(2013) 11. Assari, M.A., Rahmati, M.:. Driver drowsiness detection using face expression recognition. In: IEEE International Conference on Signal and Image Processing Applications, pp. 337–341. IEEE (2011). 12. Fuletra, J.D., Bosamiya, D.: A survey on driver’s drowsiness detection techniques. Int. J. Recent Innov. Trends Comput. Commun. 1(11), 816–819 (2013) 13. 
Danisman, T., Bilasco, I.M., Djeraba, C., Ihaddadene, N.: Drowsy driver detection system using eye blink patterns, pp. 230–233. IEEE 14. Devi, M.S., Bajaj, P.R.: Driver Fatigue detection based on eye tracking. In: First International Conference on Emerging Trends in Engineering and Technology, pp. 7–12. IEEE (2017). 15. Saini, V., Saini, R.: Driver drowsiness detection system and techniques: a review. Int. J. Comput. Sci. Inf. Technol. 5(3), 4245–4249 (2014) 16. Flores, M.J., Armingo, J.M.: Real-time warning system for driver drowsiness detection using visual information. J. Intell. Robot. Syst., pp 103–125 (2010) 17. García, Bronte, S., Bergasa, L.M., Almazán, J., Yebes, J.: Vision-based drowsiness detector for Real Driving Conditions. In: Intelligent Vehicles Symposium, pp. 887–894. IEEE (2012) 18. Churiwala, K., Lopes, R., Shah, A., Shah, N.: Drowsiness detection based on eye movement, Yawn detection and head rotation. Int. J. Appl. Inf. Syst. (IJAIS) 2(6) (2012), 2249-0868 19. Kumar, M., Mishra, S.K., Choubey, S., Tripathi, S.S., Choubey, D.K., Dash, D.: Cat Swarm optimization based functional link multilayer perceptron for suppression of Gaussian and impulse noise from computed tomography images. Curr. Med. Imaging Rev., Bentham Sci. 16(4), 329–339 (2020) 20. Kumar, M., Jangir, S.K., Mishra, S.K., Choubey, S.K., Choubey, D.K.: Multi-channel FLANN adaptive filter for speckle & impulse noise elimination from color Doppler ultrasound images. In: International Conference on Emerging Trends in Communication, Control and Computing 2020 (ICONC3 2020), IEEE Xplorer Digital Library, pp. 1–4 (2020). 21. Choubey, D.K., Kumar, M., Shukla, V., Tripathi, S., Dhandhania, V.K.: Comparative analysis of classification methods with PCA and LDA for diabetes. Curr. Diabetes Rev., Bentham Science 16(1), 1–18 (2020) 22. Choubey, D.K., Kumar, P., Tripathi, S., Kumar, S.: Performance evaluation of classification methods with PCA and PSO for diabetes. Netw. Model. Anal. Health Inf. 
Bioinf. 9(1), 1–30 (2019)
23. Choubey, D.K., Paul, S., Dhandhania, V.K.: Rule based diagnosis system for diabetes. biomedical research. Allied Acad. 28(12), 5196–5209, 2017 (2017). 24. Choubey, D.K., Paul, S.: GA_SVM-A classification system for diagnosis of diabetes. Handbook of Research on Nature Inspired Soft Computing and Algorithms, pp. 359–397. IGI Global (2017) 25. Choubey, D.K., Paul, S., Dhandhania, V.K.: GA_NN: an intelligent classification system for diabetes. In: Springer Proceedings AISC Series, 7th International Conference on Soft Computing for Problem Solving-SocProS 2017, Indian Institute of Technology, Bhubaneswar, India, (December 23–24, 2017), Chapter 2, Soft Computing for Problem Solving, Advances in Intelligent Systems and Computing 817, vol. 2, pp. 11–23, Springer (2019). 26. Choubey, D.K., Paul, S.: Classification techniques for diagnosis of diabetes: a review. Int. J. Biomed. Eng. Technol. (IJBET), Inderscience 21(1), 15–39 (2016) 27. Choubey, D.K., Paul, S., Sandilya, S., Dhandhania, V.K.: Implementation and analysis of classification algorithms for diabetes. Curr. Med. Imaging Rev., Bentham Sci. 16(4), 340–354 (2020) 28. Choubey, D.K., Paul, S.: GA_MLP NN: A hybrid intelligent system for diabetes disease diagnosis. Int. J. Intell. Syst. Appl. (IJISA), MECS, 8(1), 49–59 (2016). 29. Choubey, D.K., Paul, S.: GA_RBF NN: A classification system for diabetes. Int. J. Biomed. Eng. Technol. (IJBET), Inderscience, 23(1), 71–93 (2017). 30. Choubey, D.K., Tripathi, S., Kumar, P., Shukla, V., Dhandhania, V.K.: Classification of diabetes by Kernel based SVM with PSO. Recent Pat. Comput. Sci., Bentham Sci. 12(1), 1–14 (2019) 31. https://dlib.net/face_landmark_detection.py.html
Implementing Deep Learning Algorithm on Physicochemical Properties of Proteins Charu Kathuria , Deepti Mehrotra , and Navnit Kumar Misra
Abstract
In recent years, the growth of complex protein data has driven the adoption of deep learning in the data mining field. A number of studies have shown it to be a powerful tool for transforming such big data into valuable information or knowledge. Predicting the structure of a protein helps to determine its function, which can be used for drug discovery, medicine design and other important areas. The physicochemical properties of amino acids determine the quality of a protein structure, which in turn helps to distinguish native from predicted proteins. In this paper, the dataset considered has nine physicochemical properties, which are used to predict the root mean square deviation (RMSD). A deep learning model is applied, and its performance is evaluated on the basis of root mean-squared error (RMSE) and R-squared (R²), which are 3.71 and 0.6327, respectively.
Keywords Deep learning · Physicochemical properties · Root mean-squared deviation · RMSD
1 Introduction
Deep learning is among the fastest-growing subfields of machine learning and is based on the concepts of artificial neural networks. It relies on multiple hidden layers to map large-scale input features to the corresponding output. The purpose of deep neural networks is to extract abstract features using computational operations. Nowadays, massive data growth has made deep learning one of the most promising artificial intelligence tools in various applications such as software vulnerability analysis [1], intrusion detection [2], traffic signal processing [3], stock market analysis [4], big data [5] and disease classification [6].
C. Kathuria · D. Mehrotra (B) Amity University Uttar Pradesh, Noida, Uttar Pradesh 201313, India
N. K. Misra Department of Physics, Brahmanand College, The Mall, Kanpur 208004, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_56
Apart from the applications mentioned above, deep learning plays a pioneering role in various fields, among which the prediction of protein structure holds a unique position. Matt Spencer et al. in 2014 described a deep learning network architecture, named DNSS, which predicts the secondary structure of proteins and achieved 80.7% accuracy [7]. Rhys Heffernan et al. in 2015 studied an iterative feature which includes dihedrals and backbone angles with solvent accessible Cα atoms with a deep learning neural network and achieved an accuracy of 82% for predicting secondary structure [8]. Sheng Wang et al. in 2016 predicted protein structure with deep convolutional neural fields (DeepCNF) using protein sequences as input features [9]. Yangxu Wang et al. in 2017 created a deep recurrent architecture for predicting protein structure from the sequence of amino acids [10]. Andrew W. Senior et al. in 2020 created the AlphaFold system, which uses deep learning to predict protein structure even from fewer homologous sequences [11].
Proteins are essential molecules in living organisms. Over the past decades, large volumes of protein data have accumulated, and many statistical and machine learning models have been built for determining protein structures. Research in this area is significant because the biological functions of a protein depend on its structure. Predicting the tertiary structure of a protein helps to determine its function, which can be used for drug discovery, medicine design, etc. The physicochemical properties of the amino acids, along with the solvent environment, cause the protein sequence to fold into its tertiary structure. The biological actions and the quality of a protein structure rely on these physicochemical properties, which are widely used to distinguish predicted structures from native protein structures.
In this paper, a deep learning approach is applied to the physicochemical properties of proteins to predict the root mean-squared deviation (RMSD). Section 2 describes the background, Sect. 3 presents the applied model and the feature description, Sect. 4 elaborates the experiment details and results, and Sect. 5 presents the conclusion.
2 Background
The background of this research work falls into two parts. The first part describes the importance of physicochemical properties for the structure of proteins through a number of research articles. The second part reviews deep learning models built to predict the structure of proteins with high accuracy.
2.1 Related Work of Physicochemical Properties
Piyali Chatterjee et al. in 2010 used a multilayer feed forward network in two levels to predict protein structure. In the first level, physicochemical properties along with PSI-BLAST sequence profiles are used for sequence-to-structure prediction, leading to second-level structure-to-structure predictions. The dataset used is nrDSSP, and the overall accuracy of the model reached 75.58–77.48% [12]. Avinash Mishra et al. in 2013 formed a scoring function from physicochemical properties which is able to capture native-like structures with 93% accuracy [13]. Yadunath Pathak et al. in 2016 predicted root mean-squared deviation (RMSD) using machine learning algorithms from different physicochemical properties [14]. Ashwini M. Jani and Kalpit R. Chandpa in 2015 created an SVM model to classify RMSD using the CASP dataset [15]. Prashant Singh Rana et al. in 2015 explored different machine learning algorithms on various physicochemical properties to predict quality measures of modeled protein structures [16]. Mohammad Saber Iraji and Hakimeh Ameri in 2016 also predicted RMSD values close to the native values using physicochemical properties with an adaptive neuro fuzzy inference system (ANFIS) [17]. Amanpreet Kaur and Baljit Singh Khehra in 2017 estimated the quality of predicted protein structures using physicochemical properties in the absence of native structures [18]. Shubham Vishnoi et al. in 2020 constructed a software tool to calculate physicochemical descriptors of proteins [19].
2.2 Deep Learning in Proteins Domain
Deep learning methods have been widely used in many applications and theories. In recent years, deep learning models have received a lot of attention in bioinformatics. As these models show good prospects and potential for processing huge volumes of biological data, many researchers have explored deep learning in the field of protein structure prediction. Table 1 shows the recent developments of deep learning in the prediction of protein structures.
3 Materials and Methods
The implementation of the model requires a dataset with relevant input features as well as output targets. This section describes the dataset used and the methodology applied for the development of the model.
Table 1 Deep learning models in predicting protein structures
| Author | Year | Dataset | Description |
|---|---|---|---|
| Andrew W. Senior et al. | 2020 | PDB, CATH, Uniclust and PSI-BLAST | Alphafold system is created which determines protein structures with high accuracy [11] |
| Iddo Drori et al. | 2019 | 3D coordinates, torsion angles, Q8 secondary structures, etc. | Torsion angles and backbone atom distance matrices are predicted using deep learning and embedding models [20] |
| Michael Schantz Klausen et al. | 2019 | PDB | An extended tool NetSurfP-2.0 is presented which can predict structural disorders, secondary structure, backbone dihedral angles and solvent accessibility [21] |
| Manaz Kaleel et al. | 2019 | PDB | The RSA predictor, PaleAle 5.0 is described which is a significant improvement over previous models [22] |
| Mu Gao et al. | 2019 | PISCES, CASP 12 | DESTINI, a computational approach of deep learning is applied for protein structure prediction [23] |
| Jingxue Wang et al. | 2018 | PDB, OPM | Deep learning neural network model is created for computational protein design [24] |
| Yangxu Wang et al. | 2017 | CullPDB and CB513 | A deep recurrent architecture (SSREDN) is created to predict protein structure from the amino acid sequences [10] |
| Sheng Wang et al. | 2016 | CullPDB, CB513, CASP 10, CASP 11 and CAMEO | Protein secondary structure is predicted with DeepCNF using the protein sequences as input features [9] |
| Rhys Heffernan et al. | 2015 | PISCES server, TR4590 and TS1199 | An iterative feature is presented which includes dihedrals and backbone angles with solvent accessible Cα atoms [8] |
| Matt Spencer et al. | 2014 | PDB, CASP 9 and CASP 10 | DNSS model is created which achieved 80.7% accuracy [7] |

3.1 Dataset Description
As mentioned above, this paper considers the physicochemical properties of protein tertiary structures. The dataset consists of 45,730 decoys and was created from the CASP 5 to CASP 9 experiments. For the present work, it is retrieved from the UCI machine learning repository [25]. Table 2 describes the dataset used for implementation. The nine physicochemical properties considered as input in the dataset are F1–F9, and their details are shown in Table 3. The output is the RMSD, which measures the deviation of an unknown (predicted) structure from the native structure and hence the similarity between two protein structures. It is calculated from matched Cα pairs of the two structures.

Table 2 Dataset description
| Features | Description |
|---|---|
| Dataset format | CSV file sheet |
| Number of attributes | 9 |
| Number of instances | 45,730 |
| Type of problem | Regression |

Table 3 Input features
| Feature name | Feature description |
|---|---|
| F1 | Total surface area |
| F2 | Non-polar exposed area |
| F3 | Fractional area of exposed non polar residue |
| F4 | Fractional area of exposed non polar part of residue |
| F5 | Molecular mass weighted exposed area |
| F6 | Average deviation from standard exposed area of residue |
| F7 | Euclidean distance |
| F8 | Secondary structure penalty |
| F9 | Spatial distribution constraints |
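As an illustration of the RMSD target described in Sect. 3.1, the following minimal sketch computes the RMSD over matched Cα coordinates of two structures. It assumes that the two structures are already superimposed and that coordinates are paired residue by residue; the function name `ca_rmsd` and the toy coordinates are hypothetical and are not taken from the dataset, which already contains precomputed RMSD values.

```python
import math


def ca_rmsd(native, model):
    """RMSD over matched C-alpha pairs, in the units of the coordinates.

    Assumption: both structures are already superimposed and
    native[i] and model[i] belong to the same residue.
    """
    if len(native) != len(model) or not native:
        raise ValueError("need two non-empty, equally long coordinate lists")
    # Sum of squared distances between corresponding atoms
    squared = sum(
        (a - b) ** 2
        for p, q in zip(native, model)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(native))


# Toy example with hypothetical coordinates (angstroms)
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 4.0)]
print(round(ca_rmsd(native, model), 3))
```

A value of zero means the two sets of Cα atoms coincide; larger values indicate that the model deviates more from the native structure.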
3.2 Applied Methodology
In this paper, a deep learning approach is applied to the set of input features to predict the RMSD.
Overview of Deep Learning. With the advent of huge datasets, deep learning has spread to many applications, among which, as discussed above, bioinformatics plays a pioneering role. The multiple layers of a deep learning architecture are connected in such a way that they transform the inputs into amenable features from which the respective outputs are predicted. Compared with other machine learning models, it has many operational elements in its multiple hidden layers that are capable of finding advanced features in raw data that is highly complex in nature.
A huge number of deep learning architectures are available in the literature [26, 27], and they are used in different application areas. The key point for the performance of all deep learning-based models is that they require a huge number of training samples. Performance can be further improved by increasing the number of layers and adjusting parameters.
The activation function plays an important role in the architecture: it is applied to the weighted sum of a neuron's inputs plus its bias and thereby decides whether the neuron fires [28]. It is also called the transfer function of the neural network. Various activation functions are available, and the function chosen controls the range of the output. The sigmoid function takes the input from the preceding layer and maps it to values between 0 and 1 using $g(n) = 1/(1 + e^{-n})$. The tanh function is a zero-centered improvement on sigmoid whose output ranges between −1 and 1, given by $g(n) = (e^{n} - e^{-n})/(e^{n} + e^{-n})$. The softmax function also produces values in the range 0–1 like sigmoid, but it is used to produce a probability distribution over several classes, whereas sigmoid corresponds to a Bernoulli distribution. The limitations of the tanh function led to the rectified linear unit (ReLU) activation function, one of the most widely used functions in deep learning models, defined as $g(n) = \max(0, n)$. The softplus function is a smooth version of ReLU, given by $g(n) = \log(1 + e^{n})$. The maxout function is defined as $g(n) = \max_{j}(b_j + w_j \cdot n)$; it introduces nonlinearity by taking the maximum over several affine functions of the input, each formed by a dot product with a weight vector $w_j$ plus a bias $b_j$.
Applied Model. The schematic diagram of the multiple-layer deep learning model, along with the concept used in the paper, is shown in Fig. 1. In this paper, the physicochemical properties are considered as input and the RMSD value as output. The features are preprocessed before being passed to the deep learning model.
Several hidden layers are used to train the model. The dataset is split into training and test data: the training data is used to train the model, whereas the test data is used to validate it. The basic idea of the deep layer architecture is that the input layer first activates the model, then each neuron in a hidden layer evaluates its inputs with the activation function and passes the result to the next hidden layer (the number of hidden layers depends on the requirement), and finally the weighted connections produce the result in the output layer.
Fig. 1 Applied model
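The activation functions listed in Sect. 3.2 can be written out directly. The sketch below uses only the Python standard library and is meant for scalar inputs of moderate size (no overflow protection, except in softmax); the function names are illustrative and do not correspond to any particular library.

```python
import math


def sigmoid(n):
    # g(n) = 1 / (1 + e^-n), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-n))


def tanh(n):
    # g(n) = (e^n - e^-n) / (e^n + e^-n), output in (-1, 1)
    return (math.exp(n) - math.exp(-n)) / (math.exp(n) + math.exp(-n))


def relu(n):
    # g(n) = max(0, n)
    return max(0.0, n)


def softplus(n):
    # g(n) = log(1 + e^n), a smooth version of ReLU
    return math.log(1.0 + math.exp(n))


def softmax(values):
    # Maps a list of scores to a probability distribution.
    shift = max(values)  # subtracting the maximum avoids overflow
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def maxout(n, weights, biases):
    # g(n) = max_j (b_j + w_j . n) for an input vector n
    return max(
        b + sum(wi * ni for wi, ni in zip(w, n))
        for w, b in zip(weights, biases)
    )


print(sigmoid(0.0), relu(-2.0), softmax([1.0, 2.0, 3.0]))
```

Sigmoid, tanh, ReLU and softplus act on one value at a time, while softmax and maxout act on a whole vector, which is why they take list arguments here.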
4 Experimental Results
The deep learning model is implemented using Keras, a high-level application programming interface that runs on the TensorFlow platform. It is an open-source neural network library written in Python. A sequential Keras model with multiple hidden dense layers is used. The ReLU activation function is used in the initial layers, and the linear activation function is used in the last layer. The model is compiled using the compile function, and the fit function is then used to train it while tracking the error on the validation data. The model reports its results in terms of a loss (Mean_squared_error) and a metric (Mean_absolute_error). Mean_squared_error (MSE) measures the quality of the estimator from the target and predicted values, and values closer to zero are better. Mean_absolute_error (MAE) evaluates the average magnitude of the errors in a set of predictions. Figures 2 and 3 show the training and validation MSE and MAE, respectively, for the deep learning model implemented.
Fig. 2 MSE for training and testing data
Fig. 3 MAE for training and testing data
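A model of the kind described in Sect. 4 could be set up roughly as follows. This is only a sketch under stated assumptions: the paper does not report the number or width of the hidden layers, the optimizer, or the training schedule, so `LAYER_PLAN` and the `"adam"` optimizer are hypothetical choices. The helper `count_parameters` is plain Python; `build_model` needs TensorFlow and imports it only when called.

```python
# Hypothetical layer plan as (units, activation). The paper does not state
# its layer sizes, so these values are assumptions for illustration only.
LAYER_PLAN = [(64, "relu"), (32, "relu"), (1, "linear")]
N_FEATURES = 9  # input features F1..F9


def count_parameters(n_inputs, layer_plan):
    """Number of weights plus biases in a stack of fully connected layers."""
    total = 0
    for units, _activation in layer_plan:
        total += (n_inputs + 1) * units  # +1 accounts for the bias term
        n_inputs = units
    return total


def build_model(n_features=N_FEATURES, layer_plan=LAYER_PLAN):
    """Sequential dense network with ReLU hidden layers and a linear output."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for units, activation in layer_plan:
        model.add(keras.layers.Dense(units, activation=activation))
    model.compile(
        optimizer="adam",  # assumed; the optimizer is not given in the paper
        loss="mean_squared_error",
        metrics=["mean_absolute_error"],
    )
    return model


print(count_parameters(N_FEATURES, LAYER_PLAN))
```

With feature and target arrays at hand, training would then be a call such as `build_model().fit(X_train, y_train, validation_data=(X_test, y_test), epochs=..., batch_size=...)`, where the array names are placeholders; the returned history object holds the per-epoch training and validation loss and metric values from which curves like those in Figs. 2 and 3 can be drawn.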
Since the model is a regression model, the measures used to assess it are the root mean-squared error (RMSE) and R-squared (R²). RMSE is the square root of the mean squared prediction error and indicates the typical size of the error, whereas R² is the proportion of the variation in the outcome that is explained by the model. The above model yields an RMSE of 3.71 and an R² of 0.6327.
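For reference, the four measures used in this section can be computed from targets $y_i$ and predictions $\hat{y}_i$ as $\mathrm{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, $\mathrm{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$, $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. The sketch below implements these standard definitions in plain Python; the function name and the toy values are illustrative and are not results from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R-squared for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy values (hypothetical, not taken from the experiments)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(regression_metrics(y_true, y_pred))
```

Note that RMSE is expressed in the same unit as the target (here, the unit of RMSD), whereas R² is unitless.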
5 Conclusion
Various chemical and physical properties are considered when determining the quality of a protein structure. In this paper, physicochemical properties are used to predict the RMSD value, which indicates the quality of protein tertiary structures. The dataset is collected from the UCI machine learning repository. The deep learning model applied here is evaluated on R-squared and root mean-squared error and achieves an R² of 0.6327 and an RMSE of 3.71. In future, different deep learning architectures can be applied to other physical and chemical properties of proteins.
References
1. Singh, S.K., Chaturvedi, A.: Applying deep learning for discovery and analysis of software vulnerabilities: a brief survey. In: Soft Computing: Theories and Applications, pp. 649–658 (2020)
2. Karthi, R.: Development of intrusion detection system using deep learning for classifying attacks in power systems. In: Soft Computing: Theories and Applications, pp. 755–766. Springer, Singapore (2020)
3. Bordia, B., Nishanth, N., Patel, S., Kumar, M.A., Rudra, B.: Automated traffic light signal violation detection system using convolutional neural network. In: Soft Computing: Theories and Applications, pp. 579–592. Springer, Singapore (2020)
4. Barai, A.K., Jain, P., Kumar, T.: NSE Stock prediction: the deep learning way. In: Soft Computing: Theories and Applications, pp. 783–791. Springer, Singapore (2020)
5. Gheisari, M., Wang, G., Bhuiyan, M.Z.A.: A survey on deep learning in big data. In: 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), vol. 2, pp. 173–180. IEEE (2017)
6. Pathak, K.C., Kundaram, S.S.: Accuracy-based performance analysis of Alzheimer's disease classification using deep convolution neural network. In: Soft Computing: Theories and Applications, pp. 731–744. Springer, Singapore (2020)
7. Spencer, M., Eickholt, J., Cheng, J.: A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 12(1), 103–112 (2014)
8. Heffernan, R., Paliwal, K., Lyons, J., Dehzangi, A., Sharma, A., Wang, J., Sattar, A., Yang, Y., Zhou, Y.: Improving prediction of secondary structure, local backbone angles and solvent accessible surface area of proteins by iterative deep learning. Sci. Rep. 5(1), 1–11 (2015)
9. Wang, S., Peng, J., Ma, J., Xu, J.: Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep. 6(1), 1–11 (2016)
10. Wang, Y., Mao, H., Yi, Z.: Protein secondary structure prediction by using deep learning method. Knowl.-Based Syst. 118, 115–123 (2017)
11. Senior, A.W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., Qin, C., Žídek, A., Nelson, A.W., Bridgland, A., Penedones, H.: Improved protein structure prediction using potentials from deep learning. Nature 577(7792), 706–710 (2020)
12. Chatterjee, P., Basu, S., Nasipuri, M.: Improving prediction of protein secondary structure using physicochemical properties of amino acids. In: Proceedings of the International Symposium on Biocomputing, pp. 1–8 (2010)
13. Mishra, A., Rao, S., Mittal, A., Jayaram, B.: Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1834(8), 1520–1531 (2013)
14. Pathak, Y., Rana, P.S., Singh, P.K., Saraswat, M.: Protein structure prediction (RMSD ≤ 5 Å) using machine learning models. Int. J. Data Min. Bioinf. 14(1), 71–85 (2016)
15. Jani, A.M., Chandpa, K.R.: Protein tertiary structure classification based on its physicochemical property using neural network and KPCA-SVM: a comparative study. Int. J. Appl. Sci. Eng. 3(1), 1–11 (2015)
16. Rana, P.S., Sharma, H., Bhattacharya, M., Shukla, A.: Quality assessment of modeled protein structure using physicochemical properties. J. Bioinf. Comput. Biol. 13(02), 1550005 (2015)
17. Iraji, M.S., Ameri, H.: RMSD protein tertiary structure prediction with soft computing. IJ Math. Sci. Comput. 2, 24–33 (2016)
18. Kaur, E.A., Khehra, B.S.: Quality assessment of modelled protein structure using backpropagation and radial basis function algorithm. Int. J. Sci. Res. Manag. 5(7), 6019–6033 (2017)
19. Vishnoi, S., Garg, P., Arora, P.: Physicochemical n-grams tool: a tool for protein physicochemical descriptor generation via Chou's 5-step rule. Chem. Biol. Drug Des. 95(1), 79–86 (2020)
20. Drori, I., Thaker, D., Srivatsa, A., Jeong, D., Wang, Y., Nan, L., Wu, F., Leggas, D., Lei, J., Lu, W., Fu, W.: Accurate protein structure prediction by embeddings and deep learning representations. arXiv preprint arXiv:1911.05531 (2019)
21. Klausen, M.S., Jespersen, M.C., Nielsen, H., Jensen, K.K., Jurtz, V.I., Sønderby, C.K., Sommer, M.O.A., Winther, O., Nielsen, M., Petersen, B., Marcatili, P.: NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning. Proteins: Struct., Funct., Bioinf. 87(6), 520–527 (2019)
22. Kaleel, M., Torrisi, M., Mooney, C., Pollastri, G.: PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning. Amino Acids 51(9), 1289–1296 (2019)
23. Gao, M., Zhou, H., Skolnick, J.: DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci. Rep. 9(1), 1–13 (2019)
24. Wang, J., Cao, H., Zhang, J.Z., Qi, Y.: Computational protein design with deep learning neural networks. Sci. Rep. 8(1), 1–9 (2018)
25. Dua, D., Graff, C.: UCI machine learning repository, 2017. https://archive.ics.uci.edu/ml 37 (2019)
26. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
27. Dargan, S., Kumar, M., Ayyagari, M.R., Kumar, G.: A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng., pp. 1–22 (2019)
28. Nwankpa, C., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions: comparison of trends in practice and research for deep learning. arXiv preprint arXiv:1811.03378 (2018)
Locking Paradigm in Hierarchical Structure Environment Swati, Shalini Bhaskar Bajaj, and Vivek Jaglan
Abstract
The paper presents a multi-version granular structure, an extension of the multiple granularity locking mechanism. Existing hierarchical locking operates at either a coarse or a fine granular level and does not support enhanced concurrency across different locking modes. The core problem lies in the lock waiting time that a read or a write operation incurs when a conflict occurs on the same set of resources in the given hierarchical structure. To overcome these limitations, we propose multi-versioning to support simultaneous execution of share and update transactions by providing a suitable version to each requesting mode. We further give a detailed description of different cases that illustrate our work by converting some of the non-compatible modes into compatible modes of operation.
Keywords Synchronization · Compatibility · Locking · Hierarchical structure · Consistency
Swati (B) · S. B. Bajaj Computer Science Engineering Department, Amity University, Pachgaon, Haryana, India
S. B. Bajaj e-mail: [email protected]
V. Jaglan Computer Science Engineering Department, Graphic Era Hill University, Dehradun, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_57
1 Introduction
The multi-granularity locking protocol is a concept that originated in database management systems [1, 2] and supports thread synchronization. The lock granularity [3, 4] in a hierarchical structure defines the size of the data item that is guarded by a lock in order to achieve consistency. A fine-grained lock structure achieves enhanced concurrency, but the increased concurrency raises the locking overhead of each lock request and release. On the other hand, a coarse-grained lock structure introduces less lock overhead but lessens the degree of concurrency. An optimal locking protocol is therefore required that allows users to balance lock overhead against concurrency so that optimal performance can be achieved.
An important step in implementing locking is to check the regions of a hierarchy so that they are locked by different transactions only in non-conflicting modes. The issue was first identified by Gray [5], who proposed the intention locking scheme. According to this scheme, the ancestors of a node, including the root, are locked in intention mode, and the requested node is then locked in share or exclusive mode. This is valuable for identifying overlapping regions that can exist in sub-graphs requested by multiple transactions. The intention mode thereby helps other concurrent threads to check compatibility while trying to acquire a lock in the hierarchical structure. Unfortunately, the existing intention lock techniques do not exploit the opportunities available when locking a group of nodes in the various locking modes. As discussed in [5, 6] and [7], multiple granularity locking focuses only on reducing the traversal cost; none of these works tries to improve the concurrency of the transactions executed in the hierarchical framework. The multi-version locking protocol [8–10], in contrast, enhances the concurrency of the system by allowing concurrent read–write operations to execute simultaneously in a conflicting environment. In case of a read–write conflict, unlike multiple granularity locking, our proposed work avoids synchronous waiting for the requested resource to be reclaimed by utilizing the other existing versions of the given data item [11–14].
We use timestamp ordering to obtain a consistent snapshot of the data item, so that the requesting thread always chooses the correct version of the data item using the commit timestamps of the versions. We therefore propose a new locking paradigm that combines the features of the multi-version locking protocol with multiple granularity locking to support increased concurrency. The basic principle involves separating the effects of commutativity (which relates to serializability) and compatibility (which relates to deadlock freedom) on the data manipulation operations, which leads to an improvement of the existing compatibility matrix of multiple granularity locking.
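The intention locking scheme summarized above can be sketched as a small helper that lists the lock requests needed for one target node: intention locks on every ancestor from the root downwards, followed by the requested lock on the node itself. This follows the classical multiple granularity rule rather than the extension proposed in this paper, and the node names and the function name `lock_plan` are hypothetical.

```python
# Intention mode to place on ancestors for each requested mode
INTENTION_OF = {"S": "IS", "X": "IX"}


def lock_plan(path, mode):
    """Lock requests for the last node of `path`, which is listed root first.

    Ancestors receive the matching intention mode (IS for S, IX for X);
    the target node itself receives the requested mode.
    """
    if mode not in INTENTION_OF:
        raise ValueError("mode must be 'S' or 'X'")
    *ancestors, target = path
    plan = [(node, INTENTION_OF[mode]) for node in ancestors]
    plan.append((target, mode))
    return plan


# Hypothetical path: file -> page -> record
print(lock_plan(["F3", "P34", "r345"], "S"))
```

Because every ancestor carries an intention lock, a second transaction that wants to lock a whole file or page can detect a possible conflict at that node without traversing the subtree below it.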
2 Literature Review
Some of the important works carried out in the field of multiple granularity locking are discussed below (Table 1).

Table 1 Literature review of MGL locking protocol
| S. No. | Author and year | Technique | Cons |
|---|---|---|---|
| 1 | Kalikar and Nasre [5] | A dominator relationship is specified that reduces the locking cost in the hierarchical structure | The concurrency of the system decreases as it leads to unnecessary locks on the child nodes |
| 2 | Kalikar and Nasre [6] | In order to quickly identify the overlapping regions between two requesting transactions, a greedy approach is followed that emphasizes reducing the traversal cost | In the given framework, the concurrency of the system is not enhanced |
| 3 | Jelena et al. [15] | A generic lock service (GLS) is proposed, a multi-granularity middleware that supports the traditional lock interface by using a special value that is overloaded in mutex locks for any static initialization | The compatibility matrix of MGL is not improved |
| 4 | Yang et al. [16] | CTrace, a multi-granularity system, is discussed that allows users to explore process traces for conceptual abstraction and supports granularity at different levels | Optimization of the MGL framework is not taken into consideration |
| 5 | Ganesh and Nasre [12] | A novel indexing technique is discussed that quickly identifies the overlap region in the hierarchical structure | MGL concurrency factor is not considered |
3 Analysis on Design Cases of Improved Locking Compatibility for the Proposed Work
Lock compatibility between two locking modes specifies that locks in those modes can be acquired concurrently on a given data item by multiple transactions; i.e. the compatibility matrix reduces the burden of lock serialization by supporting an increased level of parallelism [2]. We analyze how our approach handles conflicts by improving on the existing compatibility matrix of the multiple granularity protocol, as shown in Table 2, where the entries marked Y* correspond to the mode pairs made compatible by the proposed work. We provide a detailed justification for each improved scenario using the new hierarchical structure shown in Fig. 1.
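Table 2 Proposed improved locking compatibility
| Lock mode | IS | IX | S | SIX | X |
|---|---|---|---|---|---|
| IS | Y | Y | Y | Y | Y* |
| IX | Y | Y | Y* | N | N |
| S | Y | Y* | Y | Y* | Y* |
| SIX | Y | N | Y* | N | N |
| X | Y* | N | Y* | N | N |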
Fig. 1 Proposed hierarchical structure
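The compatibility matrix of Table 2 can be encoded as a simple lookup. In the sketch below, the entries are copied from Table 2; treating every Y* entry as "not compatible" when no prior version is available reproduces the classical multiple granularity matrix. Reading rows as the held mode and columns as the requested mode is an assumption (the matrix is symmetric, so the result does not depend on it), and the names `PROPOSED` and `compatible` are illustrative.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]

# "Y": compatible, "N": not compatible,
# "Y*": compatible only because a prior version can be read (proposed work).
PROPOSED = {
    "IS":  ["Y",  "Y",  "Y",  "Y",  "Y*"],
    "IX":  ["Y",  "Y",  "Y*", "N",  "N"],
    "S":   ["Y",  "Y*", "Y",  "Y*", "Y*"],
    "SIX": ["Y",  "N",  "Y*", "N",  "N"],
    "X":   ["Y*", "N",  "Y*", "N",  "N"],
}


def compatible(held, requested, multi_version=True):
    """True if `requested` can be granted while `held` is in place."""
    cell = PROPOSED[held][MODES.index(requested)]
    if cell == "Y*":
        return multi_version  # falls back to the classical answer (no)
    return cell == "Y"


# Mode pairs that become compatible only with multi-versioning
improved = [
    (h, r) for h in MODES for r in MODES
    if compatible(h, r) and not compatible(h, r, multi_version=False)
]
print(improved)
```

The list of improved pairs contains each Y* cell of Table 2, for example the pair IS and X that is discussed in Case I.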
The detailed discussion of the cases is as follows:

Case I: Improved Compatibility between IS and X
Consider two transactions, T1 and T2, executing in the system. Transaction T1 wishes to apply a share lock on data item $r_{345}^1$, so a share lock is applied on $r_{345}^1$ and intension share locks are applied on all its ancestors ($F_3^1$, $P_{34}^1$). Transaction T2 then wishes to apply an exclusive lock on the data item $F_3^1$. Under the proposed locking paradigm, the two locking modes IS and X on the same data item $F_3^1$ are supported by letting T1 read the prior version of $r_{345}^1$ while the write on $F_3^1$ by T2 is in progress. The operation is summarized in Table 3.

Table 3 Resource request from the transactions
Transaction   Node          Lock mode
T1            $F_3^1$       IS
T1            $P_{34}^1$    IS
T1            $r_{345}^1$   S
T2            $F_3^1$       X

Case II: Improved Compatibility between IX and S
Next, consider two transactions, T1 and T2, running in the system. Transaction T1 requests exclusive mode on the data item $r_{346}^2$, while transaction T2 requests the file $F_3^1$ in share mode. T1 applies an exclusive lock on $r_{346}^2$ and intension exclusive locks on all its ancestors ($P_{34}^1$, $F_3^1$); T2 applies a share lock on file $F_3^1$ and an intension share lock on its parent node. The two locking modes, intension exclusive (IX) and shared (S), are thus held by the two transactions on the common data item $F_3^1$: T1 is allowed to write on $r_{346}^2$ while T2 reads an older version of $F_3^1$, so that compatibility between IX and S is supported, as shown in Table 4.

Table 4 Resource request by transaction
Transaction   Node          Lock mode
T1            $F_3^1$       IX
T1            $P_{34}^1$    IX
T1            $r_{346}^2$   X
T2            $F_3^1$       S

Case III: Improved Compatibility between S and IX
In this case, there are again two transactions, T1 and T2, in the system. Transaction T1 wishes to apply a share lock on $F_5^2$, and T2 applies a write lock on $r_{545}^5$. To carry out its operation, T1 locks the file $F_5^2$ in share mode; simultaneously, transaction T2 applies an exclusive lock on $r_{545}^5$ and intension exclusive locks on all its ancestors ($P_{54}^3$, $F_5^2$). This supports the compatibility between the two locking modes (S and IX) on the same data item $F_5^2$, as shown in Table 5.

Table 5 Resource request by transaction
Transaction   Node          Lock mode
T1            $F_5^2$       S
T2            $F_5^2$       IX
T2            $P_{54}^3$    IX
T2            $r_{545}^5$   X

Case IV: Improved Compatibility between S and SIX
In this case, transactions T1 and T2 run concurrently in the system. Transaction T1 requests a share lock on file $F_3^1$, and T2 requests a share intension exclusive (SIX) lock on the same data item $F_3^1$. The operation is supported by providing T1 with a prior version of file $F_3^1$ for reading while T2 performs its desired operation, thereby supporting the compatibility between the two locking modes (S and SIX) held by T1 and T2, as shown in Table 6.

Table 6 Resource request by transaction
Transaction   Node          Lock mode
T1            $F_3^1$       S
T2            $F_3^1$       SIX
T2            $P_{34}^1$    S
T2            $r_{345}^1$   X

Case V: Improved Compatibility between S and X
The next case comprises two transactions, T1 and T2, that request to lock the common data item $F_5^2$ in share and exclusive mode, respectively. The operation is carried out successfully by providing T1 with a prior version of $F_5^2$ for reading while T2 continues to write on $F_5^2$. This supports the compatibility between the two locking modes (S and X), as shown in Table 7.

Table 7 Resource request by transaction
Transaction   Node          Lock mode
T1            $F_5^2$       S
T2            $F_5^2$       X

Case VI: Improved Compatibility between SIX and S
Finally, consider two transactions, T1 and T2, in the system. Transaction T1 requests the file $F_3^1$ in SIX mode, and T2 applies a share lock on file $F_3^1$ (Table 8).

Table 8 Resource request by transaction
Transaction   Node          Lock mode
T1            $F_3^1$       SIX
T1            $P_{34}^1$    S
T1            $r_{345}^1$   X
T2            $F_3^1$       S
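The six cases can be read as additions to the classic multiple granularity compatibility matrix. The following is a minimal sketch of that reading, not the authors' implementation: the names `CLASSIC`, `VERSIONED`, and `compatible` are hypothetical, and treating each newly supported pair as symmetric is an assumption made here for brevity.

```python
MODES = ("IS", "IX", "S", "SIX", "X")

# Classic multiple granularity locking: unordered pairs of modes that two
# different transactions may hold on the same node at the same time.
CLASSIC = {
    frozenset(pair)
    for pair in [
        ("IS", "IS"), ("IS", "IX"), ("IS", "S"), ("IS", "SIX"),
        ("IX", "IX"), ("S", "S"),
    ]
}

# Pairs that Cases I-VI describe as additionally supported when the reader
# is served a prior version (assumption: each pair is treated as symmetric).
VERSIONED = {
    frozenset(pair)
    for pair in [("IS", "X"), ("IX", "S"), ("S", "SIX"), ("S", "X")]
}


def compatible(held, requested, multiversion=False):
    """Return True if the two lock modes can coexist on one node."""
    pair = frozenset((held, requested))
    return pair in CLASSIC or (multiversion and pair in VERSIONED)


if __name__ == "__main__":
    # Print both matrices side by side: classic / multi-version.
    for held in MODES:
        row = [
            f"{int(compatible(held, req))}/{int(compatible(held, req, True))}"
            for req in MODES
        ]
        print(f"{held:>3}", " ".join(row))
```

Each printed cell shows the classic answer followed by the multi-version answer, which makes the pairs changed by the six cases easy to spot.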
3.1 Results and Discussions
We analyze the performance of the proposed scheme and compare it with existing multiple granularity locking through extensive experiments in which the number of transactions varies from 10 to 40. Using the K-means clustering approach, we obtained three clusters, shown in Figs. 2, 3, 4, and 5, to analyze the performance of the proposed work in the system. The clusters are formed according to the following criteria:
• Clusters obtained for low contention
• Clusters obtained for moderate contention
• Clusters obtained for high contention.

Fig. 2 Cluster performance with 10 transactions (execution time of the existing and proposed schemes and improvement ratio, per cluster)
Fig. 3 Cluster performance with 20 transactions (execution time of the existing and proposed schemes and improvement ratio, per cluster)
Fig. 4 Cluster performance with 30 transactions (execution time of the existing and proposed schemes and improvement ratio, per cluster)
Fig. 5 Cluster performance with 40 transactions (execution time of the existing and proposed schemes and improvement ratio, per cluster)
4 Conclusion
Improving system efficiency is one of the most crucial aspects of locking performance in a hierarchical framework. The simulation supports the proposed work by comparing it with existing multiple granularity locking. We maintain multiple versions at each granular level, which yields higher performance. In essence, we combine the flexibility offered by multi-versioning with existing hierarchical locking, which provides enhanced concurrency and an improved compatibility matrix.
Ensemble Maximum Likelihood Estimation Based Logistic MinMaxScaler Binary PSO for Feature Selection
Hera Shaheen, Shikha Agarwal, and Prabhat Ranjan
Abstract High-dimensional data arises when the number of dimensions is large compared with the number of samples. Such data is difficult to analyze and may lead to wrong predictions. Dimension reduction is used to solve this problem, and feature selection is one way to apply it. Feature selection is the process of selecting the features that contribute most to the prediction or to the desired output. Particle swarm optimization (PSO) is used for feature selection, but its performance is lacking on high-dimensional data and complex problems. Most authors have proposed the inertia weight as a solution and tried to set its value by various methods. These approaches still do not yield optimal parameter values; hence, learning-induced procedures are required to update them. Therefore, a method is proposed to optimize the parameters of PSO for optimum feature selection. The proposed solution addresses two objectives: to handle feature selection using PSO, and to obtain optimal values of the PSO parameters. The three main parameters are the inertia weight and the cognitive and social constants. Ensemble maximum likelihood estimation based MinMaxScaler binary PSO is proposed to optimize the parameters of PSO for feature selection. In this method, the Sigmoid function of binary PSO is replaced with the MinMaxScaler function (Shaheen et al. in First international conference on sustainable technologies for computational intelligence. Springer, Singapore, pp 705–716, 2020 [1]), and parameter optimization is done using maximum likelihood estimation. The results of experiments performed on the dataset show comparable classification accuracy.
Keywords Optimization · Particle swarm optimization · Particle swarm optimization for feature selection · Parameter optimization
H. Shaheen (✉)
Department of MCA, Patna Women’s College, Patna, India
e-mail: [email protected]
S. Agarwal · P. Ranjan
Department of Computer Science, Central University of South Bihar, Gaya, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_58
1 Introduction to Particle Swarm Optimization
Particle swarm optimization (PSO) is a nature-inspired optimization technique [2–4]. It is inspired by biological systems in which the sharing of knowledge among candidates produces a high-quality global solution. It is an evolutionary computation method that works iteratively toward a final solution. Russell Eberhart and James Kennedy introduced PSO in 1995. It is based on observing how birds and other social animals move around food sources so as to reach food in minimal time: each bird tries to move closer to the bird that is nearest to the food source. PSO takes all the advantages of the flocking of species.

It is important to note that a large flock also has disadvantages. More birds in a flock mean more wings, more mouths, and more noise, so predators can find the flock more easily, which is a threat to the birds. A large flock also requires more food, which creates competition among its members, and some weaker birds may die as a result. PSO does not simulate these disadvantages of bird flocking: during the search process, no individual is allowed to die. In the genetic algorithm, by contrast, weaker individuals die out in each iteration, leaving only the fittest. In PSO, a potential solution is improved by swarm cooperation, whereas in an evolutionary algorithm improvement comes from competition. This is what distinguishes swarm intelligence from an evolutionary algorithm.
2 Particle Swarm Optimization
Particle swarm optimization [5, 6] is an evolutionary computation technique that originated from swarm intelligence and is based on the social behavior of bird flocking and fish schooling. It simulates the food-searching pattern of birds, which aim to reach the largest amount of food in minimal time. Kennedy and Eberhart first developed it in 1995 [7, 8]. Swarm intelligence [9] refers to algorithms influenced by the social behavior of bird flocking and fish schooling, in which a collection of agents (the birds or fish) moves through the search space and shares information to reach the optimal solution in minimal time. The movement of each agent is in general random or local, but the sharing of knowledge and the interaction among agents lead to a globally optimal solution. The basic principles of swarm intelligence [9, 10] are: (1) proximity, (2) quality, (3) diverse response, (4) stability, and (5) adaptability.

Constituents of Basic Particle Swarm Optimization
The major terms used with PSO [11, 12] are: particle, swarm, velocity, position, cognitive learning, social learning, inertia weight, the particle structure, sharing of information, the fitness of a particle, Pbest, Gbest, and proof of the vector model of PSO.
Particle: A particle in PSO represents a probable solution to the problem. The PSO search is initiated with randomly generated particles (probable solutions). Each particle is represented by a position (x) and a velocity (v).

Position x: Every particle has a randomly initialized position, denoted by x. The vector x records the current position (location) of the particle in the search space:
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{in})$$
where $x_i$ denotes the position of the ith particle, $x_{ij}$ is the jth dimension of the ith particle, and n is the dimension of the search space. The particle position can be binary or real valued. When it stores binary values (i.e., 0 or 1), each 1 represents the presence of the corresponding dimension and each 0 its absence.

Velocity v: Velocity indicates the change in the direction of search and the rate of movement of a particle. It is the rate of change of the particle's position with time. In PSO, the unit of time is one iteration of the search, in which position and velocity are updated using information from the neighbors and from the particle itself. Velocity and position have the same dimension, n. The vector v contains the direction and speed of the particle:
$$v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$$
where $v_i$ is the velocity of the ith particle, $v_{ij}$ is the jth dimension of the velocity of the ith particle, and n is the dimension of the search space.

Swarm: The collection or group of probable solutions is called a swarm.

Learning is the process of acquiring new information and knowledge to enhance one's experience and improve future behavior. Learning in PSO is of two types: cognitive learning and social learning.

Cognitive Learning: Cognitive learning is a type of learning in which an individual uses its own brain, experience, and senses to acquire knowledge. Here, the individual is guided by its own brain, which evolves to make it more intelligent.

Social Learning: Social learning is a type of learning in which many individuals share their knowledge and information. From all the information collected, the best is selected and learned by all of them. Both cognitive and social learning help in the search process.

Inertia Weight: The inertia weight balances exploration and exploitation in the search process. It scales the contribution of a particle's previous velocity to its velocity at the current iteration. It was introduced by Shi and Eberhart [13].

Sharing of Information: Sharing of information means that the particles of the swarm exchange information about the best probable result found in the search space during the search process. From this information, they update the velocity and position of each particle in the direction of improvement.
Fitness of Particle: The fitness of a particle is the value of the optimization function at the particle's position. It is written $f(x_i(t))$, the fitness of particle i at iteration t, where $x_i$ is its position. Bansal [10] gives the worked example
$$\min f(x_1, x_2) = x_1^2 + x_2^2 \tag{1}$$
where $x_1, x_2 \in (-5, 5)$.

Pbest: The best solution that a particle has itself visited so far, stored in its memory, is known as Pbest [14].

Gbest: The best solution visited so far by any particle in the swarm, stored in memory, is known as Gbest [14].

Algebraic Representation of PSO: Every particle keeps in its memory its own best position or best experience, the personal best $pb_i(t)$. The best experience shared among the members of the swarm is the global best $g(t)$ (component $gb_d$ in dimension d); it belongs to the whole swarm and not to any particular particle. The velocity update combines the vector from the current position to the personal best with the vector from the current position to the global best:
$$v_{id}^{new} = v_{id}^{old} + c_1 r_1 (pb_{id} - x_{id}^{old}) + c_2 r_2 (gb_d - x_{id}^{old}) \tag{2}$$
$$x_{id}^{new} = x_{id}^{old} + v_{id}^{new} \tag{3}$$
where $r_1$, $r_2$ are random values drawn from the uniform distribution on (0, 1), and $c_1$ and $c_2$ are the cognitive and social parameters.
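To make Eqs. (1) to (3) concrete, the following is a minimal sketch of basic PSO minimizing the example function of Eq. (1). It is an illustration under stated assumptions rather than the authors' code: the function names, the swarm size, the iteration count, the velocity clamp `vmax`, and the clipping of positions to the search interval are all choices made here.

```python
import random


def sphere(x):
    """Objective of Eq. (1): f(x1, x2) = x1^2 + x2^2."""
    return sum(xi * xi for xi in x)


def pso_minimize(f, dim=2, n_particles=20, iters=100, c1=2.0, c2=2.0,
                 lo=-5.0, hi=5.0, vmax=1.0, seed=0):
    """Basic PSO using the velocity/position updates of Eqs. (2) and (3)."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [f(p) for p in x]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (2): pull toward personal best and global best.
                v[i][d] += (c1 * r1 * (pbest[i][d] - x[i][d])
                            + c2 * r2 * (gbest[d] - x[i][d]))
                # Velocity clamp (assumption, not part of Eq. (2)).
                v[i][d] = max(-vmax, min(vmax, v[i][d]))
                # Eq. (3), with the position kept inside [lo, hi] (assumption).
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            fit = f(x[i])
            if fit < pbest_fit[i]:           # update Pbest
                pbest[i], pbest_fit[i] = x[i][:], fit
                if fit < gbest_fit:          # update Gbest
                    gbest, gbest_fit = x[i][:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    best, best_fit = pso_minimize(sphere)
    print("best position:", best, "fitness:", best_fit)
```

Because Gbest is only replaced by a strictly better position, the returned fitness can never be worse than the best particle of the initial swarm.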
3 Proposed Method
A survey of PSO indicates that parameter optimization [15, 16] of PSO has not yet been carried out with machine learning methods. Particle swarm optimization has been used for feature selection in many fields. PSO has been used with logistic regression [17] for feature selection, in which the velocity equation is mapped to the logistic function. The velocity v depends on the parameters ω, c1, and c2 (inertia weight, cognitive constant, and social constant, respectively), which need to be optimized using some learning mechanism. Hence, ensemble maximum likelihood estimation (MLE) is used to optimize the parameters for better feature selection, and the ensemble maximum likelihood estimation (MLE)-based logistic MinMaxScaler method is proposed. The complete derivation is given below to provide a mathematical justification of the proposed method.

Suppose we have a set (or family) of probability distributions indexed by a parameter vector β. The distributions may be either probability mass functions (pmfs) or probability density functions (pdfs). The training set of m random samples is denoted $\{s_1, s_2, \ldots, s_m\}$, where $s_j$ is a vector of feature values and β is a vector of real-valued parameters. All samples are independent, so the probability of the set is the product of the probabilities of the individual examples. Let $f_\beta(s_i; \beta)$ be the probability of the ith sample. Then,
$$f(s_1, s_2, \ldots, s_m; \beta) = \prod_{i} f_\beta(s_i; \beta) \tag{4}$$
The likelihood function is the joint probability of all samples (m samples in total, each of dimension d) having label 1 or 0. It is therefore the product of the probabilities of the samples belonging to class 1 and the probabilities of the samples belonging to class 0:
$$L(\beta) = \prod_{i: y_i = 1} p(y_i = 1 \mid s_i) \prod_{i: y_i = 0} p(y_i = 0 \mid s_i) \tag{5}$$
where $y_i$ denotes the class of sample i. Writing $p_i = p(y_i = 1 \mid s_i)$,
$$L(\beta) = \prod_{i: y_i = 1} p_i^{y_i} \prod_{i: y_i = 0} (1 - p_i)^{(1 - y_i)} \tag{6}$$
The total log-likelihood over all training samples is given by the following equation. The first term is the sum of the log-probabilities of all the positive training examples, and the second term is the sum of the log-probabilities of all the negative training examples.
$$l(\beta) = \sum_{i: y_i = 1} \log p_i + \sum_{i: y_i = 0} \log(1 - p_i) \tag{7}$$
For the logistic model $p_i = 1 / (1 + e^{-\sum_j \beta_j z_{ij}})$, the derivative of $\log p_i$ with respect to $\beta_j$ is $(1 - p_i) z_{ij}$ and that of $\log(1 - p_i)$ is $-p_i z_{ij}$. The partial derivative of the log-likelihood over all the positive and all the negative training examples is therefore
$$\frac{\partial}{\partial \beta_j} l(\beta) = \sum_{i: y_i = 1} (1 - p_i) z_{ij} + \sum_{i: y_i = 0} (-p_i) z_{ij} = \sum_{i} (y_i - p_i) z_{ij} \tag{8}$$
Here, $z_{ij}$ is the value of the jth feature of the ith training example. When the class label of the ith sample is 1, the term is $(1 - p_i) z_{ij}$, and when the class label is 0, it is $-p_i z_{ij}$. In short, this can be written as
$$\frac{\partial}{\partial \beta_j} l(\beta) = \sum_{i} (y_i - p_i) z_{ij} \tag{9}$$
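The update equation for β is
$$\beta_j = \beta_j + \alpha \frac{\partial}{\partial \beta_j} l(\beta) \tag{10}$$
where α is the learning rate. Substituting Eq. 9 gives
$$\beta_j = \beta_j + \alpha \sum_{i} (y_i - p_i) z_{ij} \tag{11}$$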
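The update of Eq. (11) is ordinary batch gradient ascent on the logistic log-likelihood of Eq. (7). The following is a small sketch under stated assumptions: the helper names (`sigmoid`, `log_likelihood`, `mle_step`) and the toy data are hypothetical, and the first feature column is set to 1 so that β0 acts as an intercept.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_likelihood(beta, Z, y):
    """Eq. (7): sum of log p_i over positives and log(1 - p_i) over negatives."""
    total = 0.0
    for z_i, y_i in zip(Z, y):
        p_i = sigmoid(sum(b * z for b, z in zip(beta, z_i)))
        p_i = min(max(p_i, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p_i) if y_i == 1 else math.log(1.0 - p_i)
    return total


def mle_step(beta, Z, y, alpha):
    """Eq. (11): beta_j <- beta_j + alpha * sum_i (y_i - p_i) * z_ij."""
    p = [sigmoid(sum(b * z for b, z in zip(beta, z_i))) for z_i in Z]
    return [
        b_j + alpha * sum((y_i - p_i) * z_i[j]
                          for z_i, y_i, p_i in zip(Z, y, p))
        for j, b_j in enumerate(beta)
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): first column is a constant 1 for the intercept.
    Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    y = [0, 0, 1, 1]
    beta = [0.0, 0.0]
    for step in range(5):
        print(step, beta, log_likelihood(beta, Z, y))
        beta = mle_step(beta, Z, y, alpha=0.1)
```

With a sufficiently small learning rate α, each step increases the log-likelihood, which is the behavior the derivation relies on.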
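In PSO, the velocity update equation of the ith particle in the dth dimension is given by:
$$v_{id}^{new} = \omega v_{id}^{old} + c_1 r_1 (pb_{id} - x_{id}^{old}) + c_2 r_2 (gb_d - x_{id}^{old}) \tag{12}$$
Let ω be $\beta_0$, $c_1$ be $\beta_1$, and $c_2$ be $\beta_2$. Then
$$v_{id}^{new} = \beta_0 v_{id}^{old} + \beta_1 r_1 (pb_{id} - x_{id}^{old}) + \beta_2 r_2 (gb_d - x_{id}^{old}) \tag{13}$$
According to logistic regression,
$$P = p(Y = 1 \mid v_{id}^{new}) = \frac{1}{1 + e^{-v_{id}^{new}}} \tag{14}$$
Therefore,
$$p = \frac{1}{1 + e^{-(\beta_0 v_{id}^{old} + \beta_1 r_1 (pb_{id} - x_{id}^{old}) + \beta_2 r_2 (gb_d - x_{id}^{old}))}} \tag{15}$$
So, the β update equation for the jth feature, obtained by substituting into Eq. 11, is
$$\beta_j = \beta_j + \alpha (y - p) z_j \tag{16}$$
Here α is the learning rate. Expanding p,
$$\beta_j = \beta_j + \alpha \left( y - \frac{1}{1 + e^{-\beta z}} \right) z_j \tag{17}$$
$$\beta_j = \beta_j + \alpha \left( y - \frac{1}{1 + e^{-\{\beta_0 v_{id}^{old} + \beta_1 r_1 (pb_{id} - x_{id}^{old}) + \beta_2 r_2 (gb_d - x_{id}^{old})\}}} \right) z_j \tag{18}$$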
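One way to read Eqs. (13) to (18) is that the three terms of the velocity equation play the role of the logistic-regression features $z_0, z_1, z_2$. The sketch below follows that reading for a single particle and dimension. It is an interpretation, not the authors' code: the function name is hypothetical, and because the text does not spell out how the target label y is obtained, y is simply passed in by the caller.

```python
import math


def update_pso_parameters(beta, v_old, x_old, pbest, gbest, r1, r2, y,
                          alpha=0.01):
    """One MLE step on (beta0, beta1, beta2) = (omega, c1, c2).

    Assumption: the logistic 'features' are the three terms of Eq. (13).
    """
    z = (v_old, r1 * (pbest - x_old), r2 * (gbest - x_old))
    v_new = sum(b * z_j for b, z_j in zip(beta, z))          # Eq. (13)
    # Eqs. (14)-(15), written in a numerically stable form.
    if v_new >= 0:
        p = 1.0 / (1.0 + math.exp(-v_new))
    else:
        e = math.exp(v_new)
        p = e / (1.0 + e)
    # Eq. (18): beta_j <- beta_j + alpha * (y - p) * z_j
    new_beta = tuple(b + alpha * (y - p) * z_j for b, z_j in zip(beta, z))
    return new_beta, v_new, p


if __name__ == "__main__":
    # Initial values omega = 0.5, c1 = c2 = 2 as in the experimental setup;
    # the remaining numbers are made up for illustration.
    beta = (0.5, 2.0, 2.0)
    beta, v_new, p = update_pso_parameters(
        beta, v_old=0.0, x_old=0.0, pbest=1.0, gbest=1.0,
        r1=0.5, r2=0.5, y=1, alpha=0.1)
    print(beta, v_new, p)
```

A parameter whose feature $z_j$ is zero is left unchanged by the step, which follows directly from the factor $z_j$ in Eq. (18).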
Therefore, all three parameters of PSO are updated using this equation. Feature selection with PSO generally uses the Sigmoid function [18], given by:
$$Sig(v) = \frac{1}{1 + e^{-v}} \tag{19}$$
Here, the velocity is passed as the input. The Sigmoid function, also called an activation function, has an S-shaped curve and gives an output in the range 0 to 1. A threshold value is then set, and the position equation of PSO uses this threshold to decide whether a feature is selected or rejected. In the proposed method, the Sigmoid function is replaced with the MinMaxScaler function described below.

Fig. 1 Block diagram of proposed method

The MinMaxScaler is a scaling function with the following formula for each feature:
$$MinMaxScaler(v) = \frac{v_{id} - v_{min}}{v_{max} - v_{min}} \tag{20}$$
where $v_{min}$ is the minimum velocity and $v_{max}$ is the maximum velocity. The values are chosen as $[v_{min}, v_{max}] = [-6, 6]$, and the stopping criterion is a maximum of 150 iterations or a classification accuracy of 100%. The number of particles is chosen as 40, and $v_{id}^{new}$ is defined by:
$$v_{id}^{new} = v_{id}^{old} + c_1 r_1 (pb_{id} - x_{id}^{old}) + c_2 r_2 (gb_d - x_{id}^{old}) \tag{21}$$
Feature selection then uses the following rule:
$$x_{id} = \begin{cases} 1 & \text{if } MinMaxScaler(v) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$
Here $x_{id} = 1$ means the feature is selected, and $x_{id} = 0$ means the feature is not selected. In the proposed method shown in Fig. 1, this MinMaxScaler [1] is used to scale the value of the velocity in each iteration. The position is then found by applying a threshold to the scaled velocity, which determines whether a feature is selected or not. The algorithm of the proposed method is given in Algorithm.
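The selection rule can be sketched in a few lines. The example below contrasts the MinMaxScaler transfer of Eq. (20) with the Sigmoid of Eq. (19) under the same 0.5 threshold. The function names are hypothetical, and clamping the velocity to $[v_{min}, v_{max}]$ before scaling is an assumption made so that the scaled value always lies in [0, 1].

```python
import math

V_MIN, V_MAX = -6.0, 6.0  # velocity range used in the text


def min_max_scale(v, v_min=V_MIN, v_max=V_MAX):
    """Eq. (20), with the velocity first clamped to [v_min, v_max] (assumption)."""
    v = max(v_min, min(v_max, v))
    return (v - v_min) / (v_max - v_min)


def sigmoid(v):
    """Eq. (19), the transfer function used by conventional binary PSO."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def select_features(velocities, transfer=min_max_scale, threshold=0.5):
    """Return the binary position: 1 = feature selected, 0 = not selected."""
    return [1 if transfer(v) >= threshold else 0 for v in velocities]


if __name__ == "__main__":
    velocities = [-6.0, -0.1, 0.0, 3.0]
    print("MinMaxScaler:", select_features(velocities))
    print("Sigmoid:     ", select_features(velocities, transfer=sigmoid))
```

With a fixed threshold of 0.5 both transfer functions select exactly the non-negative velocities; they differ in how the scaled value varies with the velocity (linear for the MinMaxScaler, S-shaped for the Sigmoid), which matters when the threshold is not 0.5.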
Algorithm: MLE based logistic MinMaxScaler binary PSO
Input: high-dimensional data $(S_1, S_2, \ldots, S_m)$, $S_i \in R^d$, with associated classes $C = (C_1, C_2, \ldots, C_m)$
Output: fitness of gbest $fit_{gb}$, position of gbest (gb)
while (Iter ≤ MaxIter) do
    for (j = 1 to n) do
        calculate fitness of particles using KNN
    end for
    for (i = 1 to n) do
        if ($fit_i \geq fit_{pb_i}$) then
            $pb_i = x_i$
            $fit_{pb_i} = fit_i$
        else
            no change in $pb_i$, $fit_{pb_i}$
        end if
        if ($fit_i \geq fit_{gb}$) then
            $gb = pb_i$
            $fit_{gb} = fit_i$
        else
            no change in gb, $fit_{gb}$
        end if
    end for
    for (d = 1 to number of features) do
        /* Update the particle */
        $v_{id}^{new} = \beta_0^{old} v_{id}^{old} + \beta_1^{old} r_1 (pb_{id} - x_{id}^{old}) + \beta_2^{old} r_2 (gb_d - x_{id}^{old})$
        /* Iterate MLE T times */
        for (t = 1 to T) do
            find updated values of $\beta_0^{new}, \beta_1^{new}, \beta_2^{new}$ using Eq. 18:
            $\beta_j^{new} = \beta_j^{old} + \alpha \left( y - \frac{1}{1 + e^{-\{\beta_0 v_{id}^{old} + \beta_1 r_1 (pb_{id} - x_{id}^{old}) + \beta_2 r_2 (gb_d - x_{id}^{old})\}}} \right) z_j$, for j = 0, 1, 2
            /* Scale the velocity using MinMaxScaler PSO as given by Eq. 20 */
            $v_{id}^{scaled} = \frac{v_{id}^{new} - v_{min}}{v_{max} - v_{min}}$
            $x_{id}(t + 1) = 1$ if $v_{id}^{scaled} \geq rand$, and 0 if $v_{id}^{scaled} < rand$
        end for
    end for
end while
3.1 Analysis and Time Complexity of Algorithm
The time complexity of the proposed method depends mainly on the time taken to compute the new velocity and position for each dimension of each particle. Let I be the total number of iterations, D the initial dimension, P the total number of particles, and T the number of iterations of maximum likelihood estimation (MLE). The time complexity of the logistic PSO is then O(I(P + PDT)). Since P is much smaller than PDT, this becomes O(IPDT). Because the number of particles P and the number of MLE iterations T are very small compared with the number of dimensions, it reduces to O(ID), and since the number of iterations I is also much smaller than the number of dimensions, finally to O(D). Treating I, P, and T as constants, the time complexity of the proposed logistic PSO is therefore O(D), i.e., linear in the number of dimensions.
3.2 Experimental Design
For feature selection, the proposed maximum likelihood estimation (MLE)-based logistic MinMaxScaler binary particle swarm optimization is applied to a gene expression profile dataset (the SRBCT dataset) taken from the gene expression model selector (GEMS) [19]. The SRBCT dataset has 83 samples, 2308 features, and 4 classes: Ewing’s Sarcoma (29 samples), Rhabdomyosarcoma (25 samples), Burkitt’s Lymphoma (11 samples), and Neuroblastoma (18 samples). The fitness of each particle is obtained using the K-nearest neighbor classifier with leave-one-out cross validation, with k = 1. The performance of the algorithm is assessed by classification accuracy and by the number of selected features. The simulations were performed on Matlab 7.11.1.866 R2010b, Licence No. 691568. The parameter values are taken from [20]. Initially, ω, c1, and c2 were set to 0.5, 2, and 2, respectively; they are subsequently updated using the proposed method. The velocity range is chosen as [vmin, vmax] = [−6, 6], and the stopping criterion is a maximum of 150 iterations or a classification accuracy of 100%. Maximum likelihood estimation (MLE) is iterated 100 times or until the difference between the parameter updates in two successive iterations is less than 0.005. The number of particles is chosen as 40.
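The fitness evaluation described above (1-nearest-neighbor with leave-one-out cross validation on the selected features) can be sketched as follows. This is a plain-Python illustration rather than the Matlab code used in the experiments; the function name, the squared Euclidean distance, and the toy data are assumptions.

```python
def loocv_1nn_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of a 1-NN classifier on the selected features.

    samples: list of feature vectors; labels: class of each sample;
    mask: binary particle position (1 = feature selected).
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # a particle that selects no feature gets the worst fitness
    correct = 0
    for i, s in enumerate(samples):
        best_dist, best_label = None, None
        for k, t in enumerate(samples):
            if k == i:
                continue  # leave sample i out
            # Squared Euclidean distance over the selected features (assumption).
            dist = sum((s[j] - t[j]) ** 2 for j in selected)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += int(best_label == labels[i])
    return correct / len(samples)


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the classes, feature 1 misleads.
    samples = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]]
    labels = [0, 0, 1, 1]
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, loocv_1nn_accuracy(samples, labels, mask))
```

In a feature-selection run, this value would serve as the fitness of the particle whose binary position is `mask`.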
3.3 Result and Discussion
The results are given in Table 1. The data for IBPSO, EVPSO, and the genetic algorithm in the comparison Table 2 are taken from [21]. An experiment was performed on the SRBCT dataset, which has 83 samples and 2308 features. Classification accuracy and the number of selected features were recorded in each of four runs. In the first run, the accuracy and number of selected features are 97.59% and 108, respectively. In the second run, they are 98.80% and 58. In the third run, they are 97.59% and 98. In the fourth run, they are 98.89% and 72. Averaged over all runs, the accuracy is 98.21% and the number of selected features is 84, as shown in Table 1. Average classification accuracy is enhanced, and the feature reduction is substantial, i.e., 96.36%. The confusion matrix of the last run is given below:
$$\begin{bmatrix} 29 & 0 & 1 & 0 \\ 0 & 25 & 0 & 0 \\ 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 18 \end{bmatrix}$$
The confusion matrix assumes that the horizontal axis (columns) is the actual class and the vertical axis (rows) is the predicted class. Under this convention, the column sums (29, 25, 11, 18) match the class sizes of the dataset. Precision is obtained as 0.9667, 1.0000, 1.0000, and 1.0000 for the classes Ewing’s Sarcoma, Rhabdomyosarcoma, Burkitt’s Lymphoma, and Neuroblastoma, respectively. Recall is obtained as 1.0000, 1.0000, 0.9091, and 1.0000 for the same classes, respectively.
For class Ewing’s Sarcoma: TP = 29, TN = 83 − 29 − 1 = 53, FP = 1, FN = 0;
For class Rhabdomyosarcoma: TP = 25, TN = 83 − 25 = 58, FP = 0, FN = 0;
For class Burkitt’s Lymphoma: TP = 10, TN = 83 − 10 − 1 = 72, FP = 0, FN = 1;
For class Neuroblastoma: TP = 18, TN = 83 − 18 = 65, FP = 0, FN = 0.
The proposed maximum likelihood estimation (MLE)-based logistic MinMaxScaler binary PSO for feature selection shows good classification for three classes (Ewing’s Sarcoma, Neuroblastoma, Rhabdomyosarcoma) and satisfactory classification for the class Burkitt’s Lymphoma, for which one sample is misclassified. From these results, we can say that the proposed method has comparable classification accuracy while the number of features is reduced to a large extent.

Table 1 Results of four runs of ensemble maximum likelihood estimation (MLE)-based logistic MinMaxScaler binary PSO on SRBCT dataset
Run       Classification accuracy (%)   Selected features
Run1      97.59                         108
Run2      98.80                         58
Run3      97.59                         98
Run4      98.89                         72
Average   98.21                         84

Table 2 Comparison of proposed method with other feature selection algorithms, genetic algorithm, and other classifier for classification accuracy and feature selection
Method                                                                          Accuracy (%)   Selected genes
IBPSO                                                                           97.59          1124
EVPSO                                                                           95.66          1050
Genetic algorithm                                                               81.9277        46
MinMaxScaler PSO [1]                                                            97.89          1137
Maximum likelihood estimation (MLE) based Logistic MinMaxScaler binary PSO      98.21          84
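The per-class counts can be recomputed from the confusion matrix of the last run. The sketch below assumes the stated convention, with columns as actual classes and rows as predicted classes; the function names are hypothetical.

```python
CLASSES = ["Ewing's Sarcoma", "Rhabdomyosarcoma",
           "Burkitt's Lymphoma", "Neuroblastoma"]

# Confusion matrix of the last run: rows = predicted class, columns = actual class.
CONFUSION = [
    [29, 0, 1, 0],
    [0, 25, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 18],
]


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[c][c] for c in range(len(cm))) / total


def per_class_metrics(cm):
    """Return a list of dicts with TP, FP, FN, TN, precision, recall per class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    out = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[c][k] for k in range(n)) - tp  # predicted c, actually other
        fn = sum(cm[k][c] for k in range(n)) - tp  # actually c, predicted other
        tn = total - tp - fp - fn
        out.append({
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        })
    return out


if __name__ == "__main__":
    print("accuracy:", accuracy(CONFUSION))
    for name, m in zip(CLASSES, per_class_metrics(CONFUSION)):
        print(name, m)
```

If the opposite convention is intended (rows as actual classes), transposing the matrix before calling `per_class_metrics` swaps the roles of precision and recall.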
3.4 Summary
Feature selection plays an important role in classification. A model is therefore required that both enhances accuracy and reduces the number of features, so that processing speed increases as well. Enhancing performance is the main goal of a classification model. Here, optimizing the parameters of PSO together with MinMaxScaler PSO not only improves the performance of PSO but also reduces the number of dimensions.
4 Conclusion
Particle swarm optimization has long been used for optimization as well as for classification. Various authors have modified PSO to obtain better results for their own problems. It still suffers, however, from problems such as getting stuck in local optima, stagnation of the result, and premature convergence. In the literature, the parameters ω, c1, and c2 are often chosen randomly, and a procedure for finding their optimal values is lacking. Moreover, the normalization performed by the Sigmoid function used in PSO for feature selection has to be reconsidered to obtain better results. Here, parameter optimization of PSO is done using ensemble maximum likelihood estimation (MLE), which improves classification accuracy, and it is combined with the MinMaxScaler to improve performance further. When MLE is combined with the MinMaxScaler, the classification accuracy reaches 98.21% and the dimension reduction is 96.36%. The proposed maximum likelihood estimation (MLE)-based logistic MinMaxScaler binary PSO therefore has comparable classification accuracy. The method shows good classification for three classes (Ewing’s Sarcoma, Neuroblastoma, Rhabdomyosarcoma) and satisfactory classification for the class Burkitt’s lymphoma, for which one sample is misclassified. The dimension reduction, however, is substantial, which adds weight to the performance of the proposed method. The method may not perform equally well on datasets with many classes; future work may enhance its performance on such datasets.
Automatic Identification of Medicinal Plants Using Morphological Features and Active Compounds
Saakshi Agrawal and Sowmya Yellapragada
Abstract Plants are an essential part of our ecosystem, and the declining number of plant varieties is a serious concern. To conserve plants and use them optimally, they must be identified on the basis of their distinctive features and properties. Plants form the foundation of Ayurveda, and present-day medicine derived from them is a major source of revenue. Leaf identification by manual means frequently leads to misidentification. Here, we propose mapping the morphological (physical) features of leaves, plants, and herbs to the active biochemical compounds they contain. Although physical features are not directly associated with the chemical compounds in a leaf or plant, both types of features can be used together to improve the identification and classification of medicinal leaves and plants. Morphological features alone, or bio-active compounds alone, are not adequate to obtain precise results from a prediction model. In this paper, we describe combined tabular data that incorporate the morphological as well as chemical features of individual leaves, plants, and herbs from around 20 countries and 4 continents. We also describe methods that can be used to build such a prediction model using machine learning techniques, considering the state of the art.
Keywords Herbs · Medicinal plants · Morphology · Active compound · Machine learning
1 Introduction
Over the course of history, plants have been a significant part of human evolution. There is a growing scientific consensus that plant habitats have been altered and that species are disappearing at rates never witnessed before. Due to deforestation and
S. Agrawal (B) · S. Yellapragada G. G. S. Indraprastha University, New Delhi 110003, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_59
pollution, many medicinal plant species have almost become extinct. As a consequence, there is an urgent need to identify them and regrow them for future generations. Because of the growing illegal trade and malpractice in the crude drug industry on the one hand and the lack of sufficient experts on the other, automated and reliable identification and classification mechanisms are needed to handle the bulk of data and to curb malpractice [1]. Plants do more than exchange oxygen and other gases: they have evolved into many species whose compounds benefit health and the body. Focusing on the health of humans and all other living beings on the planet, this paper draws attention to the geometric features and active compounds of plants individually, irrespective of the species to which they belong. The variety of geometric structures of leaves and their active compounds can be identified using machine learning techniques. Many research papers have already described medicinal plants and their benefits in relation to the geometric structure of their leaves. Malpractice and fraud in the drug industry are now a serious threat to the lasting influence of Ayurveda and to the herbal origins of medicine [1, 2]. To revive the advantages of plants, herbs, bushes, trees, and leaves, effective identification methods are needed that classify them into distinct medicinal categories [3]. Botanists need such an application to make it easier to find the most suitable plants and herbs for the preparation of medicines and drugs from a health-benefit perspective. 
These applications can strengthen the relationship between people and plants, so that dependency on artificial products may fall and plant species may be used more effectively. Identifying the active compounds in a leaf, or the chemical compounds in roots and branches, is not easy for a layperson. It can be made easier by applying machine learning methods to a dataset of leaves, plants, and herbs. To promote the use of plants, a deeper study of the wide variety of herb and plant species is needed. India has always been a rich repository of plants and herbs that can be highly beneficial in curing diseases and countering the ill effects of a harmful environment [4]. It is no longer as easy as before to identify plants and use them for particular purposes. We therefore need to build a model that can identify and classify the plants to be used in medicine, so that these plants and herbs are put to good use. There are several methods by which plant leaves and other materials can be identified. The most popular among these are spectroscopy, chemical identification, and optical identification [1, 3]. Digital images of leaves can be identified using various computer algorithms, with features and parameters derived from the geometry of the leaf. Classification of leaves can thus be achieved in two ways: geometric feature recognition and active compound identification. This work aims to develop a platform method to rapidly identify geometric features and bio-active compounds from leaves, plant extracts, and their partially purified fractions. Medicinal plants are
the “backbone” of traditional medicine: more than 3.3 billion people in less developed countries use medicinal plants regularly [5].
2 Related Work
Researchers have long studied plants from a botanical perspective. Particularly in medicinal use, people may select plants for the treatment of diseases on the basis of their chemosensory properties, so that plants with distinct perceived smell and taste may be indicated for the treatment of different diseases [6]. Understanding the complexity of the relationship between people and medicinal plants requires research that engages with different disciplines, such as chemical ecology and ethnobiology (see, for example, Albuquerque and Ferreira Júnior [7]). Ethnobiology is the science that investigates the relationships between people and biota in the environment; one of its objectives is to understand how people appropriate plants for different uses [8]. Many plants are on the verge of extinction because they are sourced for Chinese medicines and allopathic treatments. Economically, some treatments receive more attention because artificially made drugs and medicines bring fast recovery from disease. Ethno-medicinal studies play a vital role in discovering new drugs from indigenous medicinal plants, and green pharmaceuticals are gaining popularity and importance. Conventional methods of drug discovery from natural products include bio-assay-guided fractionation, which is tedious and has low efficiency. The complete complement of small molecules present in an organism is called its metabolome [9]. Over the past 50 years, spectroscopic techniques coupled with good separation methods such as chromatography have contributed to the phenomenal success of natural product chemistry. Ayurveda is based on experience and experimentation; it has branched into nine areas, which are called “Astanga Ayurveda”, as follows:
• Kaya Chikitsa (medicine)
• Salya Chikitsa (surgery)
• Salakya Chikitsa (ENT treatment)
• Bala Chikitsa (pediatric treatment)
• Jara Chikitsa (treatment related to geriatrics)
• Rasayana Chikitsa (treatment with chemicals)
• Vajikarana Chikitsa (treatment with rejuvenation and aphrodisiacs)
• Graha Chikitsa (planetary effects)
• Visha Chikitsa (toxicology).
This list describes plants for different therapeutic categories as well as different specialties in Ayurveda [10, 11].
3 Classification
Leaves can be classified by various machine learning methods. Artificial neural networks have found widespread use for classification tasks and function approximation in many fields of chemistry and bio-informatics [12], and several types of neural network can be applied to these problems. A prediction model that identifies and classifies the active compounds in leaves and medicinal plants can be very effective, and detecting the structural features of plants and leaves that help combat major diseases can also prove useful. Descriptors can be used to identify the molecular structure of compounds in leaves [12, 13].
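As an illustration of one network type that appears in the cited literature [12, 14], the sketch below implements a very small probabilistic neural network (a Gaussian Parzen-window classifier). It is only a sketch under stated assumptions: the function name `pnn_classify`, the kernel width `sigma`, the two feature columns, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def pnn_classify(train, query, sigma=0.5):
    """Classify `query` with a minimal probabilistic neural network.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    sigma: Gaussian kernel width (hypothetical default, tune on real data)
    Returns (best_label, per_class_scores).
    """
    sums = {}
    counts = {}
    for features, label in train:
        # Squared Euclidean distance between the query and this pattern
        dist2 = sum((a - b) ** 2 for a, b in zip(features, query))
        # Pattern layer: Gaussian kernel response of this training sample
        kernel = math.exp(-dist2 / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    # Summation layer: average kernel response per class
    scores = {label: sums[label] / counts[label] for label in sums}
    # Decision layer: the class with the largest averaged response wins
    return max(scores, key=scores.get), scores


# Toy data (made up): two features per leaf, e.g. aspect ratio and form factor
toy_train = [
    ((1.1, 0.80), "class_A"),
    ((1.2, 0.75), "class_A"),
    ((3.0, 0.30), "class_B"),
    ((2.8, 0.35), "class_B"),
]

label, scores = pnn_classify(toy_train, (1.15, 0.78))
print(label, scores)
```

A probabilistic neural network needs no iterative training, which makes it convenient for small tabular datasets; in practice the features would have to be scaled to comparable ranges before a single `sigma` is meaningful.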
3.1 Morphological Features
One widely used non-destructive technique for identifying herbs is based on images of their leaf morphology. Plant leaves are representative enough to differentiate plant species or varieties with high accuracy. At present, plant recognition is still the specialization of plant taxonomists; advances in computing technology offer an alternative for non-specialists. The morphological characteristics of leaves can now be extracted by a mathematical model and built into a software program for recognition, which could reduce false-positive results caused by human error [13]. Computational morphometric methods can quantitatively measure a leaf geometrically and visualize differences. Combining the morphological features and active compounds of the leaf may give an easier and better way to identify and classify plants. Features for medicinal classification fall into two major classes: geometric and chemical. The geometric class includes morphological parameters such as shape, length, width, color, structure, diameter, area, and perimeter. The chemical class includes active compounds and chemical composition [1, 14]. Mapping the geometrical features to the active compounds in the leaf can make the classification of medicinal plants easier. The preliminary goal would be to gather data containing the names of the plants (scientific, regional, English) along with the morphological parameters of each corresponding leaf and its active compounds. Plant species and their medicinal properties may vary by region, so the necessary data could be collected from various regions [14].
Basic Geometric Features: Firstly, we obtain five basic geometric features: diameter, physiological length, physiological width, leaf area, and leaf perimeter.
Digital Morphological Features: Based on the five basic features introduced above, we can define 12 digital morphological features used for leaf recognition: smooth factor, aspect ratio, form factor, rectangularity, narrow factor, perimeter ratio of diameter, perimeter ratio of physiological length and physiological width, and five vein features.
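The six purely geometric features in this list can be computed directly from the five basic measurements. The sketch below assumes the definitions commonly used in the leaf-recognition literature cited as [14] (for example, form factor as 4πA/P²); the paper itself does not state the formulas, so they should be read as assumptions. The smooth factor and the vein features need image operations and are left out, and the function name is hypothetical.

```python
import math


def digital_morphological_features(diameter, length, width, area, perimeter):
    """Derive six ratio features from the five basic geometric features.

    diameter:  longest distance between two points on the leaf margin
    length:    physiological length (along the main vein)
    width:     physiological width (perpendicular to the main vein)
    area:      leaf area
    perimeter: leaf perimeter
    All inputs are assumed to be positive and in consistent units.
    """
    return {
        "aspect_ratio": length / width,
        "form_factor": 4.0 * math.pi * area / perimeter ** 2,
        "rectangularity": length * width / area,
        "narrow_factor": diameter / length,
        "perimeter_ratio_of_diameter": perimeter / diameter,
        "perimeter_ratio_of_length_and_width": perimeter / (length + width),
    }


# Sanity check on an idealised circular "leaf" of diameter 2
circle = digital_morphological_features(
    diameter=2.0, length=2.0, width=2.0, area=math.pi, perimeter=2.0 * math.pi
)
print(circle)
```

For a circle the form factor is 1 and the aspect ratio is 1, which gives a quick check that the formulas are wired up correctly before they are applied to measured leaves.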
3.2 Active Compounds
The extracts of several medicinal plants are very effective against microbial as well as parasitic infections [9]. For example, several groups of anti-fungal proteins, such as glucanases, chitinases, and low-molecular-weight non-enzymatic proteins, are present in the seeds of many medicinal plants, where they protect the developing embryo from infection [9]. Shoemaker et al. [15] reported that there are over 400,000 species of plants on earth, which hold a huge reservoir of bio-active compounds, but only a small percentage of these have been examined. Molecular structures are often represented using molecular descriptors, which encode much structural information. In recent years, there has been a shift from empirical parameters to purely calculated descriptors, such as quantum chemical descriptors and topological indices [12]. Such descriptors can be calculated solely from molecular structure and applied to sets of structurally diverse compounds. The quantum chemical descriptors include information about binding and formation energies, dipole moment, and molecular orbital energy levels. Topological descriptors include valence and non-valence molecular connectivity indices calculated from the hydrogen-suppressed formula of the molecule by TOPIX [16]; they encode information about the size, composition, and degree of branching of a molecule.
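To make the idea of a topological descriptor concrete, the sketch below computes the first-order non-valence molecular connectivity index (the Randić index) from a hydrogen-suppressed graph. This is a generic textbook formulation and is not claimed to reproduce the output of TOPIX; the function name and the bond-list input format are assumptions made for illustration.

```python
import math


def first_order_connectivity_index(bonds):
    """First-order (non-valence) molecular connectivity index.

    bonds: list of (atom_i, atom_j) pairs describing the hydrogen-suppressed
           molecular graph; atoms are identified by arbitrary hashable labels.
    The index is the sum over bonds of 1 / sqrt(deg(i) * deg(j)), where
    deg() counts the non-hydrogen neighbours of an atom.
    """
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


# Two C4H10 isomers as carbon skeletons
n_butane = [(0, 1), (1, 2), (2, 3)]   # unbranched chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # one branching carbon

chi_n_butane = first_order_connectivity_index(n_butane)
chi_isobutane = first_order_connectivity_index(isobutane)
print(round(chi_n_butane, 4), round(chi_isobutane, 4))
```

The branched isomer receives the lower value, which illustrates how such an index encodes the degree of branching mentioned above.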
4 Block Diagram
The first step is to detect the boundary of the leaf in the input image. This helps in calculating shape-based features such as the length, breadth, aspect ratio, and roundness of the leaf. This paper presents a methodology for mapping the morphological features of leaves to the bio-active compounds in them. Figure 1 shows the block diagram of the process that can be used to identify plants, leaves, shrubs, and herbs for medicinal purposes.
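The boundary-detection step can be sketched on an already segmented image. The example below assumes a binary mask (1 for leaf, 0 for background) containing at least one leaf pixel, treats a leaf pixel as a boundary pixel when any of its four neighbours is background, and takes length and breadth from the axis-aligned bounding box. These choices, and the function name, are simplifying assumptions; roundness is omitted because it needs a proper perimeter estimate.

```python
def leaf_shape_features(mask):
    """Extract the boundary and simple shape features from a binary leaf mask.

    mask: list of equally long rows containing 0 (background) or 1 (leaf).
    """
    rows, cols = len(mask), len(mask[0])

    def is_leaf(r, c):
        # Pixels outside the image count as background
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    leaf_pixels = [
        (r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 1
    ]
    # A boundary pixel is a leaf pixel with at least one 4-neighbour
    # that is not part of the leaf
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    boundary = [
        (r, c)
        for r, c in leaf_pixels
        if not all(is_leaf(r + dr, c + dc) for dr, dc in neighbours)
    ]
    # Bounding-box extent in pixels
    row_ids = [r for r, _ in leaf_pixels]
    col_ids = [c for _, c in leaf_pixels]
    height = max(row_ids) - min(row_ids) + 1
    width = max(col_ids) - min(col_ids) + 1
    length, breadth = max(height, width), min(height, width)
    return {
        "area": len(leaf_pixels),
        "boundary": boundary,
        "length": length,
        "breadth": breadth,
        "aspect_ratio": length / breadth,
    }


# Toy mask: a 3 x 4 block of leaf pixels surrounded by background
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
shape = leaf_shape_features(toy_mask)
print(shape["area"], len(shape["boundary"]), shape["length"], shape["breadth"])
```

On real photographs the mask would first have to be produced by thresholding or segmentation, and the leaf would usually be rotated so that its main vein is aligned with an image axis before length and breadth are measured.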
Fig. 1 Block diagram for the process of identification of various medicinal plants
5 Mapping of Morphological Features with Chemical Compounds
Mapping the morphological features and active compounds of plant leaves is an effective way of classifying leaves into medicinal categories. We can build a prediction model in which the physical and morphological features of leaves are mapped to their active biochemical compounds. Such a model can predict the medicinal capability of plant leaves and herbs with greater accuracy and flexibility [17]. The plant leaf is a complex mixture of many chemical compounds (such as water, pigments, N-containing proteins, and structural carbohydrates), all of which contribute to the overall shape of leaf spectra [18]. Furthermore, the physical state of the leaf (such as leaf thickness and surface roughness) also affects its reflectance spectra.
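One simple way to feed both kinds of information to a single model is to concatenate scaled morphological measurements with a multi-hot encoding of the reported compound classes. The sketch below illustrates this under stated assumptions: the list `COMPOUND_CLASSES`, the helper names, and the two toy records are hypothetical and only show the encoding, not the authors' dataset or model.

```python
# Illustrative vocabulary of compound classes (hypothetical selection)
COMPOUND_CLASSES = ("alkaloids", "flavonoids", "phenols", "tannins", "terpenoids")


def encode_compounds(compounds):
    """Multi-hot vector: 1.0 where a compound class is reported, else 0.0."""
    present = {name.strip().lower() for name in compounds}
    return [1.0 if name in present else 0.0 for name in COMPOUND_CLASSES]


def scale_columns(rows):
    """Min-max scale each numeric column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def build_joint_vectors(records):
    """records: list of (morphological_values, compound_names) pairs."""
    scaled = scale_columns([morph for morph, _ in records])
    return [m + encode_compounds(c) for m, (_, c) in zip(scaled, records)]


# Toy records (made up): (aspect ratio, form factor) plus compound classes
toy_records = [
    ((1.2, 0.80), ["Flavonoids", "Tannins"]),
    ((3.0, 0.30), ["Alkaloids", "Phenols"]),
]
vectors = build_joint_vectors(toy_records)
print(vectors)
```

The resulting vectors can be passed to any classifier; scaling the morphological columns keeps them on the same footing as the 0/1 compound indicators.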
6 Relationship Between Physical Structure and Chemical Compounds
Structural description of a drug plant at morphological (macroscopic) and anatomical (microscopic) levels as used in the pharmacopoeial texts of crude drugs is important for its botanical identity, quality of herbal preparation, and pharmacognostic standardization. Roots, stems, and leaves as well as flowers and fruits are the basic morphological organs of higher plants [19].
The occurrence of cystolith hairs is an important criterion for the identification of marijuana leaf fragments. Alkaloids and glycosides are important, therapeutically active chemical substances that remain in solution in the cell sap of many plant-based drugs. Their presence can be detected only with reagents: Dragendorff's or Mayer's reagent for alkaloids and Borntrager's reagent for anthraquinone glycosides [19]. Knowledge of internal structure is applicable to the identification of organized crude drugs, while the botanical identity of unorganized drugs should be determined by chemical analysis. The relationship between the outer physical structure and the chemical constituents of leaves can make it more effective to identify the correct medicinal capabilities of plants, herbs, and leaves. The variation in photosynthetic pigment content is determined by the spectrophotometric method. These analyses show that total water content, osmotic pressure, and solute concentration in leaves of shortleaf pine and loblolly pine differed significantly between species as well as between soil moisture categories [20]. The analyses also show that soil moisture treatments caused significant differences in the inbound water contents, whereas species did not [21]. The mean values of each physicochemical property can then be estimated (Table 1).
7 Conclusion and Limitations
A prediction model based on machine learning for the automatic recognition of medicinal plants will help the local population improve their knowledge of medicinal plants, help taxonomists develop more efficient species identification techniques, and contribute significantly to the protection of endangered species. In future research, probabilistic neural networks and deep learning neural networks will be investigated in an attempt to achieve even higher accuracy. Significantly, using both the morphological properties and the chemical constituents of plants, leaves, and herbs will enhance the performance of the implementation, thereby contributing to the medical domain. Greater reliance on existing medicinal plants for the required purposes all over the world may bring a positive response. Future research may help in overcoming the following limitations:
• The high similarity in the morphological features of leaves makes their shapes difficult to recognize.
• New features of existing leaves are not considered in depth, which might affect the accuracy of recognizing some important features.
• New features must be designed and extracted to improve accuracy in recognizing what is unique to each leaf.
• There is scope for a new algorithm, developed by tweaking the detection technique, that may detect specific medicinal plants accurately.
Table 1 Mapping of morphological features and active compounds

| S. No. | Scientific name | English plant name | Morphological | Chemical | Activity | Origin |
|---|---|---|---|---|---|---|
| 1 | Psidium guajava | Guava | Ovate-elliptic or oblong-elliptic | Aglycone quercetin | Spasmolytic | Southern Mexico |
| 2 | Carica papaya | Papaya | Deeply palmately lobed | Papain, chymopapain, cystatin, tocopherol, ascorbic acid, flavonoids, cyanogenic glucoside, and glucosinolates | Treats dengue fever | Southern Mexico |
| 3 | Syzygium polyanthum [22] | Salam | Thinly leathery, elliptic or lance-shaped | Essential oils, tannins, and flavonoids | Reduces blood sugar and insulin levels | Mediterranean region |
| 4 | Piper betel | Sirih | Glossy heart-shaped | Hydroxychavicol (Chromadex, A1036B), eugenol (Aldrich, E51791), β-caryophyllene (Aldrich, 7–44-5) | Relieves from internal pains in the body | Southeast Asia |
| 5 | Origanum vulgare | Oregano | Spade-shaped, olive-green leaves | Carvacrol, β-fenchyl alcohol, thymol, and γ-terpinene | Strongest antioxidant | Greece |
| 6 | Annona muricata | Sirsak | Broadleaf | Annonaceous acetogenins, alkaloids, and phenols | Treats stomach ailments, fever, parasitic infections, hypertension, and rheumatism | South and North America |
| 7 | Blumea balsamifera | Sembung | Softly hairy, half woody, strongly aromatic shrub, 1–4 m high. Simple, alternate, broadly elongated leaves, 7–20 cm long, with toothed margin and appendaged or divided base | Ngai or Blumea camphor | Treats urolithiasis (urinary tract or kidney stones) and urinary tract infections | Indian sub-continent and Southeast Asia |
| 8 | Mentha | Mint | Oblong to lanceolate | Limonene, cineole, menthone, menthofuran, isomenthone, menthyl acetate, isopulegol, menthol, pulegone, and carvone | Eliminates toxins from the body | North America |
| 9 | Ocimum tenuiflorum | Tulsi | Oval | Oleanolic acid, ursolic acid, rosmarinic acid, eugenol, carvacrol, linalool, and β-caryophyllene | Counters metabolic stress | Indian sub-continent |
| 10 | Hibiscus laevis | Hibiscus | Oval to lance-like | Tannins, anthraquinones, quinines, phenols, flavonoids, alkaloids, terpenoids, saponins, cardiac glycosides, protein | It helps in losing weight and lowers the cholesterol level in our body. Hibiscus is a good skin cleanser | China, Japan, and the Pacific Islands |
| 11 | Citrus limon | Lemon | Ovate, oblong, and taper to a point on the non-stem end | Linalool (30.62%), geraniol (15.91%), α-terpineol (14.52%), and linalyl acetate (13.76%) | Treats nerve disorders like insomnia, nervousness, and palpitation | Assam (a region in northeast India), northern Burma or China |
| 12 | Tinospora cordifolia | Giloy | Heart-shaped leaves | Alkaloids, terpenoids, lignans, steroids | Reduces mental stress, anxiety and also boosts the memory | Tropical areas of India, Myanmar, and Sri Lanka |
| 13 | Ocimum basilicum | Basil | Oval | Linalool and methyl chavicol (estragole) | Treats cuts, wounds, and skin infections | Italy |
| 14 | Petroselinum crispum | Parsley | Curly and flat-leafed | Vitamin C (248.31 mg/100 g dry matter), carotenoids (31.28 mg/100 g dry matter), chlorophyll (0.185 mg/g dry matter) | Improves bone health, protects against chronic diseases, and provides antioxidant benefits | Mediterranean region of Southern Europe and Western Asia |
| 15 | Matricaria chamomilla | Chamomile | Fern-like light green and feathery | Sesquiterpenes, flavonoids, coumarins, and polyacetylenes | Acts as an anti-inflammatory herb | Europe and West Asia |
| 16 | Coriandrum sativum | Coriander | Variable | Linalool (72.7%) followed by γ-terpinene (8.8%), α-pinene (5.5%), camphor (3.7%) | Takes care of aches, pains, and skin concerns | Southern Europe and Northern Africa |
| 17 | Withania somnifera | Ashwagandha | Elliptic | Withaferin-A and withanone (withanolides) | Reduces anxiety and stress, helps fight depression | India, the Middle East, and parts of Africa |
| 18 | Curcuma longa | Turmeric | Oblong or lanceolate | α-phellandrene (18.2%), 1,8-cineole (14.6%) and p-cymene (13.3%) | Reduces symptoms of cold, jaundice, and even intestinal worms | Vedic culture in India |
| 19 | Aloe barbadensis miller | Aloe vera | Dagger-shaped | Polysaccharides | Holds antimicrobial capacity | South-west Arabian Peninsula |
| 20 | Mentha piperita | Peppermint | Fuzzy | Limonene, cineole, menthone, menthofuran, isomenthone | Eases tinctures, chest rubs | Europe and North America |
References
1. Gopal, A., Prudhveeswar Reddy, S., Gayatri, V.: Classification of selected medicinal plants leaf using image processing. In: 2012 International Conference on Machine Vision and Image Processing, MVIP 2012 (December), pp. 5–8 (2012). https://doi.org/10.1109/MVIP.2012.6428747
2. Pandey, M.M., Rastogi, S., Rawat, A.K.S.: Indian traditional ayurvedic system of medicine and nutritional supplementation. Evid.-Based Complement. Altern. Med. (2013)
3. Ching, J., Soh, W.L., Tan, C.H., Lee, J.F., Tan, J.Y.C., Yang, J., Yap, C.W., Koh, H.L.: Identification of active compounds from medicinal plant extracts using gas chromatography-mass spectrometry and multivariate data analysis. J. Sep. Sci. 35(1), 53–59 (2012)
4. Petrovska, B.B.: Historical review of medicinal plants’ usage. Pharmacognosy Rev. 6(11), 1–5 (2012)
5. Singh, R.: Medicinal plants: a review. J. Plant Sci. 3(1), 50 (2015)
6. Geck, M.S., Cabras, S., Casu, L., Reyes García, A.J., Leonti, M.: The taste of heat: how humoral qualities act as a cultural filter for chemosensory properties guiding herbal medicine. J. Ethnopharmacol. 198, 499–515 (2017)
7. Albuquerque, U.P., Ferreira Júnior, W.S.: What do we study in evolutionary ethnobiology? Defining the theoretical basis for a research program. Evol. Biol. 44, 206–215 (2017)
8. Albuquerque, U.P., Alves, R.R.N. (eds.): Introduction to Ethnobiology. Springer International Publishing, N.p. (2016). https://doi.org/10.1007/978-3-319-28155-1
9. Mustafa, G., Arif, R., Atta, A., Sharif, S., Jamil, A.: Bioactive compounds from medicinal plants and their importance in drug discovery in Pakistan. Matrix Sci. Pharma 1(1), 17–26 (2017)
10. Mukherjee, P.K., Wahile, A.: Integrated approaches towards drug development from Ayurveda and other Indian systems of medicines. J. Ethnopharmacol. 103, 25–35 (2006)
11. Mukherjee, P.K., Rai, S., Kumar, V., Mukherjee, K., Hylands, P.J., Hider, R.C.: Plants of Indian origin in drug discovery. Expert Opin. Drug Discov. 2(5), 633–657 (2007)
12. Xue, C.X., Zhang, X.Y., Liu, M.C., Hu, Z.D., Fan, B.T.: Study of probabilistic neural networks to classify the active compounds in medicinal plants. J. Pharm. Biomed. Anal. 38(3), 497–507 (2005)
13. Azlah, M.A.F., Chua, L.S., Rahmad, F.R., Abdullah, F.I., Alwi, S.R.W.: Review of techniques for plant leaf classification and recognition. Computers 8(4) (2019)
14. Wu, S.G., Bao, F.S., Xu, E.Y., Wang, Y.X., Chang, Y.F., Xiang, Q.L.: A leaf recognition algorithm for plant classification using a probabilistic neural network. In: ISSPIT 2007, 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 11–16 (2007)
15. Shoemaker, M., Hamilton, B., Dairkee, S.H., Cohen, I., Campbell, M.J.: In-vitro anticancer activity of twelve Chinese medicinal herbs. Phytother. Res. 19, 649–651 (2005)
16. Svozil, D., Lohninger, H.: A program for the calculation of structural descriptors. TOPIX: A Program to Calculate Structural Descriptors (1999). https://www.lohninger.com/topix.html
17. Singh, H., Rani, R., Mahajan, S.: Detection and classification of citrus leaf disease using hybrid features. In: Pant, M., Sharma, T., Verma, O., Singla, R., Sikander, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 1053. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0751-9_67
18. Huang, X., Wang, P.: Chapter 4 Features and Features’ Selection for Medicinal Plants, 93–159 (2014)
19. Alamgir, A.N.M.: Progress in Drug Research, Volume 73: Therapeutic Use of Medicinal Plants and Their Extracts: Volume 1: Pharmacognosy, Vol. 1 (2017)
20. Jyothi, P.M.S., Nandan, D.: Utilization of the Internet of Things in agriculture: possibilities and challenges. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 1154. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-4032-5_75
21. Sihag, J., Prakash, D., Yadav, P.: Evaluation of soil physical, chemical parameter and enzyme activities as indicator of soil fertility with SFM Model in IA–AW Zone of Rajasthan. In: Pant, M., Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-4032-5_98
22. Dewijanti, I.D., Mangunwardoyo, W., Artanti, N., Hanafi, M.: Bioactivities of Salam leaf (Syzygium polyanthum (Wight) Walp). In: AIP Conference Proceedings, 2168 (2019)
A Prototype IoT Management System to Control Grid-Parallel Distribution of Localised Renewable Energy for Housing Complexes in New-Normal Era
Sandip Das, Abhinandan De, and Niladri Chakraborty
Abstract In the post-COVID-19 new-normal era, remote management and monitoring of services while maintaining social distance is essential. IoT devices that can be managed with a personal Android smartphone are therefore welcome. For essential services such as electric power distribution, uninterrupted distribution and management of grid-parallel localised renewable energy generation becomes most important at times of power shortage or natural calamity. This paper proposes a novel prototype IoT management system that requires minimal manpower to control the distribution of localised renewable energy generation in housing complexes, working like a micro-electrical grid. The prototype can be managed with one's personal Android smartphone from a distance of up to about 10 m, with a specific IoT device name and password to ensure security. In an islanding situation, this IoT end device will be able to operate without Internet cloud connectivity using the smartphone's Bluetooth, forming a fog network, with a 9 V DC backup battery as an uninterrupted power source. The proposed prototype will also be operable with conventional electric power using a step-down transformer–rectifier at the source, and over the Internet through Wi-Fi by using an ESP8266 module in place of the existing HC-05, at nominal additional cost.
Keywords IoT · Management · Arduino · Android · New-normal
S. Das (B) Electrical Engineering Department, H.E.T.C, Hooghly, India A. De Electrical Engineering Department, IIEST, Shibpur, India N. Chakraborty Power Engineering Department, Jadavpur University, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 T. K. Sharma et al. (eds.), Soft Computing: Theories and Applications, Advances in Intelligent Systems and Computing 1380, https://doi.org/10.1007/978-981-16-1740-9_60
1 Introduction

The ongoing development of urban areas, the building of housing complex projects and an increasing population raise the demand for electricity as well as for a secure power structure. Situated near the Ganges Delta and the extended coastal area of the Bay of Bengal, Kolkata and its surroundings experience many storms during the North Indian Ocean cyclone season. Amid COVID-19, the super cyclonic storm ‘Amphan’, which struck Kolkata and its surroundings on Wednesday 20 May 2020, compels a rethink of the anatomy of the existing power structure and establishes the necessity of building a parallel mini-grid-like structure wherever possible. Housing complexes offer ample infrastructural possibilities for solar photovoltaic (SPV) energy generation because of their wide rooftop areas. Such generation is also necessary to meet their emergency energy requirements through a parallel internal distribution system, arranged as a mini-grid-like structure, when they are fully or partially islanded. Although the generated power would not exceed 1 kW per square metre for SPV power [1], a good storage battery bank and a user-friendly distribution management system can utilise this power wisely according to necessity. Based on the necessity and demand of the inhabitants of different houses, as well as for general security and emergency lighting inside the complex at night, the generated and stored power may be distributed through an IoT system by operating different relays in a localised manner. The IoT end devices are manageable from a distance, maintaining hygiene, and can be operated by a manager or caretaker of the housing complex society with a personal smartphone.
2 Genesis of the Prototype IoT Management System

Complete reform of the existing power sector in Kolkata and the surrounding systems is a complex process. In the long run, as the system grows, vertical separation and competitive privatisation may be pursued together with the creation of functioning segmented micro-generation from renewable resources and the horizontal splitting of the generation segments. In smaller systems, creating an effective smart infrastructure is more urgent than unbundling the sector, although accounting separation may sometimes be desirable, as in the present context of building a parallel mini-grid-like structure with an IoT management system.
2.1 Post-Cyclonic Situation Amid COVID-19

The post-cyclonic situation in this region is reflected in daily newspaper headlines such as ‘People demand restoration of power, water in Hooghly’: The Statesman, KOLKATA, Saturday 23rd May 2020; ‘Irate south Kolkata residents put up blockade - To protest against alleged ‘inefficiency’ and ‘ignorance’ of the authorities concerned in restoring electricity and water supply in wards under KMC’: The Sunday Statesman, KOLKATA, 24th May 2020; ‘Water & electricity still in a shambles in districts’: THE TELEGRAPH, CALCUTTA, 25th May 2020; ‘Power cuts leads to medical woes’: THE TELEGRAPH, CALCUTTA, 25th May 2020. This severe scenario indicates that reform is of the utmost importance.
2.2 Existing Electricity Distribution Structure

The electricity distribution industry in Kolkata and its surroundings is dominated by public and private sector utilities. Reforms should unbundle the vertically integrated utilities and introduce private investment and mini-management for building a parallel mini-grid-like structure of below 5 kW, using renewable resources in localised areas, for essential services such as water and medication. The existing vertically integrated utilities bundle ‘generation’, ‘transmission’, ‘distribution’ and ‘consumers’, as depicted in Fig. 1. However, as in Fig. 2, a mini-grid-like structure with localised ‘generation’, no ‘transmission’ part and localised ‘distribution’ can co-exist in parallel with the existing vertically integrated utilities and serve consumers for their essential services. In some activities, ‘competition for the market’ and ‘management contracts’ can provide a partial role for private micro-investors, who may invest in renewable energy generation in housing complex projects to serve consumers for their essential services and may collect revenues in a decentralised manner through area-wise mini-managements. Leading electricity generation industries are also taking initiatives in newer dimensions:
Fig. 1 Orientation of existing vertically integrated utilities
Fig. 2 Proposed orientation of utilities integrated in parallel in a mini-grid-like structure
‘NTPC, ONGC sign pact for renewable energy JV’: The Statesman, KOLKATA, Monday, 25 May 2020.
2.3 Proposed Grid-Parallel Distribution Structure

A mini-grid-like structure can be proposed that organises the utilities into distinct dimensions and layers integrated with an internal IoT management system, which also reduces transmission losses for the same amount of on-load utilities. In many housing societies, economic and weak investment conditions do not favour full self-dependence of utilities. The proposed model is based on the concepts of enterprise innovation, emphasising external collaboration and partnerships, and can be operated via the IoT-based mini-grid management system of the housing society itself or area-wise. The model can also be applied to micro-cities, rural towns, hamlets, etc., as the energy sectors of both urban and rural areas are undergoing transformation due to new technologies such as communications, distributed generation (DG) and smart grids in this post-COVID-19 new-normal era.
3 Prototype IoT Management System

This IoT management system consists of three parts. The first is the prototype IoT integrated base device, developed with an Arduino UNO R3, which is to be attached to different relays that operate the utilities of different houses when an electronic signal is generated at specific pins. The second is the signal device, an HC-05 Bluetooth module, which bridges the gap between the Arduino UNO R3 package [2] and one’s personal Android smartphone. The third is the Android application named ‘MiniGridManagementSytem_SandipDas’, which runs on one’s personal Android smartphone and generates a specific signal through the phone’s Bluetooth when the different soft switches are pressed.
3.1 IoT Integrated Base Device

The IoT integrated base device is developed with an Arduino UNO R3 board, which runs on an ATmega328P microcontroller. The device is primarily powered through the USB cable, although it can also be powered by a 9 V battery at the power input of the board. Output pins D8 and D9 are each connected to an LED returned to ground (marked ‘GND’ on the board) to indicate the signal pulse for relay operation. This works satisfactorily when operated remotely by the soft switches. The number of relay connections can be increased by using other digital and analogue output pins. Pins 0 and 1 (marked ‘RX’ and ‘TX’ on the board) are used for serial communication with the signal device. The HC-05 signal device is also powered by this board, through the 3.3 V pin and GND (Fig. 3).

Fig. 3 Base device developed with Arduino UNO R3, operating two output relay signals

Fig. 4 Arduino IDE for ‘MiniGridMngmnt’ program

Fig. 5 Arduino IDE ‘MiniGridMngmnt’ serial monitor window

Fig. 6 Signal device made with HC-05 Bluetooth module

The program that operates the base device is written in the Arduino IDE in the ‘C’ language, as shown in Fig. 4. A small modification to this program brings more output pins into operation. After compiling, the program is uploaded to the Arduino UNO R3 board over the connected USB cable. Once the upload is complete, the board can be disconnected from the USB cable and operated from a 9 V battery. If the USB cable remains connected, the output of the base device can be observed in the serial monitor window of the Arduino IDE, as depicted in Fig. 5. For this prototype, the LEDs attached to pins D8 and D9 operate as intended; these are the pins to which relays are to be connected for power switching between different utilities.
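The command handling of the base device can be pictured with a small host-side model. The sketch below is written in Python rather than as the authors’ Arduino program (which appears only in Fig. 4), and the single-character command codes are hypothetical, since the paper does not list the data values exchanged. It illustrates how a lookup table can map each received character to an output pin and level, so that engaging another pin only requires new table entries.

```python
# Host-side model of the base device's command handling (not the Arduino program).
# The command characters below are hypothetical; the paper does not specify them.
DEFAULT_COMMANDS = {
    "A": (8, True),   # assumed code: drive pin D8 HIGH (relay 1 ON)
    "a": (8, False),  # assumed code: drive pin D8 LOW  (relay 1 OFF)
    "B": (9, True),   # assumed code: drive pin D9 HIGH (relay 2 ON)
    "b": (9, False),  # assumed code: drive pin D9 LOW  (relay 2 OFF)
}


class BaseDeviceModel:
    """Tracks output pin levels as command characters arrive over serial."""

    def __init__(self, commands=None):
        self.commands = dict(DEFAULT_COMMANDS if commands is None else commands)
        # Every pin named in the table starts LOW (relay released).
        self.pins = {pin: False for pin, _level in self.commands.values()}

    def handle(self, char):
        """Apply one received character; return False if it is not a known command."""
        action = self.commands.get(char)
        if action is None:
            return False  # unknown data is ignored, pin states stay unchanged
        pin, level = action
        self.pins[pin] = level
        return True


if __name__ == "__main__":
    device = BaseDeviceModel()
    for char in "AB?a":
        device.handle(char)
    print(device.pins)
```

Under these assumptions, a third relay on, say, pin D10 would need only two extra table entries such as `"C": (10, True)` and `"c": (10, False)`; pin D10 is used here purely as an illustration.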
3.2 Signal Device Working as Bluetooth Module

The signal device is an HC-05 Bluetooth module. The ‘RXD’ and ‘TXD’ pins of this module are connected to ‘TX’ and ‘RX’ of the base device, respectively, for serial communication between the two devices. The signal device is powered by connecting its ‘VCC’ and ‘GND’ to ‘3.3 V’ and ‘GND’ of the base device, respectively, as shown in Fig. 6. Once the signal device is connected to and powered by the base device, the two act as an integrated unit, as depicted in Fig. 7. A signal can then be sent to the signal device over the Android smartphone’s Bluetooth, after searching for the device by name and connecting with the specified password. The Android application thus acts as a soft switch for the remotely located base device.
Fig. 7 Signal device connected to and powered by the base device, so that the two devices act as an integrated unit

3.3 Android Application Named as ‘MiniGridManagementSytem_SandipDas’

Fig. 8 Screenshot of the dedicated Android application named ‘MiniGridManagementSytem_SandipDas’, showing different switches

The Android application named ‘MiniGridManagementSytem_SandipDas’, shown in Fig. 8, is the final user interface part of the system. The application is built specifically for the prototype management system: after installing the dedicated ‘.APK’ file [3, 4], users can operate the base device remotely with its built-in soft switches. There are four switches for operating four different relays attached to the base device; two of them are shown working in this model, and the other two are kept in reserve. For each ON/OFF operation of a switch, the Android program sends a particular data value to the signal device through Bluetooth. Depending on the data received by the signal device, the base device acts accordingly and makes or breaks the connection of the associated relay.
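The exchange between the application and the signal device can be sketched as a small encoding table. This Python sketch is illustrative only: the application itself is an Android app [3, 4], and the character codes and function names here are assumptions, because the paper does not state which data value each switch sends. It shows one way to give each of the eight switch events (four switches, ON and OFF) a distinct one-character code, including the two reserve switches.

```python
# Illustrative encoding of soft-switch events into one-character Bluetooth payloads.
# Codes and names are hypothetical; the paper does not list the values it sends.
SWITCH_CODES = {
    1: ("A", "a"),  # (ON code, OFF code) for switch 1
    2: ("B", "b"),  # (ON code, OFF code) for switch 2
    3: ("C", "c"),  # reserve switch in the prototype
    4: ("D", "d"),  # reserve switch in the prototype
}


def encode(switch, on):
    """Return the character the app would send for one switch press."""
    if switch not in SWITCH_CODES:
        raise ValueError(f"unknown switch {switch}")
    on_code, off_code = SWITCH_CODES[switch]
    return on_code if on else off_code


def decode(char):
    """Recover (switch, on) from a received character, as the receiver would."""
    for switch, (on_code, off_code) in SWITCH_CODES.items():
        if char == on_code:
            return switch, True
        if char == off_code:
            return switch, False
    raise ValueError(f"unknown code {char!r}")


def all_codes():
    """List every code once, to check that no two events share a value."""
    return [code for pair in SWITCH_CODES.values() for code in pair]


if __name__ == "__main__":
    presses = [(1, True), (2, True), (1, False)]
    payload = "".join(encode(switch, on) for switch, on in presses)
    print(payload)
    print([decode(char) for char in payload])
```

Keeping the ON and OFF codes distinct for every switch means a single received character is enough to determine both the relay and the requested state, so no extra framing is needed for this simple scheme.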
4 Operating Principle

This IoT management system is intended to reduce the manpower needed to control the distribution of localised renewable energy generation in housing complexes. A manager or caretaker can control the power utilities of different buildings while sitting remotely in a separate zone, maintaining social distance and hygiene. At times of power shortage or islanding, the generated and stored power can be utilised wisely according to demand and necessity, simply by operating a personal Android smartphone. The system is secured by the password protection of the HC-05 signal device. The signal device may also be kept at a height of up to 10 m (the range of the HC-05) for structural security. The base device may be placed at the location nearest to the power lines of the different housing buildings, or nearest to the localised renewable energy generation and distribution centre, for more efficient relay operation, provided that an uninterrupted wired connection is kept between the base device and the signal device. As future scope, different sensors can be integrated and organised [5] with this system, so that the base device can decide automatically on a particular relay operation by sensing the current consumption level of other housing buildings through the power lines.
5 Timing Diagrams for Different Output Conditions

An operating environment is created to examine the prototype system under different output conditions by acquiring real-time data. The outputs of the base device, i.e. the Arduino UNO R3 package, are fed back through analogue input pins A1 and A3. The modifications shown in Fig. 9 are made to the ‘MiniGridMngmnt’ program in the Arduino IDE to generate timing diagrams of the output signals. The modified program reads the output signals of pins D8 and D9 through analogue input pins A1 and A3, respectively. The serial plot window of the IDE then shows the voltage versus time plot in real time. This plot reflects the actual output of the prototype system, as if examined with a DSO. Two LEDs are initially connected to signal pins D8 and D9; the connected red and blue LEDs are shown in Fig. 3 and in Fig. 7. In practical use, the LEDs will be replaced by relays.

Fig. 9 Arduino IDE ‘MiniGridMngmnt’ program modification to generate timing diagrams of output signals in real time, through feedback inputs from analogue pins A1 and A3

In the timing diagrams, the horizontal axis is the time scale, with 0.5 s as its unit, and the vertical axis is the voltage scale, with 1 mV as its unit. The red waveform corresponds to the output connected to the red LED and the blue waveform to the blue LED. The base signals of the different output pins are kept separated in magnitude even in the non-operating condition: the base signal is kept between 0 and 10 mV for the blue LED and between 100 and 150 mV for the red LED, to avoid mismanagement of relay operation.

In Fig. 10, the voltage versus time plot shows the base signal for the non-operating condition of both LEDs, with some noise: the blue waveform stays between 0 and 10 mV and the red waveform oscillates between 100 and 150 mV. This plot covers a time span of 50 s in the non-operating condition of the IoT management system.

Fig. 10 Voltage versus time plot, when both switches remain OFF

Figure 11 shows the operating condition of both LEDs. When both soft switches are triggered ON at 50 s, the blue waveform runs in the 1 V region and the red waveform oscillates around 900 mV, and both LEDs glow. These voltages may trigger a relay when one is connected to the corresponding pin.

Fig. 11 Voltage versus time plot, when both switches are put ON

In Fig. 12, the plot shows the OFF condition of both LEDs after operating for a time span of 50 s. At 100 s, the soft switches are put OFF; the blue waveform comes down to between 0 and 10 mV and the red waveform again oscillates between 100 and 150 mV.

Fig. 12 Voltage versus time plot, when both soft switches are put OFF after running for specific time

Figure 13 shows the operating condition of the red LED only. When only the soft switch of the red LED is triggered ON at 50 s, the red LED glows and the red waveform runs at around 300 mV to 500 mV, which is sufficient to operate a relay connected to that pin. The blue waveform remains inactive at around 0 mV.

Fig. 13 Voltage versus time plot, when soft switch of red LED operates ON and other remains inactive

In Fig. 14, the plot shows the ON-OFF sequence of the red LED: after operating from 50 to 100 s, the soft switch dedicated to the red LED is put OFF. Beyond 100 s, the red waveform comes down to the 150 mV level, which indicates that the relay operating signal has stopped. The blue waveform remains around 0 mV throughout, indicating no signal to operate that particular relay.

Fig. 14 Voltage versus time plot, when soft switch of red LED operates OFF after running for specific time and other remains inactive

The output voltage signals discussed in this chapter remain in an acceptable range, with a negligible amount of noise, as shown.
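The voltage bands reported in this section lend themselves to a simple interpretation rule. The following Python sketch is an illustration under stated assumptions, not the modified ‘MiniGridMngmnt’ program of Fig. 9: it assumes the UNO’s default 10-bit ADC with a 5 V reference when converting raw readings to millivolts, and the 250 mV ON threshold is a hypothetical value chosen to sit between the reported idle bands and the lowest reported ON level (about 300 mV). The sample trace in the usage part is synthetic, built from values inside the reported bands, and is not measured data.

```python
# Interpreting feedback readings using the voltage bands reported in the text.
ADC_MAX = 1023    # assumption: 10-bit ADC full-scale count
VREF_MV = 5000    # assumption: default 5 V analogue reference

# Idle (non-operating) bands reported in the text, in millivolts.
IDLE_BANDS_MV = {"blue": (0, 10), "red": (100, 150)}
ON_THRESHOLD_MV = 250  # hypothetical threshold, above both idle bands


def counts_to_mv(counts):
    """Convert a raw ADC count to millivolts under the assumptions above."""
    if not 0 <= counts <= ADC_MAX:
        raise ValueError("ADC count out of range")
    return counts * VREF_MV / ADC_MAX


def classify(channel, mv):
    """Label one reading as ON, OFF, or UNDEFINED (outside every expected band)."""
    low, high = IDLE_BANDS_MV[channel]
    if mv >= ON_THRESHOLD_MV:
        return "ON"
    if low <= mv <= high:
        return "OFF"
    return "UNDEFINED"


def switch_times(channel, samples_mv, step_s=0.5):
    """Return (time_s, new_state) for each ON/OFF change; step_s is the plot's time unit."""
    events = []
    previous = None
    for index, mv in enumerate(samples_mv):
        state = classify(channel, mv)
        if state == "UNDEFINED":
            continue  # skip readings that fall between the bands
        if previous is not None and state != previous:
            events.append((index * step_s, state))
        previous = state
    return events


if __name__ == "__main__":
    # Synthetic red-channel trace: idle for 50 s, ON for 50 s, then idle again.
    trace = [120] * 100 + [900] * 100 + [130] * 10
    print(switch_times("red", trace))
```

Keeping a gap between the idle bands and the ON threshold is what lets the rule reject ambiguous readings instead of guessing; any real threshold would have to be set from the relay’s actual pick-up voltage.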
6 Conclusion

The paper proposed and implemented an IoT-based monitoring and control architecture for grid-parallel distribution of locally generated electricity for households, minimising human intervention and ensuring social distancing in the new-normal era. A prototype of the dedicated hardware required for the purpose has been developed and tested successfully. The proposed system uses an Android smartphone running a dedicated application to replace the existing manual switching requirements of mini- and micro-electrical grids, and supports reliable and consistent power supply with remote monitoring and control of switching devices. When instructions were sent from a smartphone using the dedicated Android application, all modules of the system, namely the IoT integrated base device, the signal device working as a Bluetooth module and the Android application named ‘MiniGridManagementSytem_SandipDas’, were found to work together efficiently and consistently. Operation over the Internet using Wi-Fi can be implemented in future at nominal additional cost by replacing the existing HC-05 with an ESP8266 module. A few samples of real-time monitoring of local electricity distribution have been demonstrated through real-time timing diagrams under different operating conditions. These timing diagrams confirmed that the prototype functioned in the intended manner and that its operation was efficient and reliable, indicating the feasibility of real-time implementation.
References

1. Das, S.: A model based solar insolation utiliser for socio economic development. In: Computer, Communication, Control and Information Technology (C3IT), IEEE Conference 2015, pp. 1–6. Print ISBN: 978-1-4799-4446-0. https://doi.org/10.1109/C3IT.2015.7060159. IEEE India (2015)
2. https://www.arduino.cc
3. https://www.android.com/intl/en_in
4. https://appinventor.mit.edu
5. Das, S.: Logically organised sensor based prototype model for automatic control of process temperature. In: Mandal, J.K. et al. (eds.) Proceedings of Second International Conference: INDIA 2015, Information Systems Design and Intelligent Applications, vol. 2, Advances in Intelligent Systems and Computing, vol. 340, pp. 629–637. ISSN 2194-5357, ISSN 2194-5365 (electronic), ISBN 978-81-322-2246-0 (Print), ISBN 978-81-322-2247-7 (eBook). https://doi.org/10.1007/978-81-322-2247-7_64. Springer India (2015)
Author Index
A Abayomi, Abdultaofeek, 173 Agarwal, Reshu, 603 Agarwal, Shikha, 705 Agbesi, Samuel, 149 Agrawal, Saakshi, 719 Anoop, Ashley, 13 Arora, Upma, 613 Asha, N., 477 Asuquo, Daniel, 173
B Baig, M. A. K., 459 Bajaj, Shalini Bhaskar, 695 Bhakuni, Chandrashekhar, 321 Bhatia, Udbhav, 667 Bhatia, Varsha, 501 Bhatla, Shruti, 201 Bhatt, Ujjval, 321 Bulla, Chetan, 125
C Chakraborty, Niladri, 733 Chakraborty, Prabir, 321 Chatterjee, Subarna, 559 Chauhan, Arpit Kumar, 623 Chauhan, Pintu, 243 Chavan, Akshay, 125 Choubey, Dilip Kumar, 587, 667 Coble, Kyle, 571
D Darapaneni, Narayana, 321 Das, Ratan, 491 Das, Sandip, 733 De, Abhinandan, 733 Deepalakshmi, P., 213 Dhiman, Yashikha, 49 Dhumka, Ankur, 49 Dixit, Priyanka, 229 Doegar, Amit, 645 Dutta, Debashis, 333
E Eyoh, Imoh, 173
G Gandhi, H., 407, 419 Ganesh, Talari, 467 Garg, Arpan, 467 Garg, Sakshi, 633 Garg, Umang, 85 Gautam, Jyoti, 137 Gill, Rupali, 657 Gupta, Amit, 107, 201 Gupta, Neha, 85 Gupta, Praveen Kumar, 187 Gupta, Rajeev, 299 Gupta, Richa, 201 Gupta, Sakshi, 361 Gupta, Sonali, 33, 41 Gupta, Tanya, 603
H Hiran, Kamal Kant, 311
I Indira Gandhi, M. P., 477 Ipe, Navin K., 559
J Jaglan, Vivek, 375, 501, 695 Jain, Vivek, 321 Jauhar, Sunil Kumar, 63 Jayaraman, R., 523 Jeyakumar, G., 433 Joshi, Ashish, 49, 75
K Kalidass, K., 523 Karatangi, Shylaja VinayKumar, 603 Kashyap, Shobhana, 263 Kaswan, Kuldeep Singh, 501 Kathuria, Charu, 685 Katiyar, Sandhya, 161 Kaul, Sharang, 571 Kaur, Jaspreet, 349 Kaur, Ranjeet, 645 Khan, Rijwan, 349, 623 Khera, Preeti, 491 Kulkarni, Swati, 125 Kumar, Jitendra, 667 Kumar, Naveen, 447, 535, 547 Kumar, Neelesh, 491 Kumar, Sanjay, 161 Kumar, Vinay, 397 Kumawat, Sunita, 361, 501
L Lohani, M. C., 107
M Mahajan, Akanshu, 571 Mahrishi, Mehul, 311 Makkar, Priyanka, 23 Malhotra, Anshu, 23 Malik, Annu, 603 Malsa, Nitima, 137 Manchanda, Mahesh, 97 Mani, Ambica Prakash, 251 Manwal, Manika, 41 Mehrotra, Deepti, 633, 685
Misra, Navnit Kumar, 685 Morwal, Sudha, 311 Muskaan, 515
N Nankani, Hanisha, 311 Nyoho, Emmanuel, 173
P Paduri, Anwesh Reddy, 321 Pahari, Sushmit, 587 Pal, Manjish, 243 Pande, Akshara, 33 Pandey, Sujata, 633 Pant, Isha, 75 Potadar, Sneha, 125 Purohit, Kamlesh Chandra, 97 Purohit, Khamir, 321
Q Qayoom, Bazila, 459
R Raghul, S., 433 Rani, Komal, 535 Rani, Manju, 447 Ranjan, Prabhat, 705 Rath, Subhabrata, 333 Rawat, Charu, 49 Rishu, 657 Ritika, 49 Ruchika, 547 Rudra, Bhawana, 1, 13
S Saha, Sangeeta, 1 Sahoo, Ashok Kumar, 299, 515 Sarangi, Pradeepta Kumar, 299, 515 Sardana, Vikas, 321 Saxena, Somya, 491 Seem, Ankur, 623 Sehrawat, Harkesh, 375 Sequeira, A. H., 63 Shaheen, Hera, 705 Sharma, Pawan, 397 Sharma, Ruhi, 287 Sikka, Sunil, 23 Silakari, Sanjay, 229 Singh, Anuj, 97
Singh, Avtar, 263 Singh, D., 407, 419 Singh, Gajendra Pratap, 361 Singh, H. P., 571 Singh, Jaiteg, 657 Singh, Neema, 1 Singh, Nipur, 613 Singh, Sunny, 515 Singh, Taranjeet, 349 Singh, Yudhvir, 375 Solanki, Napendra, 243 Srivastava, Sandeep Kumar, 161 Sudha, 375 Swathi, Mummadi, 13 Swati, 33, 695
T Tanwar, Sonam, 287 Tomar, A., 407, 419 Tomer, Vikas, 41 Tripathi, Vikas, 201 Tripathi, V. M., 251 Tshering, 667 Tyagi, Bhawana, 397
U Umoh, Uduak, 173 Upadhyaya, Gaurav Kumar, 645

V Vadivel, S. M., 63 Vadivukarasi, M., 523 Vaissnave, V., 213 Vikas, 397 Vincent, Helen, 173 Vyas, Vaibhav, 137

Y Yellapragada, Sowmya, 719 Yogita, 187

Z Zutti, Sourabh, 125