Multimedia Data Processing and Computing 9781032469317, 9781032488882, 9781003391272

This book focuses on different applications of multimedia with supervised and unsupervised data engineering in the moder

135 45 29MB

English Pages 197 Year 2023

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Preface
Editor Biographies
List of Contributors
Chapter 1: A Review on Despeckling of the Earth’s Surface Visuals Captured by Synthetic Aperture Radar
1.1. Introduction
1.2. Synthetic Aperture Radar (SAR)
1.2.1. SAR Geometry
1.2.2. Frequency Bands and Characteristics
1.2.3. Frequency Polarization, Penetration, and Scattering
1.3. Applications of Sar Visuals
1.4. Inevitable Challenges in Sar Imagery
1.5. Formulation of Sar Despeckling Problem
1.6. Sar Despeckling Methodologies
1.6.1. Filtration-based Techniques
1.6.2. Optimization-based Techniques
1.6.3. Hybrid Techniques
1.6.4. Deep Network–based Techniques
1.7. Comparative Analysis
1.8. Conclusion and Future Scope
References
Chapter 2: Emotion Recognition Using Multimodal Fusion Models: A Review
2.1. Introduction
2.2. Emotion Theories and Models
2.3. Emotion Recognition and Deep Learning
2.3.1. Facial Expression Recognition
2.3.2. Speech Emotion Recognition
2.3.3. Multimodel Emotion Recognition
2.4. Multimodal Emotion Recognition
2.4.1. Multimodal Emotion Recognition Combining Audio and Text
2.4.2. Multimodal Emotion Recognition Combining Image and Text
2.4.3. Multimodal Emotion Recognition Combining Facial and Body Physiology
2.4.4. Other Multimodal Emotion Recognition Models
2.5. Databases
2.5.1. Database Descriptions
2.6. Conclusion and Future Work
References
Chapter 3: Comparison of CNN-Based Features with Gradient Features for Tomato Plant Leaf Disease Detection
3.1. Introduction
3.2. Proposed System for Tomato Disease Detection
3.2.1. Local Directional Pattern
3.2.2. Histogram of Oriented Gradient (HOG)
3.2.3. Convolutional Neural Network-based Features
3.2.4. Support Vector Machine–based Classification
3.3. Experimental Analysis
3.4. Conclusion
References
Chapter 4: Delay-sensitive and Energy-efficient Approach for Improving Longevity of Wireless Sensor Networks
4.1. Introduction
4.2. The Internet of Things
4.3. Routing Protocol for Low-Power and Lossy Networks
4.4. Related Work
4.5. Energy and Time Efficiency Network Model
4.5.1. Energy Efficiency Network Model
4.5.2. Time Efficiency Network Model
4.6. Results and Analysis
4.7. Conclusion and Future Scope
References
Chapter 5: Detecting Lumpy Skin Disease Using Deep Learning Techniques
5.1. Introduction
5.2. Material and Methods
5.2.1. Dataset
5.2.2. Research Methodology
5.2.3. Parameter Tuning
5.2.4. Proposed Architecture
5.3. Model Evaluation and Results
5.3.1. Environment of Implementation
5.3.2. Description of the Model
5.3.3. Results and Evaluation
5.4. Conclusion and Future Work
Acknowledgments
References
Chapter 6: Forest Fire Detection Using a Nine-Layer Deep Convolutional Neural Network
6.1. Introduction
6.2. Literature Survey
6.3. Materials and Methods
6.3.1. About the Dataset
6.3.2. Proposed Methodology
6.4. Results and Discussion
6.5. Conclusion
References
Chapter 7: Identification of the Features of a Vehicle Using CNN
7.1. Introduction
7.2. Literature Review
7.2.1. Image Capture
7.2.2. Identification and Detection of Vehicle
7.2.3. Automatic License Plate Recognition
7.2.4. Vehicle Logo Recognition
7.2.5. Vehicle Model Recognition
7.2.6. Re-identification of a Vehicle
7.3. Conclusion
7.4. Open Research Areas
References
Chapter 8: Plant Leaf Disease Detection Using Supervised Machine Learning Algorithm
8.1. Introduction
8.2. Literature Survey
8.3. Proposed System
8.3.1. Leaf Disease Image Database
8.3.2. Image Preprocessing
8.3.3. Feature Extraction
8.3.4. Classification
8.4. Results
8.4.1. Analysis of the Qualitative Data
8.5. Conclusion
References
Chapter 9: Smart Scholarship Registration Platform Using RPA Technology
9.1. Introduction
9.2. Robotic Process Automation
9.3. Benefits of RPA
9.4. Background
9.5. Issues in Robotization of a Process
9.6. Tools and Technologies
9.6.1. RPA As a Rising Technology
9.6.2. Automation 360
9.7. Methodology
9.7.1. Implications in Existing System
9.7.2. Proposed System
9.7.3. Data Collection and Source File Creation
9.7.4. Creation and Structure of the Task Bot
9.8. Implementation
9.8.1. Setting User Credentials
9.8.2. Designing the Task Bot
9.8.3. Running the Task Bot
9.8.4. Implementing the Task Bot
9.8.5. Opening the CSV File
9.8.6. Launching the Scholarship Form
9.8.7. Populating the Web Form
9.8.8. Sending the E-mail
9.9. Results Analysis
9.10. Conclusion
References
Chapter 10: Data Processing Methodologies and a Serverless Approach to Solar Data Analytics
10.1. Introduction
10.1.1. Solar Thermal Energy
10.2. Literature Review
10.3. Data Processing Methodologies
10.3.1. Artificial Intelligence
10.3.2. Machine Learning
10.3.3. Deep Learning
10.4. Serverless Solar Data Analytics
10.4.1. Kinesis Data Streams
10.4.2. Kinesis Data Firehose
10.4.3. Amazon S3
10.4.4. Amazon Athena
10.4.5. Quicksight
10.4.6. F. Lambda
10.4.7. Amazon Simple Queue Service (SQS)
10.5. Conclusion
References
Chapter 11: A Discussion with Illustrations on World Changing ChatGPT – An Open AI Tool
11.1. Introduction
11.2. Literature Review
11.3. AI and ChatGPT
11.3.1. Code, Chat and Career: Pros and Cons of Using AI Language Models for Coding Industry
11.3.2. Jobs of Future: Will AI Displace or Augment Human Workers?
11.4. Impact of ChatGPT
11.4.1. Impact that ChatGPT Creates on Students
11.4.2. Impact that ChatGPT Creates on Teachers/Academicians
11.4.3. Impact that ChatGPT Creates on Parents
11.5. Applications of ChatGPT
11.6. Advantages of ChatGPT
11.7. Disadvantages of ChatGPT
11.8. Algorithms Used In ChatGPT
11.8.1. Illustration 1
11.8.2. Illustration 2
11.8.3. Illustration 3
11.9. Future of ChatGPT
11.10. Conclusion
References
Chapter 12: The Use of Social Media Data and Natural Language Processing for Early Detection of Parkinson’s Disease Symptoms and Public Awareness
12.1. Introduction
12.2. Literature Review
12.2.1. Parkinson’s Disease Detection and Diagnosis
12.2.2. Social Media and Health Research
12.2.3. Natural Language Processing in Health Research
12.2.4. Early Detection of Health Issues Using Social Media and NLP
12.2.5. Public Awareness and Health Communication
12.3. Methodology
12.3.1. Data Collection
12.3.2. Data Preprocessing
12.3.3. Feature Extraction
12.3.4. Machine Learning Models
12.4. Results
12.4.1. Data Collection and Preprocessing
12.4.2. Feature Extraction
12.4.3. Machine Learning Models
12.4.4. Evaluation Metrics
12.4.5. Performance Results
12.4.6. Feature Importance
12.5. Discussion
12.6. Conclusion
References
Chapter 13: Advancing Early Cancer Detection with Machine Learning: A Comprehensive Review of Methods and Applications
13.1. Introduction
13.2. Literature Review
13.3. Methodology
13.4. Results
13.5. Application of Research
13.6. Conclusion
References
Index
Recommend Papers

Multimedia Data Processing and Computing
 9781032469317, 9781032488882, 9781003391272

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

Multimedia Data Processing and Computing This book focuses on different applications of multimedia with supervised and unsupervised data engineering in the modern world. It includes AI-based soft computing and machine techniques in the field of medical diagnosis, biometrics, networking, manufacturing, data science, automation in electronics industries, and many more relevant fields. Multimedia Data Processing and Computing provides a complete introduction to machine learning concepts, as well as practical guidance on how to use machine learning tools and techniques in real-world data engineering situations. It is divided into three sections. In this book on multimedia data engineering and machine learning, the reader will learn how to prepare inputs, interpret outputs, appraise discoveries, and employ algorithmic strategies that are at the heart of successful data mining. The chapters focus on the use of various machine learning algorithms, neural network algorithms, evolutionary techniques, fuzzy logic techniques, and deep learning techniques through projects, so that the reader can easily understand not only the concept of different algorithms but also the real-world implementation of the algorithms using IoT devices. The authors bring together concepts, ideas, paradigms, tools, methodologies, and strategies that span both supervised and unsupervised engineering, with a particular emphasis on multimedia data engineering. The authors also emphasize the need for developing a foundation of machine learning expertise in order to deal with a variety of real-world case studies in a variety of sectors such as biological communication systems, healthcare, security, finance, and economics, among others. Finally, the book also presents real-world case studies from machine learning ecosystems to demonstrate the necessary machine learning skills to become a successful practitioner. The primary users for the book include undergraduate and postgraduate students, researchers, academicians, specialists, and practitioners in computer science and engineering.

Innovations in Multimedia, Virtual Reality and Augmentation Series Editor: Lalit Mohan Goyal, J. C. Bose University of Science & Technology YMCA Rashmi Agrawal, J. C. Bose University of Science & Technology YMCA Advanced Sensing in Image Processing and IoT Rashmi Gupta, Korhen Cengiz, Arun Rana, Sachin Dhawan Artificial Intelligence in Telemedicine: Processing of Biosignals and Medical images S. N. Kumar, Sherin Zafar, Eduard Babulak, M. Afshar Alam, and Farheen Siddiqui Multimedia Data Processing and Computing Suman Kumar Swarnkar, J P Patra, Tien Anh Tran, Bharat Bhushan, and Santosh Biswas

Multimedia Data Processing and Computing

Edited by

Suman Kumar Swarnkar, J P Patra, Tien Anh Tran, Bharat Bhushan, and Santosh Biswas

© Designed cover image: Shutterstock MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software. First Edition published 2024 by CRC Press 2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431 and by CRC Press 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN CRC Press is an imprint of Taylor & Francis Group, LLC © 2024 selection and editorial matter, Suman Kumar Swarnkar, J P Patra, Tien Anh Tran, Bharat Bhushan, and Santosh Biswas individual chapters, the contributors Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. copyright law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978750-8400. For works that are not available on CCC please contact [email protected] Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe. ISBN: 978-1-032-46931-7 (hbk) ISBN: 978-1-032-48888-2 (pbk) ISBN: 978-1-003-39127-2 (ebk) DOI: 10.1201/9781003391272 Typeset in Times by KnowledgeWorks Global Ltd.

Contents Preface.......................................................................................................................xi Editor Biographies.................................................................................................. xiii List of Contributors.................................................................................................xvii Chapter 1 A Review on Despeckling of the Earth’s Surface Visuals Captured by Synthetic Aperture Radar.................................................1 Anirban Saha, Suman Kumar Maji, and Hussein Yahia 1.1 Introduction................................................................................1 1.2 Synthetic Aperture Radar (SAR)................................................2 1.2.1 SAR Geometry..............................................................2 1.2.2 Frequency Bands and Characteristics...........................4 1.2.3 Frequency Polarization, Penetration, and Scattering................................................................5 1.3 Applications of Sar Visuals........................................................7 1.4 Inevitable Challenges in Sar Imagery........................................7 1.5 Formulation of Sar Despeckling Problem..................................8 1.6 Sar Despeckling Methodologies.................................................9 1.6.1 Filtration-based Techniques........................................ 10 1.6.2 Optimization-based Techniques.................................. 10 1.6.3 Hybrid Techniques...................................................... 11 1.6.4 Deep Network–based Techniques............................... 12 1.7 Comparative Analysis.............................................................. 14 1.8 Conclusion and Future Scope................................................... 16 References........................................................................................... 16 Chapter 2 Emotion Recognition Using Multimodal Fusion Models: A Review............................................................................................. 21 Archana Singh and Kavita Sahu 2.1 Introduction.............................................................................. 21 2.2 Emotion Theories and Models................................................. 22 2.3 Emotion Recognition and Deep Learning................................ 22 2.3.1 Facial Expression Recognition.................................... 22 2.3.2 Speech Emotion Recognition......................................24 2.3.3 Multimodel Emotion Recognition...............................24 2.4 Multimodal Emotion Recognition............................................25 2.4.1 Multimodal Emotion Recognition Combining Audio and Text............................................................25 2.4.2 Multimodal Emotion Recognition Combining Image and Text............................................................26 v

vi

Contents

2.4.3

Multimodal Emotion Recognition Combining Facial and Body Physiology........................................26 2.4.4 Other Multimodal Emotion Recognition Models......... 26 2.5 Databases.................................................................................. 27 2.5.1 Database Descriptions................................................. 27 2.6 Conclusion and Future Work.................................................... 29 References........................................................................................... 29 Chapter 3 Comparison of CNN-Based Features with Gradient Features for Tomato Plant Leaf Disease Detection........................................... 32 Amine Mezenner, Hassiba Nemmour, Youcef Chibani, and Adel Hafiane 3.1 Introduction.............................................................................. 32 3.2 Proposed System for Tomato Disease Detection...................... 33 3.2.1 Local Directional Pattern............................................34 3.2.2 Histogram of Oriented Gradient (HOG)..................... 35 3.2.3 Convolutional Neural Network-based Features.......... 35 3.2.4 Support Vector Machine–based Classification........... 36 3.3 Experimental Analysis............................................................. 37 3.4 Conclusion................................................................................ 41 References........................................................................................... 41 Chapter 4 Delay-sensitive and Energy-efficient Approach for Improving Longevity of Wireless Sensor Networks............................................. 43 Prasannavenkatesan Theerthagiri 4.1 Introduction.............................................................................. 43 4.2 The Internet of Things.............................................................. 43 4.3 Routing Protocol for Low-Power and Lossy Networks............44 4.4 Related Work............................................................................ 45 4.5 Energy and Time Efficiency Network Model........................... 49 4.5.1 Energy Efficiency Network Model.............................. 49 4.5.2 Time Efficiency Network Model................................. 51 4.6 Results and Analysis................................................................. 52 4.7 Conclusion and Future Scope................................................... 54 References........................................................................................... 54 Chapter 5 Detecting Lumpy Skin Disease Using Deep Learning Techniques...... 56 Shiwalika Sambyal, Sachin Kumar, Sourabh Shastri, and Vibhakar Mansotra 5.1 Introduction.............................................................................. 56 5.2 Material and Methods.............................................................. 57 5.2.1 Dataset......................................................................... 57 5.2.2 Research Methodology................................................ 58

vii

Contents

5.2.3 Parameter Tuning........................................................ 59 5.2.4 Proposed Architecture................................................. 59 5.3 Model Evaluation and Results..................................................60 5.3.1 Environment of Implementation.................................60 5.3.2 Description of the Model.............................................60 5.3.3 Results and Evaluation................................................60 5.4 Conclusion and Future Work.................................................... 63 Acknowledgments............................................................................... 63 References........................................................................................... 63 Chapter 6 Forest Fire Detection Using a Nine-Layer Deep Convolutional Neural Network................................................................................... 65 Prabira Kumar Sethy, A. Geetha Devi, and Santi Kumari Behera 6.1 Introduction.............................................................................. 65 6.2 Literature Survey......................................................................66 6.3 Materials and Methods.............................................................66 6.3.1 About the Dataset........................................................66 6.3.2 Proposed Methodology............................................... 67 6.4 Results and Discussion............................................................. 69 6.5 Conclusion................................................................................ 71 References........................................................................................... 71 Chapter 7 Identification of the Features of a Vehicle Using CNN....................... 73 Neenu Maria Thankachan, Fathima Hanana, Greeshma K V, Hari K, Chavvakula Chandini, and Gifty Sheela V 7.1 Introduction.............................................................................. 73 7.2 Literature Review..................................................................... 74 7.2.1 Image Capture............................................................. 75 7.2.2 Identification and Detection of Vehicle....................... 75 7.2.3 Automatic License Plate Recognition......................... 75 7.2.4 Vehicle Logo Recognition........................................... 76 7.2.5 Vehicle Model Recognition......................................... 77 7.2.6 Re-identification of a Vehicle...................................... 77 7.3 Conclusion................................................................................ 81 7.4 Open Research Areas............................................................... 81 References........................................................................................... 81 Chapter 8 Plant Leaf Disease Detection Using Supervised Machine Learning Algorithm............................................................................ 83 Prasannavenkatesan Theerthagiri 8.1 Introduction.............................................................................. 83 8.2 Literature Survey......................................................................84 8.3 Proposed System...................................................................... 85

viii

Contents

8.3.1 Leaf Disease Image Database..................................... 86 8.3.2 Image Preprocessing................................................... 86 8.3.3 Feature Extraction....................................................... 86 8.3.4 Classification............................................................... 89 8.4 Results...................................................................................... 91 8.4.1 Analysis of the Qualitative Data................................. 91 8.5 Conclusion................................................................................94 References...........................................................................................94 Chapter 9 Smart Scholarship Registration Platform Using RPA Technology....... 96 Jalaj Mishra and Shivani Dubey 9.1 Introduction..............................................................................96 9.2 Robotic Process Automation....................................................96 9.3 Benefits of RPA........................................................................ 98 9.4 Background...............................................................................99 9.5 Issues in Robotization of a Process........................................ 101 9.6 Tools and Technologies.......................................................... 102 9.6.1 RPA As a Rising Technology.................................... 102 9.6.2 Automation 360......................................................... 103 9.7 Methodology........................................................................... 105 9.7.1 Implications in Existing System................................ 105 9.7.2 Proposed System....................................................... 106 9.7.3 Data Collection and Source File Creation................. 107 9.7.4 Creation and Structure of the Task Bot..................... 107 9.8 Implementation....................................................................... 107 9.8.1 Setting User Credentials............................................ 108 9.8.2 Designing the Task Bot............................................. 108 9.8.3 Running the Task Bot................................................ 109 9.8.4 Implementing the Task Bot....................................... 110 9.8.5 Opening the CSV File............................................... 110 9.8.6 Launching the Scholarship Form.............................. 111 9.8.7 Populating the Web Form.......................................... 112 9.8.8 Sending the E-mail.................................................... 112 9.9 Results Analysis..................................................................... 112 9.10 Conclusion.............................................................................. 113 References......................................................................................... 113 Chapter 10 Data Processing Methodologies and a Serverless Approach to Solar Data Analytics.................................................... 116 Parul Dubey, Ashish V Mahalle, Ritesh V Deshmukh, and Rupali S. Sawant 10.1 Introduction............................................................................ 116 10.1.1 Solar Thermal Energy............................................... 117

ix

Contents

10.2 Literature Review................................................................... 117 10.3 Data Processing Methodologies............................................. 119 10.3.1 Artificial Intelligence................................................ 120 10.3.2 Machine Learning..................................................... 121 10.3.3 Deep Learning........................................................... 127 10.4 Serverless Solar Data Analytics............................................. 129 10.4.1 Kinesis Data Streams................................................ 130 10.4.2 Kinesis Data Firehose............................................... 131 10.4.3 Amazon S3................................................................ 131 10.4.4 Amazon Athena......................................................... 132 10.4.5 Quicksight................................................................. 132 10.4.6 F. Lambda.................................................................. 132 10.4.7 Amazon Simple Queue Service (SQS)...................... 133 10.5 Conclusion.............................................................................. 133 References......................................................................................... 133 Chapter 11 A Discussion with Illustrations on World Changing ChatGPT – An Open AI Tool........................................................... 135 Parul Dubey, Shilpa Ghode, Pallavi Sambhare, and Rupali Vairagade 11.1 Introduction............................................................................ 135 11.2 Literature Review................................................................... 135 11.3 AI and ChatGPT..................................................................... 136 11.3.1 Code, Chat and Career: Pros and Cons of Using AI Language Models for Coding Industry..................................................................... 137 11.3.2 Jobs of Future: Will AI Displace or Augment Human Workers?....................................................... 138 11.4 Impact of ChatGPT................................................................ 138 11.4.1 Impact that ChatGPT Creates on Students..................................................................... 139 11.4.2 Impact that ChatGPT Creates on Teachers/ Academicians............................................................ 140 11.4.3 Impact that ChatGPT Creates on Parents................. 140 11.5 Applications of ChatGPT....................................................... 141 11.6 Advantages of ChatGPT......................................................... 145 11.7 Disadvantages of ChatGPT.................................................... 146 11.8 Algorithms Used In ChatGPT................................................ 147 11.8.1 Illustration 1.............................................................. 147 11.8.2 Illustration 2.............................................................. 148 11.8.3 Illustration 3.............................................................. 150 11.9 Future of ChatGPT................................................................. 152 11.10 Conclusion.............................................................................. 152 References......................................................................................... 153

x

Contents

Chapter 12 The Use of Social Media Data and Natural Language Processing for Early Detection of Parkinson’s Disease Symptoms and Public Awareness..................................................... 154 Abhishek Guru, Leelkanth Dewangan, Suman Kumar Swarnkar, Gurpreet Singh Chhabra, and Bhawna Janghel Rajput 12.1 Introduction............................................................................ 154 12.2 Literature Review................................................................... 155 12.2.1 Parkinson’s Disease Detection and Diagnosis.......... 155 12.2.2 Social Media and Health Research........................... 155 12.2.3 Natural Language Processing in Health Research.................................................................... 155 12.2.4 Early Detection of Health Issues Using Social Media and NLP.............................................. 155 12.2.5 Public Awareness and Health Communication......... 156 12.3 Methodology........................................................................... 157 12.3.1 Data Collection.......................................................... 157 12.3.2 Data Preprocessing.................................................... 157 12.3.3 Feature Extraction..................................................... 157 12.3.4 Machine Learning Models........................................ 158 12.4 Results.................................................................................... 158 12.4.1 Data Collection and Preprocessing........................... 158 12.4.2 Feature Extraction..................................................... 158 12.4.3 Machine Learning Models........................................ 159 12.4.4 Evaluation Metrics.................................................... 159 12.4.5 Performance Results.................................................. 159 12.4.6 Feature Importance................................................... 160 12.5 Discussion............................................................................... 161 12.6 Conclusion.............................................................................. 161 References......................................................................................... 162 Chapter 13 Advancing Early Cancer Detection with Machine Learning: A Comprehensive Review of Methods and Applications.................. 165 Upasana Sinha, J Durga Prasad Rao, Suman Kumar Swarnkar, and Prashant Kumar Tamrakar 13.1 Introduction............................................................................ 165 13.2 Literature Review................................................................... 166 13.3 Methodology........................................................................... 169 13.4 Results.................................................................................... 170 13.5 Application of Research......................................................... 172 13.6 Conclusion.............................................................................. 172 References......................................................................................... 173 Index....................................................................................................................... 175

Preface Welcome to the world of Multimedia Data Processing and Computing! The field of multimedia has grown exponentially in recent years, and as a result, the demand for professionals who understand how to process and compute multimedia data has increased. This book aims to provide a comprehensive introduction to multimedia data processing and computing. It covers various aspects of multimedia data, including text, audio, images, and video, and provides practical knowledge on how to process and compute them. The book starts with an overview of multimedia data, including its types and characteristics. It then delves into the fundamental concepts of multimedia data processing, including multimedia data representation, compression, and retrieval. The book also covers advanced topics such as multimedia data analysis, multimedia data mining, and multimedia data visualization. The book is designed to be accessible to students, researchers, and professionals who are interested in learning about multimedia data processing and computing. The content is presented in a clear and concise manner, with plenty of examples and illustrations to help readers understand the concepts. One of the unique features of this book is its focus on practical applications of multimedia data processing and computing. Each chapter includes practical examples and case studies that demonstrate how to apply the concepts learned in realworld scenarios. Overall, we hope that this book will serve as a valuable resource for anyone who wants to learn about multimedia data processing and computing. We would like to thank our colleagues and students for their support and feedback during the development of this book. We hope you find this book informative and enjoyable to read. Editors Dr. Suman Kumar Swarnkar, Ph.D. Dr. J P Patra, Ph.D. Dr. Tien Anh Tran, Ph.D. Dr. Bharat Bhushan, Ph.D. Dr. Santosh Biswas, Ph.D.

xi

Editor Biographies Dr. Suman Kumar Swarnkar received a Ph.D. (CSE) degree in 2021 from Kalinga University, Nayaraipur. He received an M.Tech. (CSE) degree in 2015 from the Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India. He has more than two years of experience in the IT industry as a software engineer and more than six years of experience in educational institutes as assistant professor. He is currently associated with Chhatrapati Shivaji Institute of Technology, Durg, as assistant professor in the Computer Science and Engineering Department. He has guided more than five M.Tech. scholars and some of undergraduates. He has published and granted various patent Indian, Australian and other countries. He has authored and co-authored more than 15 journal articles, including WOS and Scopus papers and has presented research papers in three international conferences. He has contributed a book chapter published by Elsevier, Springer. He has life-time membership in IAENG, ASR, IFERP, ICSES, Internet Society, UACEE, IAOP, IAOIP, EAI, and CSTA. He has successfully completed many FDP, trainings, webinars, and workshops and also completed the two-week comprehensive online Patent Information Course. He has proficiency in handling teaching and research as well as administrative activities. He has contributed massive literature in the fields of intelligent data analysis, nature-inspired computing, machine learning and soft computing. Dr. J P Patra has more than 17 years of experience in research, teaching in these areas: artificial intelligence, analysis and design of algorithms, cryptography and network security at Shri Shankaracharya Institute of Professional Management and Technology, Raipur, under CSVTU Technical University, India. He has researched, published, and taught in this area for more than 17 years. He was acclaimed as the author of the books Analysis and Design of Algorithms (ISBN978-93-80674-53-7) and Performance Improvement of a Dynamic System Using Soft Computing Approaches (ISBN: 978-3-659-82968-0). In addition, he has more than 50 papers published in international journals and conferences. He is associated with IIT Bombay and IIT Kharagpur as a Remote Centre Coordinator since 2012. He is on the editorial board and review board of four leading international journals. In addition, he is on the technical committee board for several international conferences. He has life membership in professional bodies like CSI, ISTE, QCFI. He has also served in the post of Chairman of the Raipur chapter of the Computer Society of India, which is India’s largest professional body for computer professionals. He has served in various positions in different engineering colleges as associate professor and head. Currently, he is working with SSIPMT, Raipur as a Professor and Head of Department of Computer Science and Engineering.

xiii

xiv

Editor Biographies

Dr. Tien Anh Tran is an assistant professor in the Department of Marine Engineering, Vietnam Maritime University, Haiphong City, Vietnam. He graduated with a B.Eng. and M.Sc. in Marine Engineering from Vietnam Maritime University, Haiphong City, Vietnam. He received his Ph.D. degree at Wuhan University of Technology, Wuhan City, People’s Republic of China in 2018. He has been invited as a speaker and an organization committee member of international conferences in Australia, United Kingdom, Singapore, China, etc. He is an author/reviewer for international journals indexed in SCI/SCIE, EI. He is a leading guest editor and editorial board member for the following journals: Environment, Development and Sustainability (SCIE, IF = 3.219), IET Intelligent Transport System (SCIE, IF = 2.480), International Journal of Distributed Sensor Networks (SCIE, IF = 1.640), and International Journal of Renewable Energy Technology (InderScience). In 2021, he edited the Springer book The Impact of the COVID-19 Pandemic on Green Societies. His current research interests include ocean engineering, marine engineering, environmental engineering, machine learning, applied mathematics, fuzzy logic theory, multi criteria decision making (MCDM), maritime safety, risk management, simulation and optimization, system control engineering, renewable energy and fuels, etc. Dr. Bharat Bhushan is an assistant professor in the Department of Computer Science and Engineering (CSE) at the School of Engineering and Technology, Sharda University, Greater Noida, India. He received his undergraduate degree (B.Tech. in Computer Science and Engineering) with distinction in 2012, his postgraduate degree (M.Tech. in Information Security) with distinction in 2015 and doctorate degree (Ph.D. in Computer Science and Engineering) in 2021 from Birla Institute of Technology, Mesra, India. He has earned numerous international certifications such as CCNA, MCTS, MCITP, RHCE and CCNP. He has published more than 100 research papers in various renowned international conferences and SCIindexed journals, including Journal of Network and Computer Applications (Elsevier), Wireless Networks (Springer), Wireless Personal Communications (Springer), Sustainable Cities and Society (Elsevier) and Emerging Transactions on Telecommunications (Wiley). He has contributed more than 25 chapters in various books and has edited 15 books from well-known publishers like Elsevier, Springer, Wiley, IOP Press, IGI Global, and CRC Press. He has served as keynote speaker (resource person) at numerous reputed international conferences held in different countries, including India, Iraq, Morocco, China, Belgium and Bangladesh. He has served as a reviewer/editorial board member for several reputed international journals. In the past, he has worked as an assistant professor at HMR Institute of Technology and Management, New Delhi and Network Engineer in HCL Infosystems Ltd., Noida. He is also a member of numerous renowned bodies, including IEEE, IAENG, CSTA, SCIEI, IAE and UACEE. Dr. Santosh Biswas completed his B.Tech. in Computer Science and Engineering from NIT Durgapur in 2001. Following that he received M.S. (by research) and Ph.D. degrees from IIT Kharagpur in 2004 and 2008, respectively. Since then, he has been working as a faculty member in the Department of Computer Science and

Editor Biographies

xv

Engineering, IIT Guwahati for seven years, where he is currently an associate professor. His research interests are VLSI testing, embedded systems, fault tolerance, and network security. Dr. Biswas has received several awards, namely, Young Engineer Award by Center for Education Growth and Research (CEGR) 2014 for contribution to Teaching and Education, IEI young engineer award 2013–14, Microsoft outstanding young faculty award 2008–2009, Infineon India Best Master’s Thesis sward 2014 etc. Dr. Biswas has contributed to research and higher education. Dr. Biswas has taught more than ten courses at the B.Tech., M.Tech. and Ph.D. levels in IIT Guwahati, which is a premier institute of higher education in India. He has successfully guided two Ph.D., 22 M.Tech., and 25 B.Tech. students who are now faculty members at IITs and IIITs, undergoing higher studies abroad, or working in top multinational companies like Microsoft, Cisco, Google, Yahoo, etc. At present, he is guiding eight Ph.D. students and about ten B.Tech. and M.Tech. students. Apart from teaching in IIT Guwahati, Dr. Biswas has actively participated in helping new institutes in northeast India, namely, IIIT Guwahati, NIT Sikkim etc. He has also organized two AICTE QIP short-term courses for faculty members of different AICTE-approved engineering colleges. Further, he was in the organizing team of two international conferences held at IIT Guwahati. Dr. Biswas is the author of two NPTEL web open access courses, which are highly accessed by students all over the world. He has published about 100 papers in reputed international journals and conferences which have crossed 20 citations. Also, he is the reviewer for many such top-tier journals and conferences

List of Contributors Santi Kumari Behera Department of Computer Science and Engineering VSSUT Burla, India

Parul Dubey Department of Artificial Intelligence G H Raisoni College of Engineering Nagpur, India

Chandini Chavvakula MSc Forensic Science Kerala Police Academy Thrissur, India

Shivani Dubey Department of Information and Technology Greater Noida Institute of Technology Greater Noida, India

Gurpreet Singh Chhabra Assistant Professor Department of Computer Science and Engineering GITAM School of Technology GITAM Deemed to be University Visakhapatnam, India Youcef Chibani Faculty of Electrical Engineering University of Sciences and Technology Houari Boumediene (USTHB) Algiers, Algeria Ritesh V. Deshmukh Department of Computer Engineering JCET Yavatmal, India A. Geetha Devi Department of ECE PVP Siddhartha Institute of Technology Vijayawada, India Leelkanth Dewangan Assistant Professor G H Raisoni Institute of Engineering and Business Management Jalgaon, India

Shilpa Ghode Department of Information Technology G H Raisoni College of Engineering Nagpur, India Abhishek Guru Department of CSE Koneru Lakshmaiah Education Foundation, Vaddeswaram, India Adel Hafiane University of Orléans Bourges, France Fathima Hanana MSc Forensic Science Kerala Police Academy Thrissur, India Bhawna Janghel Assistant Professor Department of Computer Science and Engineering Rungta College of Engineering in Bhilai Durg, India xvii

xviii

List of Contributors

Hari K MSc Forensic Science Kerala Police Academy Thrissur, India

J. Durga Prasad Rao Additional Director and Academic Dean Shri Shankaracharya Mahavidyalaya Bhilai, India

Sachin Kumar Department of Computer Science and IT University of Jammu Jammu and Kashmir, India

Anirban Saha Department of Computer Science and Engineering Indian Institute of Technology Patna Patna, India

Ashish V. Mahalle Department of Computer Science and Engineering G H Raisoni College of Engineering Nagpur, India

Kavita Sahu Department of CSIS SRMU Barabanki, India

Suman Kumar Maji Department of Computer Science and Engineering Indian Institute of Technology Patna Patna, India Vibhakar Mansotra Department of Computer Science and IT University of Jammu Jammu and Kashmir, India Amine Mezenner Faculty of Electrical Engineering University of Sciences and Technology Houari Boumediene (USTHB) Algiers, Algeria Jalaj Mishra Department of Computer Science and Engineering Greater Noida Institute of Technology Greater Noida, India Hassiba Nemmour Faculty of Electrical Engineering University of Sciences and Technology Houari Boumediene (USTHB) Algiers, Algeria

Pallavi Sambhare Department of Information Technology G H Raisoni College of Engineering Nagpur, India Shiwalika Sambyal Department of Computer Science and IT University of Jammu Jammu and Kashmir, India Rupali S. Sawant Department of Computer Engineering JCET Yavatmal, India Prabira Kumar Sethy Department of Electronics Sambalpur University Burla, India Sourabh Shastri Department of Computer Science and IT University of Jammu Jammu and Kashmir, India Archana Singh Department of CSE SRMU Barabanki, India

xix

List of Contributors

Upasana Sinha Associate Professor at SoS E&T Guru Ghasidas Vishwavidyalaya (A Central University) Bilaspur, India Suman Kumar Swarnkar Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, India Prashant Kumar Tamrakar Assistant Professor Department of Computer Science and Engineering RSR Rungta College of Engineering and Technology Bhilai, India Neenu Maria Thankachan MSc Forensic Science Kerala Police Academy Thrissur, India

Prasannavenkatesan Theerthagiri Department of Computer Science and Engineering GITAM School of Technology GITAM Deemed to be University Bengaluru, India Gifty Sheela V MSc Forensic Science Kerala Police Academy Thrissur, India Greeshma K V MSc Forensic Science Kerala Police Academy Thrissur, India Rupali Vairagade Department of Information Technology G H Raisoni College of Engineering Nagpur, India Hussein Yahia Team Geostat, INRIA Bordeaux Sud-Ouest Talence Cedex, France

1

A Review on Despeckling of the Earth’s Surface Visuals Captured by Synthetic Aperture Radar Anirban Saha, Suman Kumar Maji, and Hussein Yahia

1.1 INTRODUCTION With the recent rapid growth of technology, the need for capturing, visualizing, and processing data from the Earth’s surface has emerged as an essential component of many important and pertinent scientific instruments that appear to have critical real-time applications. The vital initial phase of capturing these data is accomplished utilizing remote sensing technology. The term “remote sensing” indicates sensing certain data remotely (i.e., acquiring or interpreting a representation of the target data from a distant location without establishing any physical contact between the sensor and the data being recorded). In terms of capturing the Earth’s surface data, this can be redefined as sensing or interpreting a clear view of a predefined target region over the Earth’s surface utilizing sensors mounted on certain aerial devices or satellites. Apparently, the field of geo-scientific remote sensing aims to deal with sensing as well as surveilling the changes in geographical properties over a predefined region based on its application. Depending on the method of data gathering, the sensors, used to collect remote sensing data, can be roughly divided into two groups. One group includes passive sensors, which employ the optical wave that is reflected when an external light source, such as the sun, transmits light (Figure 1.1a). However, the use of this kind of sensors was limited since they were unable to provide high-quality real-time data from areas that faced away from the light source being used. Due to this significant constraint, the requirement to build sensors that could capture data over preset regions regardless of the local illumination opened the door for the creation of an additional class of sensors known as active sensors (Figure 1.1b). In order to offer a clear vision of the target location regardless of the current optical or meteorological circumstances, these active sensors employ radar technology of transmitting and receiving self-generated frequencies. These active sensors gained widespread acceptance within the technical and scientific communities working on remote sensing images as a result of their superiority over passive sensors. The effectiveness of these active sensors is determined by their effective aperture length. The data quality and target area coverage improve significantly as the DOI: 10.1201/9781003391272-1

1

2

Multimedia Data Processing and Computing

FIGURE 1.1  Illumination responsible for (a) active and (b) passive sensors.

aperture length of the radar rises. Real aperture radar (RAR) refers to active remote sensors that have a physical aperture length that is the same as their effective aperture length. The capability of these sensors is, however, constrained by the restrictions on extending the effective physical aperture length beyond a ceiling. In order to overcome this RAR sensor constraint, researchers developed the idea of using the motion of the radar-mounted vehicle to assist in extending the effective aperture length of these sensors. This would result in a regulated aperture length of a particular active remote sensing device as and when necessary. The evolution of these kinds of sensors eventually led to the creation of data-capture tools known as synthetic aperture radar (SAR), indicating the synthetic nature of their aperture length. Since these controlled sensors have such a wide range of capabilities, they are frequently utilized in a variety of applications to keep a watch on the surface of the planet. This chapter initially introduces various concepts influencing the data captured by SAR sensors in Section 1.2. Following this, in Section 1.3, it discusses the important applicability of the visuals captured by these sensors, thus establishing the importance of processing these data. In continuation, it deeply analyzes the inherent challenges while capturing these visuals in Section 1.4, thereby formulating a model representing the problem statement of SAR despeckling in Section 1.5. Later, in Section 1.6, the chapter discusses the developing history of despeckling techniques, and in Section 1.7, it compares the results of certain well-adopted approaches. In Section 1.8, the chapter concludes with analyzing scopes of future development.

1.2  SYNTHETIC APERTURE RADAR (SAR) SAR sensors generate high-range microwave frequencies, which transmit and recollect the dispersed energy utilizing the motion of the self-mounting device. Processing the time latency and the amplitude or the strength of these captured energy provides a high-quality visual pattern of the targeted surface.

1.2.1  SAR Geometry To provide a visual reference of the target area, SAR makes use of the sidelooking airborne radar (SLAR) technology [1, 2]. In accordance with what the

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

3

FIGURE 1.2  Basic data capturing model of SLAR technology. (Courtesy: FJ Meyer, UAF. [2])

name implies, this technology captures visual representations of the target area while it is positioned below the radar at an angle to the horizontal plane and in the direction of the ground range, which is perpendicular to the azimuth direction or the flight direction (Figure 1.2). While in motion, the radar sensor with antenna length L transmits and receives a series of short pulses of microwave frequencies across a maximum slant distance R from the ground. Defining the characteristics of the transmitted frequencies, the standard symbols t and ß signify the corresponding wavelength, pulse length, and bandwidth, respectively. When the antenna length is taken into account, the mathematical dependence represented by equation 1.1 determines the wavelength-bandwidth dependency of the transmitted pulses.

β=

λ (1.1) L

Evidently, a thorough understanding of the different SAR image resolutions is necessary to further grasp the characteristics of a SAR-acquired visual. Slant Range Resolution: The radar sensors capture the reflected pulses and the related delay while receiving them. The pulses that are received sooner are assumed to have been reflected from a nearer surface than the pulse rates that are received later. As a result, a minimum separation of half the pulse duration is required to clearly discriminate between objects at various places across the surface along the

4

Multimedia Data Processing and Computing

swath area of range direction, demonstrating a slant range resolution as indicated by equation 1.2, where c stands for the speed of light.

δ R  = 



cT (1.2) 2

Ground Range Resolution: When the slant angle θ i and the slant range resolution are taken into account, it depicts the actual spatial resolution across the ground surface. This is estimated by equation 1.3 and denotes the smallest real ground separation between objects having a distinct discernible component in the recorded visual.

VG   =  

δR cτ − (1.3) sin θ i 2 sin θ i

Azimuth Resolution: While in motion, the sensor scans the ground surface toward the direction of motion. Due to this, the same object is expected to be recorded multiple times over a particular displacement. This displacement of the radar system between the position at which it initially scans a particular object and the position of its final encounter of the same object is termed azimuth resolution. It also demonstrates the synthetic aperture length of the SAR sensor. This can be mathematically represented by equation 1.4, which varies along the swath width with change in slant range R.

λ δ Az   =     R   =   β R (1.4) l

Spectral Resolution: It represents a sensor’s ability to record the amount of spectral information of the captured visuals. In other words, it represents the range of clearly discernible bands of finer wavelength that the sensor employs to record different spectral characteristics of the targeted ground visual. Radiometric Resolution: It shows how much distinct information may be recorded in the tiniest chunk of the captured visual. The amount of levels or bits needed to express a single pixel value in the provided visual might be used to signify this. Typically, it reflects the exponent raised to the power of 2. Temporal Resolution: It represents the minimum time interval required by a satellite-mounted radar system to capture the same geographical location.

1.2.2  Frequency Bands and Characteristics SAR sensors make use of a range of frequency bands depending on the applicability of the captured visuals. Among these, a predefined number of frequency bands are observed to be used most frequently. So these bands are given a global identity for easy reference. The detailed properties of these bands are listed below. Ka Band: The radar sensors that utilize this band identity normally deal with pulse frequencies ranging from 27 GHz to 40 GHz, and the signal is expected to possess a wavelength between 1.1 cm and 0.8 cm. This band is rarely used in SAR sensors with certain exception.

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

5

K Band: The pulse frequencies that are dealt with by the radar sensors that use this band typically range from 18 GHz to 27 GHz, and the signal is anticipated to have a wavelength between 1.7 cm and 1.1 cm. It is a rarely used band. Ku Band: Radars operating at this band are expected to transmit pulses with frequency ranging from 12 GHz to 18 GHz, with corresponding wavelength between 2.4 cm and 1.7 cm. It is also a rarely used band while concentrating on SAR sensors. X Band: This band deals with signals having frequencies within the range of 8G Hz to 12 GHz and wavelengths between 3.8 cm and 2.4 cm. This forms the basic operating band for radars such as TerraSAR-X, TanDEM-X, COSMO-SkyMed, and PAZ SAR. C Band: This band is believed to be used by sensors that can process signals with frequencies between 4G Hz and 8G Hz and wavelengths between 7.5 cm and 3.8 cm. Radars such as ERS-1, ERS-2, ENVISAT, Radarsat-1, Radarsat-2, Sentinel-1, and RCM operate within this band. S Band: It is claimed that devices using this band process signals having wavelengths between 15 cm and 7.5 cm and frequencies between 2 GHz and 4 GHz. This band has very little but rapidly increasing usage in SAR systems. L Band: This band is stated to be used by radars having processing capabilities for signal components with frequencies ranging from 1 GHz to 2 GHz and wavelengths between 30 cm and 15 cm. This band is mainly used by radars that provide free and open access to its data. These radars include Seasat, JERS-1, ALOS-1, ALOS-2, PALSAR-2, SAOCOM, NISAR, TanDEM-L, etc. P Band: Radars are said to be operating in this band if they work with frequency pulses having a wavelength between 100 cm and 30 cm and a frequency between 0.3 GHz and 1 GHz. Utilization of this band can be observed by the BIOMASS radar.

1.2.3  Frequency Polarization, Penetration, and Scattering SAR sensors visualize surface regions by measuring the strength and latency of the received signal, which corresponds to the self-transmitted series of frequency pulses. This enables construction of SAR systems with controlled polarization of both transmitting and receiving signals. These radars are often built in a linearly polarized mode with fixed orientation along the propagation channel, resulting in control over transmitting or receiving either horizontally or vertically polarized wave patterns. This leads to a design with four possible polarized configurations. 1. HH-polarization: Transmits horizontally polarized signals and records the encountered horizontally polarized signals. 2. VV-polarization: Transmits vertically polarized signals and records the encountered vertically polarized signals. 3. HV-polarization: Transmits horizontally polarized signals and records the encountered vertically polarized signals. 4. VH-polarization: Transmits vertically polarized signals and records the encountered horizontally polarized signals.

6

Multimedia Data Processing and Computing

Based on the capability to interact with these polarized wave-forms, SAR sensors are broadly categorized into four types that include the following: 1. Single-pol SAR systems: These systems can record signals with the same polarity configuration utilized while transmitting the same. 2. Cross-pol SAR systems: These devices are capable of transmitting and simultaneously recording signals with opposite polarity configurations. 3. Dual-pol SAR systems: Single polarity configuration is used while transmitting, but the system is capable of recording both horizontally or vertically polarized signals. 4. Quad-pol SAR systems: These systems are capable of transmitting and recording signals making use of all four polarity configurations. Recent SAR sensors are mainly designed either in dual-pol or quad-pol configurations in order to record detailed polarimetric properties of the incident object. Apart from this frequency polarity configuration, the dielectric properties of the signal traversing medium also play a vital role in recording a distinct object. This is because they directly affect the penetration ability of the corresponding signal. The penetrating depth (ζp), by a signal with wavelength traversing through a medium with relative complex dielectric coefficient can be quantified as given in equation 1.5.

ζ p  ≈ 

λ p { Re (η )}

2 π  { Im (η )}

(1.5)

Additionally, the return signal is influenced by how rough the impact surface is. Smooth surfaces cause signal loss by reflecting the whole incident signal in an opposite direction, which prevents the recording of any distinguishable echoed signal. The intensity of the recorded signal at a particular polarization also rises when the affected surface’s roughness increases due to a process known as signal scattering. Equation 1.6 can be used to indicate this fluctuation in the recording of the intensity of polarized signal, suggesting a rough surface.

IV  V > I HH > I HV or IV   H (1.6)

Similar to this, the transmitted signal experiences a twofold bounce scattering effect when there is a rapid change in altitude above the targeted surface. This is also reflected by the relationship between received intensities corresponding to different polarized configurations as indicated by equation 1.7.

I HH > IV  V > I HV or IV   H (1.7)

In addition to these two obvious scattering effects, the transmitted frequency pulse may experience volumetric scattering as a result of the signal’s repeated bouncing off a large structure over the targeted area. In this case, the cross-polarized signal’s measured intensities predominate over those of the single-polarized signal.

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

7

1.3  APPLICATIONS OF SAR VISUALS SAR data are now widely recognized and used in a range of scientific applications [2]. There are applications in military surveillance, intelligence gathering, etc., that are incredibly secure and private. They are also employed by the positioning systems of ships and airplanes in order to inform the relevant authorities. In addition, these data are useful for tracking changes in a specific geophysical site, which in turn supports a wide variety of applications such as tracking the rate of urbanization, the density of forest cover, changes in environment, the rate of ice cover melting, and many more. Apart from these, SAR data are widely used in applications like global mapping, biomass mapping, vegetation cover analysis, agricultural monitoring, maritime navigation, and the list continues. All of these applications use specific band frequency configuration sensors as their primary data gathering method. The Ka, K, and Ku bands are thought to function at very high frequencies, restricting their use on a large scale due to the negative effects of operating frequencies in this range. Sensors that operate in these bands are used for delicate tasks like airport surveillance and a few sporadic but essential military requirements. The high-resolution SAR visuals produced by SAR systems operating in the X band frequency are mostly used for monitoring-based applications. SAR systems operating in the C band are also used for applications like mapping and rate of change detection. In contrast, SAR sensors that utilize S, L, and P bands provide visuals with medium resolution and are used for applications that require data acquired from highly penetrative zones.

1.4  INEVITABLE CHALLENGES IN SAR IMAGERY Despite all of these advantages and potentialities, SAR systems have several inherent flaws that lower the quality of the data that is acquired. This cannot be avoided entirely; rather, these issues must be addressed effectively while processing SAR visuals. Some of the most prevalent issues that must be taken into consideration before dealing with this data are highlighted, along with their primary causes. Geometric Distortion: Regardless of the type of sensor being utilized, this is one of the most frequent difficulties with remote sensing imagery. This issue is caused by a variety of factors, including the sensor-mounted system’s constant changes in location, altitude, and perspective. Due to timing variations, clock synchronization, and drift, measurement equipment also introduces distortion. The emergence of this difficulty is also caused by some inherent geophysical and atmospheric factors. System Non-linear Effects: The main factor introducing this kind of deterioration is any uncontrolled condition impacting the impulse response function of the contributing frequency. A few of these variables include the atmospheric thermal effect and low-level image representation inaccuracy. Range Migration: This issue develops as a result of any motion occurring during the recording of visuals in the azimuth direction of the radar system, which causes the unintended recording of a target object’s hyperbolic-shape reflectance. These motions might be caused by the rotation of the Earth, the movement of a mounted

8

Multimedia Data Processing and Computing

FIGURE 1.3  Example of speckled SAR data from TerraSAR-X. (Courtesy DLR. [3])

radar system, or the movement of the target object in the direction of the synthetic aperture. Speckle Introduction: After interacting with the target site, high-frequency radar waves are recorded by SAR devices. Due to dispersion of the same transmitted signal from nearby locations during the signal’s return phase, this signal is very susceptible to interference. This unwelcome signal interference impact might be seen as either constructive or destructive in nature. The additive salt-and-pepper noisy pattern is introduced in the collected frequency recordings due to this dual nature of frequency interference. When these corrupted recorded signals are converted to the corresponding visual representation, this additive component results in a multiplicative speckle pattern. This appears to have an impact on the recorded image quality by introducing a granular-structured cover all over the visual (Figure 1.3).

1.5  FORMULATION OF SAR DESPECKLING PROBLEM The superimposition of an apparent granular structured view over a raw SAR visual largely contributes to its corrupted representation, which needs to be carefully processed and removed. Upon thorough analysis of the contaminating nature of these unwanted component, it is observed to affect each pixel information of the target visual approximately following a mathematical multiplicative model [2, 4–6]. Considering this observation, the dependency of the target visual over the raw captured visual is most likely expected to take a mathematical form as provided by equation 1.8.

I   =  U   ×   S (1.8)

The variables I, U, and S, as used in equation 1.8, symbolize the captured raw SAR visual, the real intended SAR visual, and the unwanted speckle distribution,

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

9

respectively. With detailed study about the property of these corrupting speckle structures, it is estimated to resemble similar properties as gamma distribution [2, 4]. Therefore, the problem of removing these components makes an initial assumption that denotes that the components, targeted to be removed, follow a mathematical distribution as stated by equation 1.9.

ρS   =  

LLnL − 1exp exp  ( − nL )   ;  n   ≥  0,  L   ≥  1  (1.9) Γ (L)

The function Γ(L), in equation 1.9, represents gamma distribution whereas the parameter L indicates the parameter controlling the corruption level or the speckle level, and is termed Look. It is also observed by various studies [2] that the level of corrupting speckle components can be directly determined using a small homogeneous segment of the captured raw visual. The statistical model determining this speckle level parameter is expressed by equation 1.10.

L  = 

p− (1.10) σp

The variable p, as used in equation 1.10, is a representation of a small homogeneous segment of the captured corrupted data. The ratio between its statistical mean and standard deviation approximately denotes the speckle level. The removal of these statistically related speckle components from the captured raw visuals, thereby predicting the real visual, is termed a “despeckling problem”. In an ideal case, it aims to design an optimal mathematical function that is capable of mapping the real visual U from the corrupted captured visual I, eliminating the unwanted component S. Now, considering the fact that the pixel data once lost due to the introduction of these unwanted granular speckle structures can never be regenerated but can be approximated, the actual derivation of the real visual is no way possible by any mathematical model. Therefore, the problem statement of any SAR despeckling model intends to design a mathematical model that can map a given captured raw SAR visual I to an intended SAR visual U which denotes a best approximation of the real visual U.

1.6  SAR DESPECKLING METHODOLOGIES Numerous SAR despeckling solutions have been created during the past five decades with an understanding of the significance of despeckling SAR visuals. The works [7, 8] were among the first solutions to this issue. Later, at the start of the 1980s, this subject experienced considerable progress. Since then, models making notable advances in this area have been well documented in the literature. These well-documented SAR despeckling models can be broadly categorized into a number of different variants based on the fundamental methods applied to achieve the desired results.

10

Multimedia Data Processing and Computing

1.6.1  Filtration-based Techniques The majority of the ground-breaking research in the area of SAR despeckling makes use of filtration-based techniques that aim to create the best filter possible in order to eliminate undesirable speckle components. Based on this, numerous filters were constructed, among which the filter illustrated in [9] has achieved large acceptability and is dubbed the “Lee Filter”. This filter succeeds in despeckling homogenous portions of the recorded visual but fails when trying to do so with heterogeneous parts. In light of this, an analogous filter called the “Enhanced Lee Filter” [10] was created with the capacity to despeckle both homogeneous and heterogeneous areas. A new filter was afterward introduced that made the claim to be adaptive [11]. It is referred to as the “Frost Filter” and derives filter parameter values using a locally calculated mean and variance. The Kuan Filter [12] was introduced as another improvement of the Lee filter using the adaptive non-stationary mean and non-stationary variance (NMNV) model. Now it has been shown that all of these well-known filters were founded on a substantial local variance coefficient, making them inefficient when applied to data with a considerable shift in regional variation. The article [13] that outlines the use of controlled filtering addresses this. A rapid advancement focused on despeckling remote sensing visuals was later observed with the start of 21st century. The filtration technique, as demonstrated in [14], has provided considerable improvement in the outcome. It is based on an extended non-local means (NLM) approach with iterative assignment of values to its filter weights considering statistical correlation among similar noisy patches and its corresponding estimates. Another such improvement was illustrated in [15], which is likewise based on an NLM-filtration technique with the introduction of discontinuous adaptive filter parameters derived from the processed outcome of importance sampling unscented Kalman filter. Moreover, article [16] demonstrates a two-phase approach for despeckling SAR visuals. The initial phase is mainly responsible for despeckling tasks, achieved by hard thresholding of directionally smoothened outcomes, whereas the final phase is targeted toward the enhancement of these processed visuals utilizing hybrid Gaussian-Laplacian filter. Although these filtration techniques resulted in substantial gains, there appeared to be some significant downsides. The main one is over-smoothing throughout the processed output, which results in substantial blur, compromising on parts of crisp edge information. Aside from that, the presence of blocking artefacts seems to have been introduced along the edge information. Furthermore, the despeckled result retains a minor amount of speckle component as well as an undesired introduction of ghost texture.

1.6.2 Optimization-based Techniques Alongside filtration-based SAR despeckling approaches, several other techniques were developed aiming to derive and optimize a mathematical function based on a prior knowledge about the contaminating speckle distribution. The authors of [17] pioneered such techniques demonstrating the hidden optimization function in all well-known filtration-based methods that makes use of a variation coefficient.

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

11

With this, an optimization function employing gradient magnitude locally and Laplacian operators capitalizing on instantaneous variation coefficient was derived. This function is optimized using edge-preserving anisotropic diffusion to generate a clear despeckled outcome. The article [18] defines a method that requires initial translation of multiplicative nature to additive nature of the unwanted components, which is achieved by linear decomposition, thereby deriving a wavelet shrinkage optimizing function based on maximum a posteriori criteria and utilizing adaptive thresholding over bidirectional gamma distribution in wavelet shrinkage domain. Subsequently, the method proposed in [19] utilizes directionlet transform coefficients while deriving the approximation function that is to be optimized. These coefficients are processed with the help of bivariate distributions considering statistical dependence among adjacent coefficients. Also the noise-free coefficients are shown to be estimated using non-linear thresholding over maximum a posteriori estimator-based model outcomes. The document, recorded as [20], enhances a detailpreserving anisotropic diffusion-optimizing equation by incorporating a controlled statistical curvature motion as derived by an adaptive coupling function. The authors of [21] proposed a despeckling method utilizing empirical mode decomposition with threshold function computed over a fixed-size sliding window, thereby preserving edge information discarding the speckle contamination. Parallelly, the article [22] utilized sparse representation in a shearlet domain along with a Bayesian approach. This led to the introduction of a model that processes shearlet-transformed visuals utilizing a Bayesian-based iterative algorithm over sparse representation. The despeckling approach, proposed in [23], described the performance improvement achieved by employing a dual formulation–based adaptive total variance regularization technique making use of a wavelet-estimated contaminating speckle level. In [24], the authors discuss the SAR despeckling problem as an multi-objective optimization problem and utilize a particle swarm optimization approach to estimate the desired outcome. While modelling this approach, the derived objective function considered two counteractive reference matrices as well as two counteractive nonreference matrices as its component. The authors of [25] came up with an approach that aims to despeckle SAR visuals utilizing gradient-restricted features extracted from the estimation of multifractral analysis. The article [26] translated the despeckling problem as an optimization problem approximating a non-linear hyperbolic– parabolic equation coupled with a partial differential equation, thereby deriving an effective despeckling approach. Though these techniques adopted several measures to overcome the limitations of filtering- based approaches, they were successful to an extent in minimizing those limitations. But there existed over-smoothing trials in heterogeneous areas along with limited introduction of ghost texture. On the other hand, the majority of these methods resulted in considerably efficient outcomes.

1.6.3 Hybrid Techniques With independent development of filtering and optimization-based despeckling approaches, a number of researchers tried to combine these concepts to model an optimal SAR despeckling approach. After analyzing the advantages and disadvantages

12

Multimedia Data Processing and Computing

of several such methods, they tried to utilize the correlated strengths and weaknesses of these approaches. The document [27] illustrates a well-known SAR despeckling technology, often titled SAR-BM3D, which utilizes three major sequential computations. Initially, it groups similar image patches based on an ad-hoc similarity criterion analyzing the multiplicative model. Then it applies collaborative filtering strategy over these three-dimensional groups, each representing a stack of the computed similar patches. This filtering is based on a local linear minimum mean square error solution in wavelet domain. Later these filtering estimations are rearranged and aggregated to approximate the original data. The article, referenced as [28], investigates the process of utilizing more than one complementary approach in a controlled fashion. This is regulated thoroughly by a proposed soft classification model based on the homogeneity estimates of different regions. However, the authors of [29] came up with the implementation of a SAR despeckling methodology that integrates Bayesian non-local means in the computation of optimized filter parameter of a generalized guided filter. This is designed mainly to avoid large degradation in preserving minute texture information captured by the data. In [30], the author demonstrates an efficient two-state SAR despeckling method implemented over logarithmic space. The first state focuses on heterogeneous-adaptive despeckling utilizing a maximum a-posteriori estimator in a complex wavelet domain, while the second state tackles the smoothing of the homogeneous regions with the help of a local pixel grouping– based iterative principal component analysis approach. These models’ performance results are somewhat better than those of several filtration- or optimization-based methods. However, due to their complexity and inability to completely eliminate all of the flaws in the derived models, these models fall short of their intended goals. Despite this, several of these models have gained widespread acceptance just as a result of appreciable visual outcome enhancements.

1.6.4 Deep Network–based Techniques The introduction of deep learning, with the performance of AlexNet convolutional neural network (CNN) in the 2012 ImageNet classification challenge [31], has promised the solution to almost every image-processing task. Thereafter, techniques based on deep learning were experimented thoroughly by scientific community even for addressing image denoising tasks [32–35]. With the continuous increase in availability of an apparently large quantity of SAR data and the rapid rush of adapting deep-learning techniques, a SAR despeckling model based on deep-learning techniques was pioneered by the works [36, 37] in 2017. This maps the noisy data to the clean data with an encoder–decoder formation of a convolutional model having skip connections between corresponding encoder and decoder layers. In subsequent months, another similar approach was proposed by the authors of [38], which is trained to filter the unwanted speckle component from the raw captured SAR visual extracting the residual. This residual is then processed with the same input to approximate the corresponding despeckled visual. Later, the work [39] also considered similar residual-learning strategy incorporating the power of non-linearity using intermediate-structure skip connections and the dilated convolutional framework. Now, during the timeline of 2019 and 2020, rapid improvements were recorded

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

13

by despeckling techniques based on this approach. The authors of [40] proposed a convolutional model–based despeckling approach with the novelty of deriving a cost function considering the optimal trade-off between SAR despeckling performance and minute detail preservation captured in the visual. Parallelly another work, [41], was documented that aimed to improve non-local SAR despeckling using convolutional architecture. In contrast to architectural development, the authors of the article [42] proposed a normal sequential ten-layer deep-network architecture with the major focus channeled toward the development of cost function that considers both spatial consistency and statistical properties of the real data and noise component, respectively. Moreover article [43] demonstrated the use of a pre-trained additive noise removal convolutional model in eliminating multiplicative speckle component using non-linear correction and processing data in logarithmic domain. It also compared the use of various such approaches in despeckling with and without considering the iterative non-linear correction module to establish the importance of this module. The authors of [44] provided major focus in retaining minute textural information by an initial module predicting a texture-level mapping which is then utilized by the despeckling network to analyze the homogeneous and heterogeneous regions, thereby removing the speckle component. In article [45], the author indicated that any deep-learning despeckling approaches require a large set of noisy and clear images to train the network parameters, but having a complete speckle-free clear remote sensing visual is very rare. Therefore, they came up with an approach of training a residual deep-learning network based on the loss computed over the processing results of consecutive visuals captured in immediate time interval. This has resulted a drastic improvement in performance which is analyzed in detail. Later, the work [46] records the improvement in processing these speckled data using a Fisher–Trippett denoising model in logarithmic domain. A detailed visual and quality comparison has shown the efficacy of this model (Figure 1.4).

FIGURE 1.4  Visual performance analysis over simulated data.

14

Multimedia Data Processing and Computing

This rate of introducing and analyzing deep learning–based SAR despeckling approaches has continued during the past two years as well. The authors of [47] tried to mimic the characteristics of the variational approach by introducing a datafitting block which is then followed by a deep convolutional prior network. This model was trained in an iterative optimizing fashion in order to efficiently fine-tune its parameters. Article [48] focuses mainly on deriving an efficient multi-objective cost function considering the spatial and statistical properties of these data. This cost function balances three independent characteristics that include spatial dependence within the captured data, statistical properties of the unwanted speckle component, and scattered identification. Experiments have shown that the utilization of this cost function while training these deep-learning despeckling approaches has resulted in significant improvement of the result. Parallelly, the authors of article [49] demonstrated a unique despeckling network that is made to extract both clean and noisy components, thereby merging them to reconstruct the input. So, an integration of two loss functions with regularization terms is utilized during training. One of these indicates the loss while reconstructing the clean visual, and the other is responsible for computing the loss of reconstructing the input data. The document [50] proposes a self-supervised SAR-despeckling convolutional approach that utilizes features from a pre-tuned network with hybrid loss function in order to preserve minute textural information. Similarly, article [51] illustrated an unsupervised approach for the same by extracting speckle-to-speckle pairs of the input and utilizing an advance nested-UNet model over the extracted speckle-to-speckle pairs adopting Noise2Noise training strategy. Subsequently, the authors of [52] concentrated on optimizing the tradeoff between speckle reduction and texture preservation by processing shallow features extracted by different configurations of multi-scale modules, utilizing a deep-residual despeckling network with dual attention. Apart from this, the authors of [53] and [54] recognized the importance of processing the gradient or the contrast information of the input data, thereby recording significant improvements in despeckling performance. These approaches significantly overcame the challenges of over-smoothing as well as blocking artifacts that formed the major drawback of approaches based on other techniques. Moreover, these techniques are capable of retaining maximum detailing with a largely acceptable amount of speckle reduction. In addition to this, these methods do not require manual tuning of a large number of hyper-parameters. Despite the significant strengths of these methodologies in accomplishing the desired objective, they have a major drawback while considering the computational complexity.

1.7  COMPARATIVE ANALYSIS This section is designed to have a quick visual analysis of a handful of well-adopted SAR despeckling technologies. This analysis was made both on simulated and real data. For simulated analysis, the UCMerced data [55] was taken into consideration, whereas a multi-look data captured by a SAR sensor operating in Ku band over Horse Track, Albuquerque, NM, USA was utilized to compare the performance over real input. The despeckling methods that were compared include PPBit [14], SAR-BM3D

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

15

FIGURE 1.5  Visual performance analysis over real data.

[27], SAR-CNN [37], SAR-IDCNN [38], SAR-DRN [39], and AGSDNet [53]. The visual performance analysis over simulated data is presented in Figure 1.4. Upon detailed analysis of these visuals, the strengths and weaknesses mentioned for each despeckling approach are evident. Some of these results encounter over-smoothing, while several visuals encounter a small existence of speckle components. Also, the visuals predicted by deep network–based approaches provided better visuals, with the recently developed approach titled AGSDNet performing the best. On the other hand, Figure 1.5 analyzes similar performance but over the considered real data commenting upon the generalization capability of each model. Alongside visual comparison, a quantified qualitative comparison has been also tabulated in Table 1.1. The most frequently used performance metrics, which include TABLE 1.1 Quantified Performance Metric Values Comparing the Quality of Predicted Outcome of Various Models Methods PPBit [14] SAR-BM3D [27] SAR-CNN [37] SAR-IDCNN [38] SAR-DRN [39] AGSDNet [53]

Simulated Data Analysis PSNR SSIM 24.46 0.77 25.21 0.85 28.26 0.78 29.65 0.87 29.48 0.82 30.02 0.89

Real Data Analysis EPD-ROA(HD) EPD-ROA(VD) 0.97 0.98 0.96 0.95 0.96 0.94 0.97 0.96 0.96 0.95 1.00 1.00

16

Multimedia Data Processing and Computing

PSNR [56] and SSIM [57], were considered to analyze the performance of the simulated data, whereas the values of the metric EPD-ROA [58] along the horizontal and vertical directions were recorded to analyze the quality of despeckled real data. These comparisons suggest the rate of advancement throughout the past few years.

1.8  CONCLUSION AND FUTURE SCOPE In conclusion, it is observed that remote sensing data has a wide range of applicability. This sensor type used to collect these data exhibits their pros and cons. Considering these, SAR sensors are widely accepted by the scientific community and implemented in a high range of applications. Therefore, upon a thorough study of the data-capturing techniques in these sensors and various other related phenomena, several intrinsic challenges were expected to adversely affect the captured data. One of the major challenges includes speckle introduction that needs to be carefully addressed before processing these data. Several development milestones, achieved in processing these speckle components, were analyzed along with their corresponding strengths and weaknesses. Moreover, a comparative study has been made analyzing the performance of some highly adopted SAR despeckling mechanisms documented in the literature. Though a considerable number of improvements in handling the SAR despeckling problem have already been encountered, there exists a large scope for developing much more efficient and computation-friendly SAR despeckling techniques due to which rapid interest is observed among researchers to explore this field in past few years.

REFERENCES 1. Earth Resources Observation and Science (EROS) Center. Side-Looking Airborne Radar (SLAR). 2017. doi: 10.5066/F76Q1VGQ. https://www.usgs.gov/centers/eros/ science/usgs-eros-archive-aerial-photography-side-looking-airborne-radar-slarmosaics?qt-science_center_objects=0#qt-science_center_objects. 2. Africa Flores et al. The SAR Handbook: Comprehensive Methodologies for Forest Monitoring and Biomass Estimation. NASA, Apr. 2019. doi: 10.25966/nr2c-s697. https://www.servirglobal.net/Global/Articles/Article/2674/sar-handbook-comprehensive-methodologies-for-forest-monitoring-and-biomass-estimation. 3. Andrea Bordone Molini et al. “Speckle2Void: Deep Self-Supervised SAR Despeckling with Blind-Spot Convolutional Neural Networks”. In: IEEE Transactions on Geoscience and Remote Sensing 60 (2022), pp. 1–17. doi: 10.1109/TGRS.2021.3065461. 4. Giulia Fracastoro et al. “Deep Learning Methods for Synthetic Aperture Radar Image Despeckling: An Overview of Trends and Perspectives”. In: IEEE Geoscience and Remote Sensing Magazine 9.2 (2021), pp. 29–51. doi: 10.1109/MGRS.2021.3070956. 5. Suman Kumar Maji, Ramesh Kumar Thakur and Hussein M. Yahia. “SAR Image Denoising Based on Multifractal Feature Analysis and TV Regularisation”. In: IET Image Processing 14.16 (2020), pp. 4158–4167. doi: https://doi.org/10.1049/iet-ipr. https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/iet-ipr.2020.0272 6. Anirban Saha and Suman Kumar Maji. “SAR Image Despeckling Convolutional Model with Integrated Frequency Filtration Technique”. In: TENCON 2022 – 2022 IEEE Region 10 Conference (TENCON). Nov. 2022, pp. 1–6. doi: 10.1109/ TENCON55691.2022.9978085.

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

17

7. Leonard J. Porcello et al. “Speckle Reduction in Synthetic-Aperture Radars”. In: Journal of the Optical Society of America 66.11 (Nov. 1976), pp. 1305–1311. doi: 10.1364/JOSA.66.001305. 8. Kondo K., Ichioka Y., and Suzuki T. “Image Restoration by Wiener Filtering in the Presence of Signal-Dependent Noise”. In: Applied Optics 16.9 (1977), pp. 2554–2558. doi: 10.1364/AO.16.002554. 9. Jong-Sen Lee. “Speckle Analysis and Smoothing of Synthetic Aperture Radar Images”. In: Computer Graphics and Image Processing 17.1 (1981), pp. 24–32. doi: https://doi. org/10.1016/S0146-664X(81)80005-6. https://www.sciencedirect.com/science/article/ pii/S0146664X81800056. 10. Jong-Sen Lee. “Refined Filtering of Image Noise Using Local Statistics”. In: Computer Graphics and Image Processing 15.4 (1981), pp. 380–389. doi: https://doi. org/10.1016/S0146-664X(81)80018-4. https://www.sciencedirect.com/science/article/ pii/S0146664X81800184. 11. Victor S. Frost et al. “A Model for Radar Images and Its Application to Adaptive Digital Filtering of Multiplicative Noise”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-4.2 (1982), pp. 157–166. doi: 10.1109/TPAMI. 1982.4767223. https://ieeexplore.ieee.org/document/4767223. 12. Darwin T. Kuan et al. “Adaptive Noise Smoothing Filter for Images with SignalDependent Noise”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-7.2 (1985), pp. 165–177. doi: 10.1109/TPAMI.1985.4767641. https://ieeexplore. ieee.org/document/4767641. 13. Lopes A. et al. “Structure Detection and Statistical Adaptive Speckle Filtering in SAR Images”. In: International Journal of Remote Sensing 14.9 (1993), pp. 1735–1758. doi: 10.1080/01431169308953999. https://doi.org/10.1080/01431169308953999. 14. Charles-Alban Deledalle, LoÏc Denis and Florence Tupin. “Iterative Weighted Maximum Likelihood Denoising with Probabilistic Patch-Based Weights”. In: IEEE Transactions on Image Processing 18.12 (2009), pp. 2661–2672. doi: 10.1109/ TIP.2009.2029593. https://ieeexplore.ieee.org/document/5196737. 15. Christy Jojy et al. “Discontinuity Adaptive Non-Local Means with Importance Sampling Unscented Kalman Filter for De-Speckling SAR Images”. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6.4 (2013), pp. 1964–1970. doi: 10.1109/JSTARS.2012.2231055. https://ieeexplore.ieee.org/document/6376246. 16. A. Glory Sujitha, P. Vasuki and A. Amala Deepan. “Hybrid Laplacian Gaussian Based Speckle Removal in SAR Image Processing”. In: Journal of Medical Systems 43.7 (2019), p. 222. doi: 10.1007/s10916-019-1299-0. https://doi.org/10.1007/s10916-019-1299-0. 17. Yongjian Yu and S.T. Acton. “Speckle Reducing Anisotropic Diffusion”. In: IEEE Transactions on Image Processing 11.11 (2002), pp. 1260–1270. doi: 10.1109/ TIP.2002.804276. https://ieeexplore.ieee.org/document/1097762. 18. Heng-Chao Li et al. “Bayesian Wavelet Shrinkage with HeterogeneityAdaptive Threshold for SAR Image Despeckling Based on Generalized Gamma Distribution”. In: IEEE Transactions on Geoscience and Remote Sensing 51.4 (2013), pp. 2388–2402. doi: 10.1109/TGRS.2012.2211366. https://ieeexplore.ieee. org/document/6303903. 19. R. Sethunadh and T. Thomas. “Spatially Adaptive Despeckling of SAR Image Using Bi-Variate Thresholding in Directionlet Domain”. In: Electronics Letters 50.1 (2014), pp. 44–45. doi: https://doi.org/10.1049/el.2013.0971. https://ietresearch.onlinelibrary. wiley.com/doi/abs/10.1049/el.2013.0971. 20. Lei Zhu, Xiaotian Zhao and Meihua Gu. “SAR Image Despeckling Using Improved Detail-Preserving Anisotropic Diffusion”. In: Electronics Letters 50.15 (2014), pp. 1092–1093. doi: https://doi.org/10.1049/el.2014.0293. https://ietresearch.onlinelibrary. wiley.com/doi/abs/10.1049/el.2014.0293.

18

Multimedia Data Processing and Computing

21. David de la Mata-Moya et al. “Spatially Adaptive Thresholding of the Empirical Mode Decomposition for Speckle Reduction Purposes”. In: EUSAR 2014, EUSAR 2014 10th European Conference on Synthetic Aperture Radar. VDE VERLAG GMBH, 2014, p. 4. https://www.tib.eu/de/suchen/id/vde%5C%3Asid%5C%7E453607126. 22. Shuai Qi Liu et al. “Bayesian Shearlet Shrinkage for SAR Image De-Noising via Sparse Representation”. In: Multidimensional Systems and Signal Processing 25.4 (2014), pp. 683–701. doi: 10.1007/s11045-013-0225-8. https://link.springer.com/article/10.1007/ s11045-013-0225-8. 23. Yao Zhao et al. “Adaptive Total Variation Regularization Based SAR Image Despeckling and Despeckling Evaluation Index”. In: IEEE Transactions on Geoscience and Remote Sensing 53.5 (2015), pp. 2765–2774. doi: 10.1109/TGRS.2014.2364525. https://ieeexplore.ieee.org/document/6954413. 24. R. Sivaranjani, S. Mohamed Mansoor Roomi and M. Senthilarasi. “Speckle Noise Removal in SAR Images Using Multi-Objective PSO (MOPSO) Algorithm”. In: Applied Soft Computing 76 (2019), pp. 671–681. doi: https://doi.org/10.1016/j.asoc.2018.12.030. https://www.sciencedirect.com/science/article/pii/S1568494618307257. 25. Suman Kumar Maji, Ramesh Kumar Thakur and Hussein M. Yahia. “StructurePreserving Denoising of SAR Images Using Multifractal Feature Analysis”. In: IEEE Geoscience and Remote Sensing Letters 17.12 (2020), pp. 2100–2104. doi: 10.1109/ LGRS.2019.2963453. https://ieeexplore.ieee.org/document/8959376. 26. Sudeb Majee, Rajendra K. Ray and Ananta K. Majee. “A New Non-Linear HyperbolicParabolic Coupled PDE Model for Image Despeckling”. In: IEEE Transactions on Image Processing 31 (2022), pp. 1963–1977. doi: 10.1109/TIP.2022.3149230. https:// ieeexplore.ieee.org/document/9713749. 27. Sara Parrilli et al. “A Nonlocal SAR Image Denoising Algorithm Based on LLMMSE Wavelet Shrinkage”. In: IEEE Transactions on Geoscience and Remote Sensing 50.2 (2012), pp. 606–616. doi: 10.1109/TGRS.2011.2161586. https://ieeexplore.ieee.org/ document/5989862. 28. Diego Gragnaniello et al. “SAR Image Despeckling by Soft Classification”. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9.6 (2016), pp. 2118–2130. doi: 10.1109/JSTARS.2016.2561624. https://ieeexplore.ieee.org/ document/7480344. 29. Jithin Gokul, Madhu S. Nair and Jeny Rajan. “Guided SAR Image Despeckling with Probabilistic non Local Weights”. In: Computers & Geosciences 109 (2017), pp. 16–24. doi: https://doi.org/10.1016/j.cageo.2017.07.004. https://www.sciencedirect. com/science/article/pii/S0098300416308640. 30. Ramin Farhadiani, Saeid Homayouni and Abdolreza Safari. “Hybrid SAR Speckle Reduction Using Complex Wavelet Shrinkage and Non-Local PCA-Based Filtering”. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12.5 (2019), pp. 1489–1496. doi: 10.1109/JSTARS.2019.2907655. https:// ieeexplore.ieee.org/document/8692388. 31. Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Communications of the ACM 60.6 (May 2017), pp. 84–90. doi: 10.1145/3065386. https://doi.org/10.1145/3065386. 32. Kai Zhang et al. “Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising”. In: IEEE Transactions on Image Processing 26.7 (2017), pp. 3142–3155. doi: 10.1109/TIP.2017.2662206. 33. Tobias Plötz and Stefan Roth. “Neural Nearest Neighbors Networks”. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18. Montréal, Canada: Curran Associates Inc., 2018, pp. 1095–1106. 34. Ding Liu et al. “Non-Local Recurrent Network for Image Restoration”. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18. Montréal, Canada: Curran Associates Inc., 2018, pp. 1680–1689.

Despeckling of the Earth’s Surface Visuals by Synthetic Aperture Radar

19

35. Diego Valsesia, Giulia Fracastoro and Enrico Magli. “Deep Graph-Convolutional Image Denoising”. In: IEEE Transactions on Image Processing 29 (2020), pp. 8226–8237. doi: 10.1109/TIP.2020.3013166. 36. Feng Gu et al. “Residual Encoder-Decoder Network Introduced for Multisource SAR Image Despeckling”. In: 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSAR- DATA). 2017, pp. 1–5. doi: 10.1109/BIGSARDATA.2017.8124932. https:// ieeexplore.ieee.org/document/8124932. 37. G. Chierchia et al. “SAR Image Despeckling through Convolutional Neural Networks”. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). 2017, pp. 5438–5441. doi: 10.1109/IGARSS.2017.8128234. 38. Puyang Wang, He Zhang and Vishal M. Patel. “SAR Image Despeckling Using a Convolutional Neural Network”. In: IEEE Signal Processing Letters 24.12 (Dec. 2017), pp. 1763–1767. doi: 10.1109/LSP.2017.2758203. https://ieeexplore.ieee.org/ document/8053792. 39. Qiang Zhang et al. “Learning a Dilated Residual Network for SAR Image Despeckling”. In: Remote Sensing 10.2 (2018). doi: 10.3390/rs10020196. https://www. mdpi.com/2072-4292/10/2/196. 40. Sergio Vitale, Giampaolo Ferraioli and Vito Pascazio. “A New Ratio Image Based CNN Algorithm for SAR Despeckling”. In: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. 2019, pp. 9494–9497. doi: 10.1109/ IGARSS.2019.8899245. https://ieeexplore.ieee.org/document/8899245. 41. D. Cozzolino et al. “Nonlocal SAR Image Despeckling by Convolutional Neural Networks”. In: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. 2019, pp. 5117–5120. doi: 10.1109/IGARSS.2019.8897761. https:// ieeexplore.ieee.org/document/8897761. 42. Giampaolo Ferraioli, Vito Pascazio and Sergio Vitale. “A Novel Cost Function for Despeckling Using Convolutional Neural Networks”. In: 2019 Joint Urban Remote Sensing Event (JURSE). 2019, pp. 1–4. doi: 10.1109/JURSE.2019.8809042. https:// ieeexplore.ieee.org/document/8809042. 43. Ting Pan et al. “A Filter for SAR Image Despeckling Using Pre-Trained Convolutional Neural Network Model”. In: Remote Sensing 11.20 (2019). doi: 10.3390/rs11202379. https://www.mdpi.com/2072-4292/11/20/2379. 44. Feng Gu, Hong Zhang and Chao Wang. “A Two-Component Deep Learning Network for SAR Image Denoising”. In: IEEE Access 8 (2020), pp. 17792–17803. doi: 10.1109/ ACCESS.2020.2965173. https://ieeexplore.ieee.org/document/8954707. 45. Xiaoshuang Ma et al. “SAR Image Despeckling by Noisy Reference-Based Deep Learning Method”. In: IEEE Transactions on Geoscience and Remote Sensing 58.12 (2020), pp. 8807–8818. doi: 10.1109/TGRS.2020.2990978. https://ieeexplore.ieee.org/ abstract/document/9091002. 46. Emanuele Dalsasso et al. “SAR Image Despeckling by Deep Neural Networks: From a Pre-Trained Model to an End-to-End Training Strategy”. In: Remote Sensing 12.16 (2020). doi: 10.3390/rs12162636. https://www.mdpi.com/2072-4292/12/16/2636. 47. Huanfeng Shen et al. “SAR Image Despeckling Employing a Recursive Deep CNN Prior”. In: IEEE Transactions on Geoscience and Remote Sensing 59.1 (2021), pp. 273–286. doi: 10.1109/TGRS.2020.2993319. https://ieeexplore.ieee.org/document/9099060. 48. Sergio Vitale, Giampaolo Ferraioli and Vito Pascazio. “Multi-Objective CNN-Based Algorithm for SAR Despeckling”. In: IEEE Transactions on Geoscience and Remote Sensing 59.11 (2021), pp. 9336–9349. doi: 10.1109/TGRS.2020.3034852. https:// ieeexplore.ieee.org/document/9261137. 49. Adugna G. Mullissa et al. “deSpeckNet: Generalizing Deep Learning-Based SAR Image Despeckling”. In: IEEE Transactions on Geoscience and Remote Sensing 60 (2022), pp. 1–15. doi: 10.1109/TGRS.2020.3042694. https://ieeexplore.ieee.org/ document/9298453.

20

Multimedia Data Processing and Computing

50. Shen Tan et al. “A CNN-Based Self-Supervised Synthetic Aperture Radar Image Denoising Approach”. In: IEEE Transactions on Geoscience and Remote Sensing 60 (2022), pp. 1–15. doi: 10.1109/TGRS.2021.3104807. https://ieeexplore.ieee.org/ document/9521673. 51. Ye Yuan et al. “A Practical Solution for SAR Despeckling with Adversarial Learning Generated Speckled-to-Speckled Images”. In: IEEE Geoscience and Remote Sensing Letters 19 (2022), pp. 1–5. doi: 10.1109/LGRS.2020.3034470. https://ieeexplore.ieee. org/document/9274511. 52. Shuaiqi Liu et al. “MRDDANet: A Multiscale Residual Dense Dual Attention Network for SAR Image Denoising”. In: IEEE Transactions on Geoscience and Remote Sensing 60 (2022), pp. 1–13. doi: 10.1109/TGRS.2021.3106764. https://ieeexplore.ieee.org/ document/9526864. 53. Ramesh Kumar Thakur and Suman Kumar Maji. “AGSDNet: Attention and GradientBased SAR Denoising Network”. In: IEEE Geoscience and Remote Sensing Letters 19 (2022), pp. 1–5. doi: 10.1109/LGRS.2022.3166565. https://ieeexplore.ieee.org/ document/9755131. 54. Ramesh Kumar Thakur and Suman Kumar Maji. “SIFSDNet: Sharp Image Feature Based SAR Denoising Network”. In: IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium. 2022, pp. 3428–3431. doi: 10.1109/ IGARSS46834.2022.9883415. 55. Yi Yang and Shawn Newsam. “Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification”. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. GIS ’10. San Jose, California: Association for Computing Machinery, 2010, pp. 270–279. doi: 10.1145/ 1869790.1869829. https://doi.org/10.1145/1869790.1869829. 56. Alain Horé and Djemel Ziou. “Image Quality Metrics: PSNR vs. SSIM”. In: 2010 20th International Conference on Pattern Recognition. 2010, pp. 2366–2369. doi: 10.1109/ ICPR.2010.579. 57. Zhou Wang et al. “Image Quality Assessment: From Error Visibility to Structural Similarity”. In: IEEE Transactions on Image Processing 13.4 (2004), pp. 600–612. doi: 10.1109/TIP.2003.819861. 58. Hongxiao Feng, Biao Hou and Maoguo Gong. “SAR Image Despeckling Based on Local Homogeneous-Region Segmentation by Using Pixel-Relativity Measurement”. In: IEEE Transactions on Geoscience and Remote Sensing 49.7 (2011), pp. 2724–2737. doi: 10.1109/TGRS.2011.2107915.

2

Emotion Recognition Using Multimodal Fusion Models A Review Archana Singh and Kavita Sahu

2.1 INTRODUCTION Service robot performance has improved recently, and there will surely be a revolution in robot service in the next years, similar to the one that happened with industrial robotics. But first, robots, particularly humanoids, must be able to understand humans’ emotions in order to adapt to their demands. Artificial intelligence is becoming more and more integrated into many areas of human life. Technology that adapts to human requirements is facilitated by artificial intelligence. Algorithms for the recognition and detection of emotions employ the strategies. Many studies have found the link between facial expression and human emotions. Humans can perceive and distinguish these emotions and can communicate. Emotions play a key part in social relationships, human intelligence, perception, and other aspects of life. Application of human–computer interaction technology that recognizes emotional reactions offers a chance to encourage peaceful interaction in the communication of computers and people an increasing number of technologies for processing everyday activities, including facial expressions, voice, body movements, and language, and has expanded the interaction modality between humans and computer-supported communication items, such as reports, tablets, and cell phones. Human emotions reveal themselves in a variety of ways, prompting the development of affect identification systems. There are three primary approaches to recognition: audio-based approach, video-based approach, and audio–video approach. In a method that is based on sound, feelings are portrayed through the use of characteristics such as valence (on a scale that can range from positive to negative or negative to positive), audio frequency, pitch, and so on. There are three different kinds of audio features: spectral features, features related to prosody, and features related to voice quality. Prosody is distinguished by a number of features, two of which are pitch strength and short energy duration time. Examples of spectrum features include the harmonic-to-noise ratio, the spectral energy distribution, and any number of other spectral properties. The Mel Frequency Cepstral Coefficient (MFCC) is a well-known characteristic-extraction technique for spectrum properties [1]. Prosody characteristics are retrieved using pitch TEO (Teager energy operator) DOI: 10.1201/9781003391272-2

21

22

Multimedia Data Processing and Computing

low energy and a zero-crossing rate. Face feature extraction is used to identify emotions in a video-based technique. For feature extraction, different classifiers are used: principal component analysis (PCA) and Haar cascade classifier [2]. The nose, brow, eyes look, lips, and other facial features are all examples of facial features. The remainder of this work is structured as follows. In Section 2.2, we have provided a brief overview of the ideas and models of emotions. In Section 2.3, we next provide a more detailed discussion of Deep Learning(DL) assessments. The comprehensive multimodel ER is described in Section 2.4. We have detailed the datasets used for ER in Section 2.5, and then in Section 2.6, we have provided a summary and discussion of the review.

2.2  EMOTION THEORIES AND MODELS Many alternative definitions of emotions have been offered thus far. The ability to reason about emotions is important for a wide range of social actions. Emotions have an important part in human perception and cognition, as well as in human existence and social interactions. Effective cognition is a term that refers to a compilation of all mental operations used to reason about other people’s emotions. Lay theories, also known as intuitive theories or folk theories, offer a structured understanding of the world as well as a conceptual framework for considering other people’s feelings. The existence of several widely acknowledged fundamental emotions is taken into account by discrete emotion methods. Different theories of emotion are shown in Figure 2.1, and Figure 2.2 shows basic feelings in a 2-D framework of arousal and valence.

2.3  EMOTION RECOGNITION AND DEEP LEARNING 2.3.1  Facial Expression Recognition In the realm of nonverbal communication, facial expressions are among the most important tools for conveying feelings. It’s crucial to be able to recognize facial expressions of emotion because many different applications, such as medical services and human–computer interaction, depend on it. According to the research

FIGURE 2.1  Categorization of emotion theories.

Emotion Recognition Using Multimodal Fusion Models: A Review

23

FIGURE 2.2  Basic feelings in a 2-D framework of arousal and valence.

of Mehrabian [3], only 7% of information is transmitted between people through the written word, 38% through spoken word, and 55% through facial expression. According to Ekman et. al. [4], there are five fundamental feelings: surprise, happiness, rage, sadness, and fear. He showed how individuals from all cultures are capable of expressing these emotions. The expression of emotion can be accomplished in either of these two orthogonal dimensions: valence and arousal. Further, Feldman et al. [5], stated in his research that two dimensions that are orthogonal to one another can be employed to describe emotion. He claimed that the manner in which one displays their feelings is unique to the individual. In addition, there is a large amount of variation in people’s experiences when they are requested to express periodic emotions. Arousal can range from being calm to being excited, whereas valence can go from being positive to being negative. The input would be categorized in this endeavor according to the valence and arousal levels that it possessed. In order to extract facial emotions, multifeature fusion, and other related processes, researchers initially devised techniques for extracting functions such as the Garbor wavelet, Weber Local Descriptor (WLD), and Local Binary Pattern (LBP) [5]. These features simply are not resistant to topic imbalances, which may lead to a large reduction in the amount of consistent information present in the source image [6]. The most frequently discussed issue in the area of face recognition right now is the application of DL network models to the study of facial expressions. For usage in daily social interactions, Face Emotion Recognition (FER) offers a variety of applications, including smart surveillance, lie identification, and intelligent medical practice. Deep Convolutional Neural Network (CNN), Deep Belief Network (DBN), and Long Short-Term Memory (LSTM) were among the deep learning–based facial expression detection models examined by [7], and also about their aggregation.

24

Multimedia Data Processing and Computing

2.3.2 Speech Emotion Recognition Signal preprocessing, feature extraction, and classification are the three primary components that make up emotion identification systems that are based on digitized speech. Several different algorithms for determining the emotions conveyed in human speech have been proposed over the course of time. Deep learning has been utilized extensively, employed in a number of speech fields, inclusive of voice recognition, but that’s just one of them. Identifying the feelings conveyed in a person’s voice has also been accomplished by Bertero et al. [8] using convolutional neural networks. As shown by Hasan et al. [9], the RNN bidirectional (Bi-LSTM) technology is preferable for acquiring the most relevant speech information for enhanced speech recognition performance. Because of its multi-layered structure and effective results conveyance when it comes to deep learning, Bi-LSTM technology is now seeing remarkable growth. Some of these research areas include natural language processing, pattern recognition, speech emotion recognition, and image and speech recognition.

2.3.3 Multimodel Emotion Recognition The broad use of multimodal emotion processing in research is still present. With the inclusion of additional research modalities (such as video, audio, sensor data, and so on), the purpose of the study may be accomplished by integrating a variety of methods and strategies, the majority of which make use of semantic principles, big data techniques, and deep learning. Moreover, the purpose of the study may be accomplished with additional research modalities (such as video, audio, sensor data, and so on). The family of machine learning techniques known as “multimodal fusion for emotion detection” integrates data from many modalities to forecast an outcome measure. A class having a continuous value (such as the amount of valence or arousal) or a discrete value (such as joyful vs. sad) is often what this is. Existing methods for multimodal emotion recognition are surveyed in several literature review studies. Any multimodal fusion strategy must include these three factors: • Extracted features: Audio-video emotion recognition (AVER) has a number of customized features. These basic descriptors largely refer to geometrical characteristics such as face landmarks. Spectral, cepstral, prosodic, and voice-quality characteristics are examples of frequently utilized audio signal features. Deep neural network-based features for AVER have gained popularity recently. Deep convolutional neural networks, sometimes referred to as CNNs, have been found to outperform other AVER techniques [10]. • Multimodal features fusion: The manner in which aural and visual data are combined is a crucial factor in multimodal emotion identification. Decision-level fusion, features-level fusion, hybrid fusion, and model-level fusion are the four types of strategies that have been documented in the literature.

Emotion Recognition Using Multimodal Fusion Models: A Review

25

• Modeling temporal dynamics: Audio-visual data is a representation of a variable collection of signals spanning both temporal and spatial dimensions. There are various techniques for modeling these signals using deep learning. Spatial feature representations deal with learning features from single pictures, extremely brief image sequences, or brief audio clips. Temporal feature representations are models that take input in the form of audio or visual sequences.

2.4  MULTIMODAL EMOTION RECOGNITION Emotion identification is difficult because emotions are complex psychological processes that happen nonverbally. Contrary to unimodal learning, multimodal learning is significantly more effective [11]. Studies have also sought to unite data from many modalities, such as audio and facial emotion expressions, physiological signals, written text and audio, and various aggregations of these modalities, for the purpose of increasing both the effectiveness and the precision of their findings. The multimodal fusion model can identify emotions by fusing physiological inputs in a variety of ways. Deep learning (DL) architectures have recently undergone advancements, and this has led to their use in multimodal emotion detection (Figure 2.3). Support vector machine (SVM), deep belief net, deep convolutional neural network, and their combinations are examples of deep-learning approaches.

2.4.1 Multimodal Emotion Recognition Combining Audio and Text A deep-learning–based strategy was provided by Priyasad et al. [12] to secure codes that are indicative of emotion. The researchers were able to extract acoustic properties from raw audio using a SincNet layer, band-pass filtering, and neural networks. The output of these band-pass filters is then applied to the input of the DCNN. Cross-modal emphasis and a CNN based on raw waveforms were combined in a distinctive form by Krishna et al. [13]. To achieve improved emotion recognition, they analyze unprocessed audio using 1-D convolutional networks and attention mechanisms between the audio and text features. Through testing, Caihua [14] discovered that the SVM-based technique for ML is effective for analyzing voice customer experience. He proposed a multimodal SVMbased speech emotion identification system. The experimental results then show that by using this SVM strategy to solve the typical database classification issue, the SVM algorithm has significantly improved.

FIGURE 2.3  Multimodal recognition based on different inputs.

26

Multimedia Data Processing and Computing

Liu, Gaojun, and others [15] created a novel multimodal approach for classifying musical emotions based on the auditory quality of the music and the text of the song lyrics. The classification result is significantly improved compared to other ML techniques when using the LSTM network for audio classification.

2.4.2 Multimodal Emotion Recognition Combining Image and Text A multimodal DL model was created by Lee et al. [16] using face photos and textual descriptions of the situation. They created two multimodal models using photos and text to categorize the characters’ face images in the Korean TV series Misaeng: The Incomplete Life. The results of the experiment showed that the effectiveness of character recognition is greatly increased when text definitions of their behavior are used. Siriwardhana et al. [17] investigated the use of a pre-trained “BERT-like” architecture for self-supervised learning (SSL) to represent both language and text modalities in order to distinguish multimodal language emotions. They show that a simple fusion mechanism (Shallow-Fusion) simplifies and strengthens complex fusion mechanisms.

2.4.3 Multimodal Emotion Recognition Combining Facial and Body Physiology A ground-breaking multimodal fusion system with regularization was suggested by Zhang et al. [18] and centered on a deep-network architecture and a novel kernel-matrix approach. They converted the native space of a specified kernel into a task-specific function space in representation learning by utilizing the deep network architecture’s better efficiency. They used a common presentation layer to learn how to express fusion, which denotes the tacit merger of several kernels. In order to obtain the system’s multimodal input function for emotion recognition, Nie et al. [19] suggested a multi-layer LSTM architecture. Neural network-based addition ideas will result in significant alterations at the utterance level. In order to extract the global aspects of speech from videos, they employ LSTM to analyze the data. The experimental results led to remarkable and significant improvements in efficiency over the baseline.

2.4.4 Other Multimodal Emotion Recognition Models A novel multimodal fusion attention network for acoustic emotion identification was proposed by Hengshun Zhou et al. [20] and was built on multi-level factorized bilinear pooling (FBP). In order to recognize the emotions shown by the speaker’s speech, a fully convolutional network with a 1-D attention approach and local response normalization is constructed for the audio stream. Then, to complete the process of audio-visual information fusion, a methodology known as global FBP (G-FBP) is developed. This method incorporates self-attention that is based on a video stream with the prospective audio. An adaptive method called AG-FBP predicts the total mass of two components based on emotion-related vectors produced from each modality’s attention mechanism in order to improve G-FBP.

Emotion Recognition Using Multimodal Fusion Models: A Review

27

When evaluated on the IEMOCAP corpus, the new FCN method achieves results that are significantly better than before, with an accuracy of 71.40%. Najmeh Samadiani et al. [21] introduced a new Happy ER-DDF approach, using a hybrid deep neural network for recognizing happy emotions from unrestricted videos. For facial expression recognition, they used ResNet frameworks, as well as a 3-D version of the Inception ResNet architecture that extricates spatial-temporal characteristics. For the evaluation of temporal dynamic features inside consecutive frames, the LSTM layer was applied to extracted features. Because the geometric properties produced by facial landmarks are useful in the recognition of expressions, a CNN model was used to extricate deep characteristics from face distance time series. Their approach divided the non-happy and happy groups by adding feature vectors at both decision-level fusion and feature. With accuracy of 95.97%, 94.89%, and 91.14% for the AM-FED dataset, AFEW dataset, and MELD dataset, respectively, the proposed HappyER-DDF approach detects pleasant emotion more accurately than several competing methods [22]. Kah Phooi Sang et al. [23] proposed an audio-visual emotion identification system that improved recognition efficiency in the audio and video routes by combining rule-based and ML methods. Through the utilization of least square linear discriminant analysis (LS-LDA) and bi-directional principal component analysis (BDPCA), a visual approach was developed for the purposes of dimension reduction and class classification (LSLDA). The information that was acquired visually was input into a novel neural classifier that was given the name optimized kernel Laplacian radial basis function (OKL-RBF). Prosodic input data (log energy, pitch, Teager energy operator, and zero crossing rate) were combined with spectral characteristics (melscale frequency cepstral coefficient) in order to generate the audio pathway. A variety of heuristics were used to the retrieved audio characteristic before it was fed into an audio feature level fusion module. By determining which emotion was most likely to be present in the stream, the module was able to make a decision. The outputs from both paths were combined by an audio-visual fusion module. Standard databases are used to assess the planned audio path, visual path, and final system’s performance. The results of the experiments and comparisons show 86.67% and 90.03% accuracy on eNTERFACE database and RML database, respectively (Table 2.1).

2.5 DATABASES Successful deep learning requires a number of key components, one of which is training the neuron network with examples. There are already various FER databases accessible to aid researchers in this job. Changes in population, lighting, and facial attitude, as well as the quantity and dimension of the images and videos, are all distinct from one another.

2.5.1 Database Descriptions More than 750,000 photos have been shot with MultiPie using 15 different viewpoints and 19 different illumination situations. It features facial expressions such as anger, disgust, neutrality, happiness, a squint, a scream, and surprise. MMI has

28

Multimedia Data Processing and Computing

TABLE 2.1 Summary of Existing Approaches for Human Emotion Recognition S.No. 1

Authors Priyasad et al. [12]

2 3

Krishna et al. [13] Caihua [14]

Classifier/Detector RNN, DCCN with a SincNet layer 1-D CNN CNN

4

Liu et al. [15]

LSTM

5 6 7 8 9 10

Siriwardhana et al. [17] Hidayat et al. [24] Zuo et al. [25] Soleymani et al. [26] Seng et al. [27] Samadiani et al. [21]

11

Li et al. [28]

12

Jaiswal et al. [29]

SSL model TA-AVN CNN BDPCA BDPCA + LSLDA 3D Inception RestNet Neural network, LSTM FCN with 1-D attention mechanism CNN

Database IEMOCAP

Accuracy 80.51%

IEMOCAP Asian character from the TV drama series 777 songs

– –

IEMOCAP RAVDESS FERC-2013 RML eNTERFACE AM-FED AFEW

Improvement by 5.77% – 78.7 70.14 91 86.67 95.97 94.89

AFEW IEMOCAP FERC

63.09 75.49 70.14

2900 films, each of which indicates the neutral, onset, apex, and offset. Additionally, it possesses the five fundamental emotions and the neutral state. GEMEP FERA is comprised of 289 visual sequences and depicts a variety of emotional states including anger, fear, sadness, relief, and happiness. SFEW has 700 pictures that vary in age, occlusion, illumination, and head pose. Additionally, it includes five fundamental feelings in addition to neutral. CK+ features 593 films for both posed and non-posed expressions, as well as five fundamental feelings, including neutral and contempt. FER2013 consists of five fundamental feelings in addition to neutral, and it comprises 35,887 photos in grayscale that were obtained from a Google image search. JAFFE is comprised of 213 black-and-white photographs that were posed by 10 Japanese women. Additionally, it consists of five fundamental emotions and neurological. BU-3DFE includes 2500 3-D facial photos that were acquired from two different angles (–45° and +45°) and includes five fundamental feelings in addition to neutral. CASME II has 247 micro-expression sequences. It shows displays of happiness, disgust, surprise, and regression, among other emotions. The Oulu-CASIA database contains 2880 movies that were shot under three distinct lighting conditions and features five fundamental feelings. AffectNet is comprised of five fundamental feelings in addition to neutral, and it comprises more than 440,000 images gathered from the internet. The RAFD-DB contains 30,000 photos taken from the real world, as well as five fundamental feelings and neutral.

Emotion Recognition Using Multimodal Fusion Models: A Review

29

2.6  CONCLUSION AND FUTURE WORK Due to the fact that our study focused on contemporary ER research, we were able to acquire knowledge regarding the most recent developments in this sector. We have looked at a number of deep-learning multimodels and provided a number of datasets that comprise surprising images obtained from the real world and others produced in lab facilities in order to acquire and achieve an accurate identification of human emotions. FER is one of the most crucial ways to communicate information about someone’s psychological response, but they are continually constrained by their limited ability to discern beyond neutral and the five fundamental emotions. It competes with the more subtle emotions that are there, which are found in daily life and are also present. This will motivate researchers to concentrate on building more robust deep-learning architectures and larger databases in the future in order to discriminate between all major and secondary emotions. Additionally, in today’s emotion recognition, multimodal complex systems analysis has displaced the role of unimodal analysis. For only a few seconds— roughly three to fifteen—emotional alterations in physiological signals can be seen. Consequently, extracting information about the precise timing of emotional response would yield superior outcomes. In order to accomplish this, a windowdependent strategy will be required throughout the processing of the numerous physiological inputs.

REFERENCES 1. Muda, L., Begam, M., & Elamvazuthi, I. (2010). Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. Preprint, arXiv:1003.4083. 2. Li, X. Y., & Lin, Z. X. (2017, October). Face recognition based on HOG and fast PCA algorithm. In The Euro-China Conference on Intelligent Data Analysis and Applications (pp. 10–21). Springer, Cham. 3. Mellouk, W., & Handouzi, W. (2020). Facial emotion recognition using deep learning: Review and insights. Procedia Computer Science, 175, 689–694. 4. Ekman, P., & Friesen, W. V. (2003). Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues (Vol. 10). San Jose, CA: Malor Books. 5. Feldman, L. A. (1995). Valence focus and arousal focus: Individual differences in the structure of affective experience. Journal of Personality and Social Psychology, 69(1), 153. 6. Chen, A., Xing, H., & Wang, F. (2020). A facial expression recognition method using deep convolutional neural networks based on edge computing. IEEE Access, 8, 49741–49751. 7. Nguyen, T. D. (2020). Multimodal emotion recognition using deep learning techniques (Doctoral dissertation, Queensland University of Technology). 8. Bertero, D., & Fung, P. (2017, March). A first look into a convolutional neural network for speech emotion detection. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5115–5119). IEEE. 9. Hasan, D. A., Hussan, B. K., Zeebaree, S. R., Ahmed, D. M., Kareem, O. S., & Sadeeq, M. A. (2021). The impact of test case generation methods on the software performance: A review. International Journal of Science and Business, 5(6), 33–44.

30

Multimedia Data Processing and Computing

10. Rouast, P. V., Adam, M. T., & Chiong, R. (2019). Deep learning for human affect recognition: Insights and new developments. IEEE Transactions on Affective Computing, 12(2), 524–543. 11. Lan, Y. T., Liu, W., & Lu, B. L. (2020, July). Multimodal emotion recognition using deep generalized canonical correlation analysis with an attention mechanism. In 2020 International Joint Conference on Neural Networks (IJCNN) (pp. 1–6). IEEE. 12. Priyasad, D., Fernando, T., Denman, S., Sridharan, S., & Fookes, C. (2020, May). Attention driven fusion for multi-modal emotion recognition. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3227–3231). IEEE. 13. Krishna, D. N., & Patil, A. (2020, October). Multimodal emotion recognition using cross-modal attention and 1D convolutional neural networks. In Interspeech (pp. 4243–4247). 14. Caihua, C. (2019, July). Research on multi-modal mandarin speech emotion recognition based on SVM. In 2019 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS) (pp. 173–176). IEEE. 15. Liu, G., & Tan, Z. (2020, June). Research on multi-modal music emotion classification based on audio and lyric. In 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) (Vol. 1, pp. 2331–2335). IEEE. 16. Lee, J. H., Kim, H. J., & Cheong, Y. G. (2020, February). A multi-modal approach for emotion recognition of TV drama characters using image and text. In 2020 IEEE International Conference on Big Data and Smart Computing (BigComp) (pp. 420–424). IEEE. 17. Siriwardhana, S., Reis, A., Weerasekera, R., & Nanayakkara, S. (2020). Jointly finetuning “bert-like” self supervised models to improve multimodal speech emotion recognition. Preprint, arXiv:2008.06682. 18. Zhang, X., Liu, J., Shen, J., Li, S., Hou, K., Hu, B., & Zhang, T. (2020). Emotion recognition from multimodal physiological signals using a regularized deep fusion of kernel machine. IEEE Transactions on Cybernetics, 51(9), 4386–4399. 19. Nie, W., Yan, Y., Song, D., & Wang, K. (2021). Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition. Multimedia Tools and Applications, 80(11), 16205–16214. 20. Zhou, H., Meng, D., Zhang, Y., Peng, X., Du, J., Wang, K., & Qiao, Y. (2019, October). Exploring emotion features and fusion strategies for audio-video emotion recognition. In 2019 International Conference on Multimodal Interaction (pp. 562–566). 21. Samadiani, N., Huang, G., Cai, B., Luo, W., Chi, C. H., Xiang, Y., & He, J. (2019). A review on automatic facial expression recognition systems assisted by multimodal sensor data. Sensors, 19(8), 1863. 22. Lucey, P., Cohn, J. F., Matthews, I., Lucey, S., Sridharan, S., Howlett, J., & Prkachin, K. M. (2010). Automatically detecting pain in video through facial action units. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(3), 664–674. 23. Seng, K. P., Suwandy, A., & Ang, L. M. (2004, November). Improved automatic face detection technique in color images. In 2004 IEEE Region 10 Conference TENCON 2004. (pp. 459–462). IEEE. 24. Hidayat, R., Jaafar, F. N., Yassin, I. M., Zabidi, A., Zaman, F. H. K., & Rizman, Z. I. (2018). Face detection using min-max features enhanced with locally linear embedding. TEM Journal, 7(3), 678. 25. Zuo, X., Zhang, C., Hämäläinen, T., Gao, H., Fu, Y., & Cong, F. (2022). Cross-subject emotion recognition using fused entropy features of EEG. Entropy, 24(9), 1281. 26. Soleymani, M., Pantic, M., & Pun, T. (2011). Multimodal emotion recognition in response to videos. IEEE Transactions on Affective Computing, 3(2), 211–223.

Emotion Recognition Using Multimodal Fusion Models: A Review

31

27. Seng, K. P., Ang, L. M., & Ooi, C. S. (2016). A combined rule-based & machine learning audio-visual emotion recognition approach. IEEE Transactions on Affective Computing, 9(1), 3–13. 28. Li, J., Jin, K., Zhou, D., Kubota, N., & Ju, Z. (2020). Attention mechanism-based CNN for facial expression recognition. Neurocomputing, 411, 340–350. 29. Jaiswal, M., & Provost, E. M. (2020, April). Privacy enhanced multimodal neural representations for emotion recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 7985–7993).

3

Comparison of CNN-Based Features with Gradient Features for Tomato Plant Leaf Disease Detection Amine Mezenner, Hassiba Nemmour, Youcef Chibani, and Adel Hafiane

3.1 INTRODUCTION Recently, improving agricultural crops became a major sector for national economies due to the population increase and climatic disturbances. In this respect, various strategies are adopted to predict favorable factors for plant disease, such as the weather effects on developing infections, as well as the monitoring of plant leaves for early detection of viruses. In fact, plant health can be inspected in the leaves’ color, edges, and textures. Therefore, automatic plant disease–detection systems are based on leaf-image analysis. Previously, this task was carried out by agronomists who decide if a plant is infected or healthy. Currently, thanks to advances in computer vision and artificial intelligence, powerful systems were developed to make automatic disease detection by analyzing plant-leaf images [1]. Like most of computer vision systems, plant leaf disease–detection systems are composed of three main steps, which are preprocessing, feature generation, and classification [2]. Preprocessing operations, such as background removing, resizing, and image filtering, aim to enhance the visual quality of images and facilitate the extraction of useful information. Note that feature generation is the most critical step within the disease detection and classification system since it has to extract pertinent information that can distinguish between healthy leaves and infected leaves. The state of the art reports the use of all well-known descriptors of pattern recognition and computer vision. Earliest works focused on the use of color, texture, and shape information. In regard to color features, we can report the use of the color histograms generated from the RGB (red, green, and blue) or the HSB (hue, saturation, and brightness) representations, as well as the International Commission on Illumination (CieLAB) and YCbCr features [3]. On the other hand, various forms of local binary patterns (LBPs) are applied to generate textures from images. 32

DOI: 10.1201/9781003391272-3

Comparison of Features for Tomato Plant Leaf Disease Detection

33

The gradient features are commonly extracted by using a histogram of oriented gradients. Both descriptors evince high performance in various application aims [4–6]. Furthermore, to achieve the classification step, machine-learning techniques, especially artificial neural networks and support vector machines (SVMs) are widely used [5, 7]. Recently, the success of artificial intelligence has promoted the use of convolutional neural networks (CNNs), which associate feature generation and classification in the same process. Precisely, in a CNN, convolutional layers ensure feature verification, while the fully connected layers achieve the classification task. In this respect, CNN can be used either as a feature generator or as an end-to-end classification system. The state of the art indicates that deep-learning models are the best systems for solving plant leaf disease detection and classification [8]. Nevertheless, performance observation obtained for various public datasets reveals that handcrafted features still give promising performance. In this chapter we compare two famous handcrafted features with CNN-based features for characterizing tomato leaf diseases. Specifically, we propose the use of local directional patterns (LDPs) as a new plant leaf descriptor. This is an LBP-like descriptor that is calculated on image edges extracted by using the Kirsch detector. Additionally, we utilize the histogram of oriented gradients (HOG) in this comparative study. The detection step is performed by an SVM. The performance assessment of this system is carried out comparatively to CNNs employed as an end-to-end detection system. Experiments are conducted on ten tomato classes including the healthy leaves and nine kinds of diseases. This chapter is organized as follows: Section 3.2 describes the proposed methods to develop the detection system. Section 3.3 presents the experimental analysis, while the last section gives the main conclusions of this work.

3.2  PROPOSED SYSTEM FOR TOMATO DISEASE DETECTION As mentioned above, three steps – preprocessing, feature generation, and classification – are required to develop a plant leaf disease detection–system (See Figure 3.1). Currently, the preprocessing step is limited to dimension reduction,

FIGURE 3.1  Steps composing a plant leaf disease–detection system.

34

Multimedia Data Processing and Computing

where the image size is reduced to 40 × 40 pixels, in order to minimize the time consumption of the system development. For feature generation, we propose local directional patterns, the histogram of oriented gradients, and features extracted from a customized LeNeT-5 model. The classification is carried out by an SVM.

3.2.1  Local Directional Pattern LDP is an edge descriptor that considers edge responses in different directions around each pixel [8]. The first step in the LDP calculation consists of applying the Kirsch edge detector in order to highlight the image edges according to eight directions. As shown in Figure 3.2, there are eight filtering masks that characterize all possible edge directions within images. Note that, for each direction, the presence of a corner or an edge corresponds to high values. The eight resulting images are grouped into a single maxima image that takes the maximal value among the eight values of each pixel. In the second step, the LDP histogram is assessed by considering the K most prominent directions [9]. So, by considering the pixel neighborhood in the maxima image, the highest K values take 1 in the LDP code while remaining neighbors take 0. Then, similarly to LBP, binary codes are converted into decimal values to calculate the histogram of occurrences that is the LDP descriptor. Figure 3.3 illustrates the LDP calculation for various values of K that is an integer in the range [1: 8].

FIGURE 3.2  Filters utilized in the Kirsch edge detector.

Comparison of Features for Tomato Plant Leaf Disease Detection

35

FIGURE 3.3  LDP calculation for a given pixel.

3.2.2  Histogram of Oriented Gradient (HOG) HOG is a well-known computer vision descriptor that was introduced for human detection which showed outstanding performance in several pattern-recognition and classification tasks [10, 11]. It is calculated on a dense grid of evenly spaced cells by counting occurrences of gradient orientations in localized portions of an image. For each cell, HOG features are calculated according to the following steps: 1. Compute the horizontal and vertical gradients for each pixel such that

Gx ( x ,  y )   =   I ( x ,  y   +  1)   −   I ( x ,  y   −  1) (3.1)



Gy ( x ,  y )   =   I ( x   −  1,  y )   −   I ( x   +  1,  y ) (3.2) 2. Evaluate the gradient magnitude and phase as follows:



Magnitude ( x , y ) = Gx2 + G y2 (3.3)



 Gy  Angle ( x , y ) = tan −1  (3.4)  Gx  3. Accumulate magnitudes according to their orientations to get the oriented gradient histogram of the cell. 4. Finally, concatenate histograms obtained for all cells to form the image HOG descriptor.

Figure 3.4 highlights the gradient magnitude and angle calculation for a given pixel.

3.2.3 Convolutional Neural Network-based Features To extract CNN features, we propose a customized LeNet-5 model that is composed of three convolutional layers associated with max-pooling and batch normalization

36

Multimedia Data Processing and Computing

FIGURE 3.4  Example of gradient magnitude and angle calculation.

steps. A flattening layer is used to adapt convolved maps to the prediction bloc that is composed of three fully connected layers. The network architecture was experimentally tuned by considering the CNN as an end-to-end plant leaf disease–detection system. After the training stage, the CNN is used as a feature generator by selecting outputs of the flattening layer or outputs of a fully connected layer as indicated in Figure 3.5.

3.2.4 Support Vector Machine–based Classification The detection task is achieved by an SVM that aims to separate plant leaf images into healthy or infected classes. This classifier is considered the best off-the-shelf classifier for various applications. Regarding plant disease detection and classification, SVM showed an outstanding performance for soybeans and bean disease detection

FIGURE 3.5  Proposed CNN for deep-feature generation.

Comparison of Features for Tomato Plant Leaf Disease Detection

37

by 96.25% and 100% in the overall accuracy [12, 13]. The training process consists of finding an optimal hyperplane that maximizes the margin between two classes [14]. Then, data are assigned to classes by using the following decision function:

 F ( p ) = sign  





∑α K ( p , p) + b (3.5) i

i

i =1

p : Feature vector of a given plant leaf image αi: Lagrange multiplier of the sample pi Sν: The number of support vectors b: Bias of the decision function The adopted kernel function K is the radial basis function that is calculated as RBF ( pi , p ) = e



− ( pi − p ) 2σ 2

2

(3.6)

σ: user defined parameter.

3.3  EXPERIMENTAL ANALYSIS Performance assessment of the proposed system is carried out on tomato plant leaf images extracted from the PlantVillage dataset, which offers ten tomato classes including healthy class and various infections. Figure 3.6 reports the number of samples per class, while Figure 3.7 shows a sample from each class of interest.

FIGURE 3.6  Data distribution in the adopted dataset.

38

Multimedia Data Processing and Computing

FIGURE 3.7  Samples from the studied tomato disease classes.

From this dataset, two-thirds of the samples were used in the training stage, while the remaining samples were used for performance evaluation. The experimental design focuses on evaluating the effectiveness of the proposed features that are LDPand CNN-based features. For comparison purposes, experiments are conducted on the end-to-end CNN and the HOG+SVM systems as well. First, experiments are carried out according to disease-specific detection, which aims to detect a single disease. Then, a global detection test is performed in which all infected leaves compose the disease class. In both experiments the detection task is considered a binary classification problem where the healthy class is confronted to all disease classes. In the first step, several passes were executed to find the best CNN architecture. Experiments based on the end-to-end CNN accuracy were conducted to the configuration reported in Table 3.1. After the training stage, the CNN can be used as a feature generator by considering outputs of intermediate layers of the prediction bloc. This concerns outputs of the flattening layer or those of any fully connected dense layer. Before evaluating the CNN features as well as the LDP features in association with SVM classifiers, the baseline was the HOG-based system since HOG features are older than LDP and CNN. Nevertheless, it was necessary to inspect the performance of the end-to-end CNN. Therefore, in a first experiment, the end-to-end CNN detection was compared to HOG-SVM detection. In this chapter, the detection is focused on a single disease class. The results obtained for various tests are summarized in Table 3.2. Note that HOG has been locally computed by dividing images into 4×4 cells in order to improve the gradient characterization. For each cell, a specific HOG histogram is evaluated and the full image descriptor was obtained by concatenating all histograms. As can be seen, the CNN presents higher accuracies than HOG-SVM system. Precisely, the detection score exceeds 97% in most cases. Therefore, in the second

39

Comparison of Features for Tomato Plant Leaf Disease Detection

TABLE 3.1 Summary of the Proposed CNN Architecture Layer Conv1 Batch Norm Max Pooling Conv2 Batch Norm Max Pooling Conv3 Batch Norm Max Pooling Flatten Dense 1 Dense 2 Dense 3 Dense 4

# Filters/Nodes 32

Filter Size 3×3

64

3×3

128

3×3

Padding Same X X Same X X Same X X

48,280 120 120 84 2

Activation Function ReLu

ReLu

ReLu

ReLu ReLu ReLu Softmax

step of experiments, the end-to-end CNN is considered as a baseline when evaluating both CNN and LDP features in association with SVM classifiers. Furthermore, CNN features are extracted by considering, respectively, outputs of the flattening layer (F), the first fully connected (FC1) layer, the second fully connected layer (FC2), and the third fully connected layer (FC3). The results in terms of overall detection accuracy are reported in Table 3.3. For each feature, the best SVM parameters that allow an optimal training accuracy were selected. Also, for LDP features, the K value was experimentally fixed at 3. As illustrated in Figure 3.8, which depicts the tomato disease detection accuracy for various K values, the performance

TABLE 3.2 Detection Accuracy (%) Obtained for HOG-SVM and CNN Systems Detection Task Tomato healthy/Target spot Tomato healthy/Mosaic virus Tomato healthy/Yellow leaf curl virus Tomato healthy/Bacterial spot Tomato healthy/Early blight Tomato healthy/Late blight Tomato healthy/Leaf mold Tomato healthy/Septoria leaf spot Tomato healthy/Two spotted spider mite

HOG+SVM 58.56 80.65 86.30 82.80 91.44 92.50 81.50 76.00 85.30

CNN 88.88 97.40 99.87 100 82.75 99.83 98.94 99.91 99.72

40

Multimedia Data Processing and Computing

TABLE 3.3 Tomato Disease Detection Accuracy (%) Obtained by Using CNN and LDP Features Detection Task Healthy/ Target Spot Healthy/ Mosaic virus Healthy/ Yellow leaf curl virus Healthy/ Bacterial spot Healthy/ Early blight Healthy/ Late blight Healthy/ Leaf mold Healthy/ Septoria leaf spot Healthy/ Two spotted spider mite Healthy/ All diseases tomato

CNN 88.88 97.40 99.87

F+SVM 99.70 99.85 100

FC1+SVM 99.90 99.69 100

FC2+SVM 99.50 99.70 100

FC3+SVM 99.60 99.70 100

LDP+SVM 100 100 100

100 82.75 99.83 98.94 99.91

99.92 99.65 100 99.76 99.82

99.92 99.54 99.91 99.76 100

100 99.65 99.83 99.76 99.91

100 99.31 99.83 98.94 99.91

100 100 100 100 100

88,.

96.42

99.27

99.36

99.63

100

95.15

99.46

99.78

99.75

99.66

100

increases from 94.4% for K=1 to 100 for a K=3. This finding can be explained by the fact that there is a need of at least two orientations to detect curved edges since a single orientation cannot highlight linear information in a given shape. When using this configuration, the LDP provides optimal performance that reaches 100% in all detection tasks. In contrast, the end-to-end CNN as well as the CNN-SVM systems provide lower accuracies. Specifically, the use of SVM in replacement of the CNN output improves the detection accuracy from 95.15 with the end-to-end CNN to more than 99% with all CNN-SVM combinations. Also, we can note that fully connected layers provide somewhat more pertinent features than the flattening layer since features allow better accuracy and have smaller size. The best detection accuracy that

FIGURE 3.8  Detection results for various LDP K values.

Comparison of Features for Tomato Plant Leaf Disease Detection

41

is about 99.78% is obtained when using the first fully connected layer that contains 120 nodes. Nevertheless, this still remain less effective than LDP features.

3.4 CONCLUSION This chapter proposed a system for plant-leaf disease detection based on an SVM classifier. The aim was to evaluate the effectiveness of handcrafted gradient features with respect to CNN-based features. Precisely, we proposed the local directional patterns (LDPs) as new plant-leaf image descriptors, which combine texture and edge information obtained by applying the Kirsch detector on images. This combination highlights the textural information in various directions. LDP was compared to the histogram of oriented gradients, which is one of the most commonly used descriptors in computer vision. Also, a comparison with various CNN-based features was carried out. Experiments were conducted to detect nine tomato leaf diseases on data extracted from the PlantVillage dataset. The obtained results reveal that HOG provides medium performance compared to other features. The end-to-end CNN as well as the system associating CNN-based features with SVM provide similar accuracies that are much better than HOG results. Nevertheless, the association of LDP features with SVM outperforms all other systems, since it allows an optimal accuracy when the suitable K value is used. From these outcomes we infer that LDP can be a typical plant-leaf disease descriptor. Further tests on other kinds of species are necessary to confirm again the effectiveness of this descriptor.

REFERENCES 1. Oo, YM., Htun, NC. 2018. Plant leaf disease detection and classification using image processing. International Journal of Research and Engineering, Vol. 5, 9516–9523. 2. Shruthi, U., Nagaveni, V., Raghavendra, B. 2019. A review on machine learning classification techniques for plant disease detection. 2019 5th International Conference on Advanced Computing & Communication Systems, 281–284. 3. El Sghair, M., Jovanovic, R., Tuba, M. 2017. An algorithm for plant diseases detection based on color features. International Journal of Agriculture, Vol. 2, 1–6. 4. Vishnoi, VK., Kumar, K., Kumar, B. 2021. Plant disease detection using computational intelligence and image processing. Journal of Plant Diseases and Protection, Vol. 128, 119–153. 5. Tan, L., Lu, J., Jiang, H. 2021. Tomato leaf diseases classification based on leaf images: A comparison between classical machine learning and deep learning methods. Agr iEngineering, Vol. 3, 3542–3558. 6. Patil, P., Yaligar, N., Meena, S. 2017. Comparision of performance of classifiers-SVM, RF and ANN in potato blight disease detection using leaf images. IEEE International Conference on Computational Intelligence and Computing Research, 1–5. 7. Cruz, A., Ampatzidis, Y., Pierro, R., Materazzi, A., Panattoni, A., De Bellis, L. Luvisi, A. 2019. Detection of grapevine yellows symptoms in Vitis vinifera L. with artificial intelligence. Computers and Electronics in Agriculture, Vol. 157, 63–76. 8. Saleem, M.H., Potgieter, J., Arif, K.M. 2019. Plant disease detection and classification by deep learning. Plants, Vol. 81, 1468. 9. Jabid, T., Kabir, M.H., Chae, O. 2010. Local directional pattern (LDP)–A robust image descriptor for object recognition. IEEE International Conference on Advanced Video and Signal Based Surveillance, 482–487.

42

Multimedia Data Processing and Computing

10. Dalal, N., Triggs, B. 2005. Histograms of oriented gradients for human detection. IEEE International Computer Society Conference on Computer Vision and Pattern Recognition, 886–893. 11. Kusumo, BS., Heryana, A., Mahendra, O., Pardede, H.F. 2018. Machine learning-based for automatic detection of corn-plant diseases using image processing. International Conference on Computer, Control, Informatics and Its Applications, 93–97. 12. Yao, Q., Guan, Z., Zhou, Y., Tang, J., Hu, Y. Yang, B. 2009. Application of support vector machine for detecting rice diseases using shape and color texture features. International Conference on Engineering Computation, 79–83. 13. Mokhtar, U., Bendary, NE., Hassenian, AE., Emary, E., Mahmoud, MA., Hefny, H., Tolba, MF. 2015. SVM-based detection of tomato leaves diseases. Intelligent Systems, Vol. 323, 641–652. 14. Pires, RDL., Gonçalves, DN., Oruê, JPM., Kanashiro, WES., Rodrigues Jr, JF., Machado, BB., Gonçalves, WN. 2016. Local descriptors for soybean disease recognition local descriptors for soybean disease recognition. Computers and Electronics in Agriculture, Vol. 125, 48–55.

4

Delay-sensitive and Energy-efficient Approach for Improving Longevity of Wireless Sensor Networks Prasannavenkatesan Theerthagiri

4.1 INTRODUCTION Technology is rapidly advancing in the current period, and our lives are getting more automated and secure as a result. Wireless sensor networks (WSNs) are an example of a technology that has become increasingly important in our daily lives. As the name implies, it is a form of network (without wires) with dispersed and selfmonitoring devices that use sensors to monitor physical and natural conditions as shown in Figure 4.1. WSNs have become an integral part of numerous applications, ranging from environmental monitoring and disaster management to healthcare, agriculture, and industrial automation. These networks consist of spatially distributed sensor nodes that collaborate to collect, process, and transmit data to a central base station or sink. However, the limited energy resources of battery-powered sensor nodes and the time-sensitive nature of certain applications pose significant challenges to the performance and longevity of WSNs. This research aims to develop a delaysensitive and energy-efficient approach for improving longevity of WSNs, ensuring that the critical constraints of energy consumption and delay sensitivity are addressed effectively. The proposed approach will focus on three primary aspects: adaptive routing, intelligent clustering, and energy-aware scheduling. By considering the interplay between these factors, we aim to develop a comprehensive solution that can optimize the performance and longevity of WSNs while satisfying the requirements of delaysensitive applications.

4.2  THE INTERNET OF THINGS The IoT is a collection of physical entities or objects. This system combines software, microelectronics, and sensors to achieve greater capability through the exchange of evidence with producers, operators, and a variety of other tactics. An integration of cloud computing proficiency and movable devices is employed to make movable DOI: 10.1201/9781003391272-4

43

44

Multimedia Data Processing and Computing

FIGURE 4.1  Wireless Sensor Networks [1].

devices more proficient. The integration of mobile devices with the cloud is beneficial in terms of increasing computational power and storage. The applications of IoT and cloud workout in business are considered in order to recognize the entire distribution, allowed communication, on-demand use, and ideal sharing of various household assets and capabilities. We have the opportunity to enhance the use of current knowledge that is available in cloud environments by combining IoT and cloud. This combination can provide IoT requests with cloud storage [2].

4.3 ROUTING PROTOCOL FOR LOW-POWER AND LOSSY NETWORKS It has been widely discussed that RPL may be used to aid in the routing of sensor nodes. Nodes are typically deployed as a permanent backbone between other nodes. In the RPL static backbone network, a graded routing strategy is used. The geography of movement is very dynamic, with frequent disturbances from neighboring nodes. To the ever-changing landscape, a mobile node will continue to transfer packets it has received to its boundaries (parents) even after it has moved out of their range [3]. Load imbalance in the RPL system is a major contributor to slowdowns, dead spots in the energy distribution, premature node failure, and overall subpar network operation. A major interruption to the RPL network is likely if the affected nodes are only one hop distant from the sink or root. Hence, it is necessary to develop efficient load equalization solutions to avoid these problems [4]. Since the power of the parent node that has been unfairly selected to bear the bulk of the network’s load may deplete much more quickly than that of other nodes, the network may become partially severed as a result of the imbalance. The reliability of the network may suffer if the battery life of that overworked parent node continues to decline. This

Delay-sensitive and Energy-efficient Approach

45

is a huge, widely known disadvantage [3]. Early node death, energy depletion, and buffer occupancy are all results of imbalanced loads. As a result of these issues, load equalization techniques were designed to extend the RPL network’s node and network lifetimes [5]. By 2030, there will be 500 billion internet-enabled devices, claims Cisco. The three main components of the Internet of Things architecture are the application, transport, and sensing layers. The sensor layer is in charge of gathering information. The application layer provides numerous computational instruments for mining data for insights. This layer bridges the gap between the final consumers and the myriad of internet-enabled gadgets on the market. The transport layer is responsible for facilitating communication through the network [1]. Smart health, autonomous driving (intelligent transportation system), smart agriculture, and smart manufacturing are just few of the many areas in which IoT technology is applied [6]. Rapid developments in the IoT have facilitated the development of WSNs. Wi-Fi sensor networks are an integral part of the IoT paradigm’s sensing layer. WSNs typically involve a distributed collection of sensor nodes that can operate independently of one another. A sink node receives the data from the source nodes and either processes it locally or sends it on to another network [7]. In a wide variety of WSN application fields, sensor nodes have restricted storage, computation efficiency, node energy, and power profile [8]. Due to sensors’ low energy reserves, WSNs’ expected lifetime is a challenging factor to address before actual deployment. Shorter lifetimes are experienced by nodes closest to the sink because they are responsible for propagating data from all intermediate nodes [9]. A sink node can be either stationary (usually located in the hub of the WSN) or mobile, allowing for its placement in a variety of environments. Nodes that are close to a static sink and function as a relaying node or router are far more likely to die than nodes that are far from the sink [10]. The amount of energy needed to send messages depends on how far the sensor nodes are from the sink node [11]. Routing load and node energy dissipation can be more effectively balanced with sink mobility. The anchor nodes are located by the mobile sink node, which searches for them based on distance, communication range, and energy. This idea helps extend the life of the network [12] because the anchor node is spared the burden of transporting the data of other nodes. Throughput, coverage, data quality, and security are all enhanced when a mobile sink is used [13]. The remaining parts of this project are structured as follows. In the second part, we take a look at what researchers have accomplished thus far. The proposed actions are detailed in Section 4.3. The effectiveness of static and random sinks is compared in Section 4.4. The final section of the paper concludes the discussion.

4.4  RELATED WORK In the past decade, WSNs have emerged as one of the most promising technologies for many applications. However, WSNs are often deployed in harsh environments and remote locations where battery replacement or recharging is not possible. As a result, energy efficiency has become a significant concern in WSNs as it directly impacts their longevity.

46

Multimedia Data Processing and Computing

Several researchers have proposed different approaches to improve the longevity of WSNs. A literature survey reveals that most of these approaches address either energy efficiency or delay sensitivity but rarely both simultaneously. To overcome this limitation, some researchers have suggested using adaptive duty cycling schemes that consider both parameters to prolong network lifetime. However, there are still some challenges associated with such an approach in terms of providing guarantees on meeting delay requirements while optimizing energy consumption. Therefore, further research is required to find optimal solutions for improving the longevity of WSNs through a balance between delay sensitivity and energy efficiency. In addition, adaptive duty cycling schemes are generally not very effective in addressing the fundamental causes of network degradation. Instead, they rely on existing mechanisms to prolong network lifetime by optimizing system parameters. This is not to say that these schemes are necessarily ineffective. Rather, they act in a different manner than conventional approaches and thus should be used appropriately. To address network degradation, adaptive duty cycling schemes take the form of an iterative process involving a series of steps (Figure 4.2). As mentioned above, there are three types of signal duty cycles: fixed (or static), adaptive, and adaptive variable. Adaptive duty cycling schemes provide for the possibility of duty cycle changes over time. Adaptive duty cycling may be used to reduce power consumption and/or provide for adaptive operation, where the duty cycle is adjusted based on some criteria. For example, an adaptive duty cycling scheme may be used to adaptively adjust the duty cycle in response to changes in ambient temperature and other environmental factors. Other adaptive duty cycling schemes are also contemplated as shown in Tables 4.1, 4.2 and 4.3.

FIGURE 4.2  IoT and cloud computing integration.

Delay-sensitive and Energy-efficient Approach

47

TABLE 4.1 Summarization of RPL Improvements Author Jamal Toutouh et al. [14]

Year 2012

David Carels et al. [15] Belghachi Mohamed and Feham Mohamed [16] H. Santhi et al. [17]

2015

Meer M. Khan et al. [3] Weisheng Tang et al. [18]

2016

Hossein Fotouhi et al. [4]

2017

Patrick Olivier Kamgueu et al. [19]

2018

2015

2016

2016

Approach Using an optimization problem to optimize the routing protocol’s parameter values A new method for advancing down-track updates The RPL protocol’s next hop-selecting technique, making use of the remaining energy and the broadcast interval Novel and efficient routing protocol with higher throughput and reduced end-to-end delay, designed specifically for multi-hop wireless applications A framework for sink-to-sink synchronization Use of CA-RPL, an RPL-based composite routing metric, to avoid congestion. mRPL+ (mobility-controlling structure) combining two hand-off models: 1. hard hand-off, in which a mobile hub must break down a connection before discovering another 2. software hand-off, in which a mobile hub selects a new connection before disconnecting from the current one Examining recent RPL projects and highlighting major commitments to its enhancement, particularly in the areas of topology streamlining, security, and portability

Findings When faced with an optimization challenge, strategies for solving the problem are more effective than those of any other metaheuristic algorithm that has been studied so far. End-to-end latency decreased by up to 40%, while the packet delivery ratio increased by up to 80%, depending on the conditions. With more data on sensor availability for assets and the implementation of energy and delay-aware routing strategies, RPL has become more energy-competent. This enhanced version of the associativity based routing protocol not only provides straightforward and reliable paths but also the most effective and optimal paths between the origin and the destination. By distributing network stress among sink nodes, the network achieves higher throughputs and has a longer life span. Average delays are decreased by 30% using CA-RPL compared to the traditional RPL. There is a 20% reduction in packet loss when the inter packet period is short. For complex traffic flow stacks, a delicate hand-off model can guarantee outstanding unchanging quality (100% PDR) with extremely little hand-off delay (4ms) and exceptionally cheap overhead (like RPL). Lower traffic flow piles benefit more with mRPL+ than RPL because of the faster system detachment times.

RPL has been linked to security concerns, specifically those involving inner hubs as a source of danger. The methods of moderation used to combat the various threats were reviewed and analyzed.

48

Multimedia Data Processing and Computing

TABLE 4.2 Research Gaps Author Name Hyung-Sin Kim et al., 2015 [20]

Rahul Sharma and T. Jayavignesh, 2015 [2]

Amol Dhumane et al., 2015 [21]

Fatma Somaa, 2017 [22]

Methodology (QU-RPL) is a queue deployment-based RPL that improves end-to-end delay and packet transfer performance significantly. Two objective functions were used. To examine the performance of RPL in various radio models, we used 1. expected transmission count, 2. objective function zero To examine the operation of routing procedure above low power and lossy network (RPL), use Internet of Things routing standard. Implement Bayesian statistics for estimating the variability of sensor hub speeds. To aid with RPL flexibility, we introduce the mobility-based braided multi-way RPL (MBM-RPL).

Licai Zhu et al., 2017 [23]

Use that adaptive multipath traffic loading technique based on RPL.

Jad Nassar et al., 2017 [24]

Optimize for multiple goals simultaneously – in this case, delay, node residual energy, and connection quality. To better distribute data traffic across the network, a new RPL metric has been implemented.

Mamoun Qasem et al., 2017 [25]

Hossein Fotouhi et al., 2017 [4]

A system was developed that takes three elements into account: window size, hysteresis margin, and stability monitoring.

Research Gap Packet losses are common in high traffic due to overcrowding. RPL has a serious load-balancing difficulty when it comes to routing parent selection. The network becomes congested when there is an excessive number of overhead packets being created to retransmit packets that have been lost. Power usage increased as a result of data packet buffering and channel monitoring. Routing rules that are more traditional bring their routing table up to date from time to time. This strategy for bringing RPL up to current on a regular basis is ineffective. Each proposed solution to the problem of RPL flexibility rested exclusively on the existence of a different route to the sink from each Destination-Oriented Direction Acyclic Graph (DODAG) hub. None of these setups seemed to employ a fallback plan in case the primary line went down. Hubs around the sink continue to use more energy as a result of the increased traffic. They remain the system’s bottlenecks for the time being. However, traffic in Singapore is not always consistent.

In RPL, a parental node can serve multiple children if that is the option selected by the parent. As their energy supplies decrease much more quickly than other parent nodes, the overworked preferred parents will become vulnerable nodes. Despite the fact that the goal of these criteria is to address mobility concerns, they have a number of drawbacks. (Continued)

49

Delay-sensitive and Energy-efficient Approach

TABLE 4.2 (Continued) Research Gaps Author Name Hanane Lamaazi et al., 2018 [5]

Vidushi Vashishth, 2019 [26]

Methodology The implementation of RPL is evaluated using three configurations: network scalability, numerous sinks, and movement models. An optimization technique for conserving energy in IoT can make use of clustering, cluster head selection, and less energy-expensive path calculation for effective routing.

Research Gap RPL activity estimates cannot be applicable to all of them in the same situation.

Only a few nodes in the network are dynamically involved in message production and transmission. The remaining nodes in the network waste energy by waiting for interruptions or certain events to occur.

4.5  ENERGY AND TIME EFFICIENCY NETWORK MODEL The performance of the entire network can be enhanced by utilizing different tactics or variables as described in this section. Network lifetime and delay therefore play a significant role among these characteristics. Our work is solely focused on extending network life and reducing delay. N numbers of heterogeneous sensors have been placed throughout a region. A mobile sink node is allowed to roam freely throughout the whole wireless sensor network, while a static sink node is used at the network’s hub. (Static sinks are stationary with a constant point, and sit inside or closer to the detecting zone.) The same fixed communication radius is used for data transfer between sinks and all sensor nodes.

4.5.1  Energy Efficiency Network Model Experience gained from cellular and wireless heterogeneous networks shows that power consumption does not vary with the data load carried by the network while TABLE 4.3 Research Gaps No. 1 2 3 4 5 6 7 8

Name Sensor Nodes Count MAC type Routing Protocol Initial Energy Idle Power Receiving Power Transmission Power Sleep Power

Values 25, 50 Mac/802_11 RPL 100j 675e-6 6.75e-6 10.75e-5 2.5e-8

50

Multimedia Data Processing and Computing

it is being set up or running. Assuming constant power consumption and ideal data-traffic conditions, we can calculate the average power consumption (Pci) for a base section (BS) as

(

)

Pci = N sec N ant Ai Ptx + B j + PBHi (4.1)



Nant is the total number of antennas at that base station, and Nsec is the total number of sectors. The transmitted power, or Ptx, of each base station is different from the combined average power, or Pci. The coefficient Ai represents the portion of Pci that is directly tied to the average transmitted power from a base station, whereas the coefficient Bi represents the portion of Pci that is used regardless of the average transmitted power. These are crucial components that describe data on energy efficiency in base stations. During the transmission, PBHi is used to determine the total amount of power used. Energy efficiency (EE) is defined as the ratio of data delivered to energy spent, and its formulation is as follows: EE =



Overall data rate RT (4.2) = Total power consumed PCT

RT is data rate: K

Rn =



∑ r n (4.3) k

( k =1)

where K aggregates to the total sub channels assumed for n users. The whole data ratio for totally users can be written as N

Rt =



∑ Rn (4.4) ( n =1)

Data assumed to each user can be labeled as a function of received power:

Prxn  Rn = BWnlog2  1 +  (4.5)  In 

Therefore, the overall data rate movement for all users for some base station in heterogeneous network can be written as N



Rt = η

 (4.6)  ∑ Nrblog2  1 + Prxn In  n =1

The η normally equals 1 (correction factor). From here the EE of a specific base station with consumed power Pc can then be written as

EEi =

Rt , i (4.7) Pc, i

The above equation gives us the required energy efficiency of a station.

51

Delay-sensitive and Energy-efficient Approach

4.5.2 Time Efficiency Network Model The EE model previously described provides an understanding of a certain heterogeneous network’s effectiveness in a given location. We must choose the heterogeneous network area that is most fully exploited. To do it, we must compute the efficiency over a specific time period. Assuming T het to be the length of time for total data transfer across a heterogeneous network, calculating time efficiency is as follows: Te =



EEhet (4.8) T het

Te is then measured in bits per joule per second. Te origins can be traced back to the assumption that two stations have some EE. One macro station, numerous submacro stations, and numerous pico stations make up BSs. Both the data rate and the power consumption for each station must be calculated. A heterogeneous network with a single macro base station, M micro base stations, and P pico base stations has the following energy efficiency: M

Rmacro +

EEhet = Pmacro +



P

Rmicro +

∑Rpico

( M =1)

( P =1)

M

P

( M =1)

( P =1)

∑ Pmicro + ∑Ppico

(4.9)

EEhet signifies the energy effectiveness of the entire diverse system. And if we have the T het, we can compute the time competence for the energy efficiency as follows: Rmacro + Pmacro +

Te =

M

P

( M =1)

( P =1)

M

P

∑ Rmicro + ∑Rpico ∑

Pmicro +

( M =1)

∑Ppico

( P =1)

T ( HETEROGENEOUS )

(4.10)

Area energy efficiency (AEE), which is defined as the bit/joule/unit area, can also be used to compute area time efficiency. You can write the AEE for a certain base station as

AEEi =

EEi , (4.11) ABs, i

where EEi and ABS signify the EE in bit/joule. The area time efficiency (ATE) can be found in a similar manner. Bit/joule/ second/unit area is used to describe its unit. A heterogeneous network area’s ATE can be expressed as

ATEi =

Te, i (4.12) ABs, i

52

Multimedia Data Processing and Computing

4.6  RESULTS AND ANALYSIS Packet distribution percentage, throughput, end-to-end delay, and energy consumption metrics are considered to measure the efficiency of WSN. In the network simulator, two alternative scenarios with varied numbers of nodes (25 and 50) and retaining both the stationary and movable sink are created using the routing protocol for low power and lossy network (RPL). Figure 4.3 shows that when using RPL with a fixed sink node, the average end-toend delay for a network of 25 nodes is 0.004717 milliseconds, whereas when using a mobile sink node, the average end-to-end delay is 0.00165 milliseconds. Using RPL, the average end-to-end latency for a network of 50 nodes is 0.002483 milliseconds when the sink node is fixed in place but only 0.001686 milliseconds when the sink node is mobile. Figure 4.4 displays that the throughput of RPL with a stationary sink node is 25.0735 kbps, whereas the throughput with a mobile sink node is 26.5065 kbps. Similarly, the throughput of RPL with 50 nodes is 17.095 kbps with a stationary sink node, and 18.7855 kbps with a moveable sink node. According to Figure 4.5, the Packet Delivery Ratio via RPL for 25 nodes is 94.59 percent for stationary sink nodes and 100.00 percent for movable sink nodes. For a network of 50 nodes, the Packet Delivery Ratio using RPL with a stationary sink node is 91.00%, and it is 100.00% with a movable sink node. According to Figure 4.6, the amount of energy consumed by RPL with a stationary sink node is 259.513 (J) and the amount of energy consumed by RPL with a movable sink node is 253.395 (J) for 25 nodes, and the amount of energy consumed by RPL with a stationary sink node is 519.406 (J) and the amount of energy consumed by RPL with a movable sink node is 511.172 (J) for 50 nodes.

FIGURE 4.3  Average end-to-end delay.

Delay-sensitive and Energy-efficient Approach

FIGURE 4.4  Throughput.

FIGURE 4.5  Packet delivery ratio.

FIGURE 4.6  Consumed energy.

53

54

Multimedia Data Processing and Computing

4.7  CONCLUSION AND FUTURE SCOPE WSNs can be used to detect and collect information about the weather, natural disasters, and patients. Saving energy is a critical issue in WSNs. Because deployed nodes have limited battery capacity and it is often difficult to replace them, it is vital to conserve energy. When WSNs are put into hostile environments where human contact is limited and refreshing or switching wireless nodes is unfeasible, strategies to reduce power consumption are required to extend network lifetime. Because wireless nodes have limitations, such as low power, low energy, and restricted resources, care must be taken while building a mobility management strategy. The simulation results revealed that using a movable sink improves network lifetime, E2E delay, packet delivery ratio, and throughput. Existing research can be expanded by taking into account several movable sinks as well as network metrics such as routing overheads and latency.

REFERENCES 1. Ibrar Yaqoob et al., “Internet of Things architecture: Recent advances, taxonomy, requirements, and open challenges”, IEEE Wireless Communication, Vol. 24, No. 3, pp. 10–16, Jun. 2017. 2. Rahul Sharma, and Jayavignesh T., “Quantitative analysis and evaluation of RPL with various objective functions for 6LoWPAN”, Indian Journal of Science and Technology, Vol. 8, No. 19, 2015. 3. Meer M. Khan, M. Ali Lodhi, Abdul Rehman, Abid Khan, and Faisal Bashir Hussain, “Sink-to-sink coordination framework using RPL: Routing protocol for low power and lossy networks,” Journal of Sensors, Vol. 2016, 2635429, 2016. 4. Hossein Fotouhi, Daniel Moreira, Mário Alves, and Patrick Meumeu Yomsi, “mRPL+: A mobility management framework in RPL/6LoWPAN,” Computer Communications, Vol. 104, pp. 34–54, 2017. 5. Hanane Lamaazi, Nabil Benamar, and Antonio J. Jara, “RPL-based networks in static and mobile environment: A performance assessment analysis,” Journal of King Saud University – Computer and Information Sciences, Vol. 30, No. 3, pp. 320–333, 2017. 6. Kinza Shafique et al., “Internet of Things (IoT) for next-generation smart systems: A review of current challenges, future trends an prospects for emerging 5G-IoT scenarios”, IEEE Access, Vol. 8, pp. 23022–23040, Feb. 6, 2020. 7. Priyanka Rawat et al., “Wireless sensor networks: A survey on recent developments and potential synergies”, The Journal of Supercomputing, Vol. 68, pp. 1–48, Apr. 2013. 8. Ian F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Communications Magazine, Vol. 40, pp. 102–114, 2002. 9. Xiaobing Wu et al., “Dual-Sink: Using Mobile and Static Sinks for Lifetime Improvement in Wireless Sensor Networks”, 16th IEEE International Conference on Computer Communications and Networks. Aug. 2007. 10. Majid I. Khan et al., “Static vs. mobile sink: The influence of basic parameters on energy efficiency in wireless sensor networks”, Computer Communications, Vol. 36, pp. 965–978, 2013. 11. Euisin Lee et al., “Communication model and protocol based on multiple static sinks for supporting Mobile users in wireless sensor networks”, Journal in IEEE Transactions on Consumer Electronics, Vol. 56, No. 3, pp. 1652–1660, Aug. 2010. 12. Yasir Saleem et al., “Resource Management in Mobile sink based wireless sensor networks through cloud computing”, in: Resource Management in Mobile Computing Environments, pp. 439–459. Springer, Cham, 2014.

Delay-sensitive and Energy-efficient Approach

55

13. Abdul Waheed Khan et al., “A Comprehensive Study of Data Collection Schemes Using Mobile Sinks in Wireless Sensor Networks”, Sensors, Vol. 14, No. 2, pp. 2510–2548, Feb. 2014. 14. Jamal Toutouh, José Garćia-Nieto, and Enrique Alba, “Intelligent OLSR routing protocol optimization for VANETs”, IEEE Transactions on Vehicular Technology, Vol. 61, No. 4, pp. 1884–1894, 2012. 15. David Carels, Eli De Poorter, Ingrid Moerman, and Piet Demeester, “RPL mobility support for point-to-point traffic flows towards Mobile nodes”, International Journal of Distributed Sensor Networks, Vol. 2015, 470349, 2015. 16. Belghachi Mohamed, and Feham Mohamed, “QoS routing RPL for low power and lossy networks,” International Journal of Distributed Sensor Networks, Vol. 2015, 971545, 2015. 17. Santhi H, Janisankar N, Aroshi Handa, and Aman Kaul, “Improved associativity based routing for multi hop networks using TABU initialized genetic algorithm,” International Journal of Applied Engineering Research, Vol. 11, No. 7, pp. 4830–4837, 2016. 18. Weisheng Tang, Xiaoyuan Ma, Jun Huang, and Jianming Wei, “Toward improved RPL: A congestion avoidance multipath routing protocol with time factor for wireless sensor networks”, Journal of Sensors, Vol. 2016, 8128651, 2016. 19. Patrick Olivier Kamgueu, Emmanuel Nataf, and Thomas DjotioNdie, “Survey on RPL enhancements: A focus on topology, security and mobility”, Computer Communications, Vol. 120, pp. 10–21, 2018. https://doi.org/10.1016/j.comcom.2018.02.011. 20. Hyung-Sin Kim, Jeongyeup Paek, and Saewoong Bahk, “QU-RPL: Queue Utilization based RPL for Load Balancing in Large Scale Industrial Applications”, 2015 12th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Seattle, WA, USA, 2015, pp. 265–273, doi: 10.1109/SAHCN.2015.7338325. 21. Amol Dhumane, Avinash Bagul, and Parag Kulkarni, “A review on routing protocol for low powerand lossy networks in IoT,” International Journal of Advanced Engineering and Global Technology, Vol. 03, No. 12, December 2015. 22. Fatma Somaa, “Braided on Demand Multipath RPL in the Mobility Context”, 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), Taipei, Taiwan, 2017, pp. 662–669, doi: 10.1109/AINA.2017.168. 23. Licai Zhu, Ruchuan Wang, and Hao Yang, “Multi-path data distribution mechanism based on RPL for energy consumption and time delay”, Information, Vol. 8, 2017. doi:10.3390/info8040124. 24. Jad Nassar, Nicolas Gouvy, and Nathalie Mitton, “Towards Multi-instances QoS Efficient RPL for Smart Grids”, PE-WASUN 2017 - 14th ACM International Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, and Ubiquitous Networks, Nov. 2017, Miami, FL, United States. pp. 85–92. 25. Mamoun Qasem, Ahmed Yassin Al-Dubai, Imed Romdhani, and Baraq Ghaleb, “Load Balancing Objective Function in RPL”, https://www.researchgate.net/ publication/313369944. 26. Vidushi Vashishth, “An energy efficient routing protocol for wireless Internet-of-Things sensor networks”, arXiv:1808.01039v2 [cs.NI], Mar. 8, 2019.

5

Detecting Lumpy Skin Disease Using Deep Learning Techniques Shiwalika Sambyal, Sachin Kumar, Sourabh Shastri, and Vibhakar Mansotra

5.1 INTRODUCTION Lumpy skin disease has created great chaos in Asian countries. It is a type of disease that is host specific and it leaves the cattle with a very weak body, infertility, milk reduction, and other serious issues. It may cause death in certain cases. Lumpy skin disease is identified under capripoxvirus genus which is a subfamily of chordopoxivirinae and family poxviridae [1]. The virus is highly resistant. Having a length of 230–300 nanometers, it can cross-react with other capripoxviruses. Clinical signs of lumpy skin disease include fever, lymph node swelling, and skin nodules all over the body that become visible as shown in Figure 5.1. Cattle that are severely infected can suffer from ulcerative lesions in the eye, nasal cavities, and in almost all the organs inside the body [2, 3]. The reported incubation period is one to four weeks. The exact origin of the disease is still unknown, but according to the state of art, the first case reported was in Zambia in 1929, [4] and for a long time it remain limited to Africa. Recently, outbreaks of lumpy skin disease have been reported in China, Bhutan, Nepal, Vietnam, Myanmar, Thailand, Malaysia, and India. This is considered a matter of concern for the dairy industry and livestock [3]. In India, more than 16.42 lakh cattle have been infected, and more than 75,000 deaths were reported from July to September 2022. The population of livestock in Gujrat state was 26.9 million in 2019 [5]. So the spreading of this dreadful disease is a threat to India’s livestock. Lumpy skin disease is transmitted by an arthropod vector; flying insects like mosquitoes and flies are identified as mechanical vectors [6]. Direct contact is considered a minor source of the transmission of infection. The current diagnostic test for diagnosis of the lumpy skin disease is reverse transcription polymerase chain reaction (RTPCR) [7]. Thorough studies and research have been conducted that indicate that hybrid deep learning models are capable of detecting skin diseases [8, 9]. An indigenous vaccine has been developed in India called “Lumpi-ProVacInd” but is not yet launched for commercial use. Currently, live attenuated vaccines are used against lumpy skin disease, but their use is not recommended because of potential safety issues [10]. Currently, different awareness campaigns have been launched to make farmers aware of the disease so that appropriate precautions can be taken. In a country like India where livestock accounts for 4.11% of the country’s GDP, 56

DOI: 10.1201/9781003391272-5

Detecting Lumpy Skin Disease Using Deep Learning Techniques

57

FIGURE 5.1  Lumpy skin.

this viral disease can be a great threat to the economy. Currently, control measures adopted by Asian countries are zoning, continuous surveillance, movement restriction of infected cattle, and official disposal and destruction of animal waste. We have used convolutional neural networks for feature extraction of the image dataset and then Softmax for classification. Many deep learning models have been proposed for the detection of diseases [11]. Few attempts have been made to predict the disease. Research [12] has used the resampling method over random forest to detect lumpy disease. Different attributes related to geographical condition and other attributes related to cattle breed have been used. Deep learning techniques have been used earlier for the prediction, diagnosis, and identifying DNA patterns [13–18].

5.2  MATERIAL AND METHODS 5.2.1  Dataset The dataset consists of 1024 images, which include 700 normal skin images and 324 lumpy skin images. However, the dataset may be biased because the appropriate ratio of both classes was not taken due to the non-availability of the dataset. The dataset distribution pie chart is given in Figure 5.2 and Table 5.1.

FIGURE 5.2  Dataset distribution of lumpy images and normal skin images.

58

Multimedia Data Processing and Computing

TABLE 5.1 Dataset Description and References Class Lumpy skin images Normal skin images

Number of Images References (Dataset Links) 324 [19] 700 Total Images used for the experiment: 1024

5.2.2  Research Methodology In this section, we briefly discuss the whole methodology of our proposed model. In the first part, dataset preprocessing was done in which images were randomly shuffled and resized into 256 × 256 pixels. In the next step, the dataset was divided into appropriate ratios (i.e., 80% and 20%), 80% being used for training and 20% data for testing. We further split the 20% testing data into validation set and testing set. The validation dataset is used to evaluate the model after each epoch, whereas the remaining testing data is used to test the model for the evaluation of the performance of the proposed model. In the next step, our proposed model is trained and then tested and evaluated on different parameters. In the end step, the model will be capable of classifying lumpy skin and normal skin. The flow of the whole process is given in Figure 5.3.

FIGURE 5.3  Flow of the research process.

Detecting Lumpy Skin Disease Using Deep Learning Techniques

59

5.2.3  Parameter Tuning The dataset consists of images of different sizes, so in the first step, which is datapreprocessing, we resized the images in one size (i.e., 256 × 256 pixels). The resized image is given in Figure 5.4. The main reason for choosing 256 × 256 pixels is that size of the maximum images was around 256 × 256 pixels. After resizing the images, they are well shuffled and then randomly divided into 80% and 20% for training and testing. After data preprocessing and splitting the dataset, we run 40 epochs having a batch size of 20. With each epoch, internal parameters are adjusted to produce an overall optimized result. We have used an Adam optimizer for overall loss reduction and to produce an optimized result.

5.2.4  Proposed Architecture Our proposed deep learning model can distinguish between lumpy skin and normal skin images in a fraction of a second with very little or no human intervention. It consists of seven convolutional layers. Each convolutional layer is followed by the activation function rectified linear unit (ReLU) and pooling. We have used max-pooling in this model. After seven convolutional layers, each followed by an activation function and pooling layer, a flatten layer is added. The main function of this layer is to convert the multidimensional matrix into a single-dimensional array so that it can be fed into the dense layer. With each convolutional layer, the ReLU activation function is used to decrease the loss, while in the end Softmax activation function is deployed to produce the binary result. The kernel size is 3 × 3. The drop of 40% is taken at the end. The order of the layers is given below:

C1R1M1 − C2R2M2 − C3R3M3 − C4R4M4 − C5R6M6 − C7R7M7

FIGURE 5.4  Resized 256 × 256 pixel lumpy skin image.

60

Multimedia Data Processing and Computing

FIGURE 5.5  Architecture of the proposed model.

Here, C stands for convolutional layer, R stands for ReLU activation function, and M stands for max-pooling layer. The numbers used with C, R, and M signify the number of layers. The architecture of our proposed model is given in Figure 5.5.

5.3  MODEL EVALUATION AND RESULTS We have added a brief description of the model, results obtained from the model, and the evaluation of the model.

5.3.1 Environment of Implementation The whole experiment is performed in Google Collaboratory in which Python language version 3 was used to write the whole code of the model. In Google Collab, we have used 128 GB Ram and NVIDIA Tesla K80 GPU, which helped to smoothly run the deep learning model.

5.3.2  Description of the Model Table 5.2 contains a detailed description of the model. The pool size used is 2 × 2, while the kernel size is 3 × 3. More than 25 million parameters were used to train the proposed model.

5.3.3  Results and Evaluation We have used different parameters to evaluate our model’s efficiency. We used values of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) from the confusion matrix for the evaluation of the model. The confusion matrix is given in Figure 5.6. The results obtained after evaluation are recorded in Table 5.3. The loss curve and the accuracy curve generated after the training of the model are given in Figures 5.7 and 5.8, respectively.

61

Detecting Lumpy Skin Disease Using Deep Learning Techniques

TABLE 5.2 Description of the Proposed Model Layer conv2d (Conv2D) activation (Activation) max_pooling2d conv2d_1 (Conv2D) activation_1 (Activation) max_pooling2d_1 conv2d_2 (Conv2D) activation_2 (Activation) conv2d_3 (Conv2D) activation_3 (Activation) conv2d_4 (Conv2D) activation_4 (Activation) max_pooling2d_2 conv2d_5 (Conv2D) activation_5 (Activation) max_pooling2d_3 conv2d_6 (Conv2D) max_pooling2d_4 conv2d_7 (Conv2D) activation_7 (Activation) max_pooling2d_5 flatten (Flatten) dense (Dense) dropout (Dropout) dense_1 (Dense) activation_8 (Activation) Total parameters: 25,300,066 Trainable parameters: 25,300,066 Non-trainable parameters: 0

FIGURE 5.6  Confusion matrix.

Output Shape (None, 254, 254, 16) activation (Activation) (None, 127, 127, 16) (None, 125, 125, 32) (None, 125, 125, 32) (None, 62, 62, 32) (None, 60, 60, 64) (None, 60, 60, 64) (None, 58, 58, 128) (None, 58, 58, 128) (None, 56, 56, 256) (None, 56, 56, 256) (None, 28, 28, 256) (None, 26, 26, 512) (None, 26, 26, 512) (None, 13, 13, 512) (None, 11, 11, 1024) (None, 5, 5, 1024) (None, 3, 3, 2048) (None, 3, 3, 2048) (None, 1, 1, 2048) (None, 2048) (None, 64) (None, 64) (None, 2) (None, 2)

Parameters 448 0 0 4640 0 0 18,496 0 73,856 0 295,168 0 0 1,180,160 0 0 0 0 18,876,416 0 0 0 131,136 0 130 0

62

Multimedia Data Processing and Computing

TABLE 5.3 Evaluation of the Model Evaluation metric Accuracy Precession Specificity Sensitivity

FIGURE 5.7  Loss curve.

FIGURE 5.8  Accuracy curve.

Result 88.8% 85.7% 97.1% 60%

Detecting Lumpy Skin Disease Using Deep Learning Techniques

63

5.4  CONCLUSION AND FUTURE WORK The deep learning model we proposed has achieved accuracy up to 88.8%, while 97.1% specificity and 85.7% precession were achieved. We should also consider other factors to evaluate the model, such as the time taken by the model to produce the result, resources used by the model, etc. Due to the unavailability of a large dataset, we were able to include only 324 lumpy skin images which can hamper the accuracy of the model. So the model could be trained on a larger dataset for better performance. Also, work can be extended by proposing the models that are trained on the dataset of a particular breed and include other features. It is high time that we should make use of AI to tackle diseases, or in case of some other emergency, because AI holds the potential to come up with better functionalities and features in comparison to a human. We can further extend this work to find out the various stages of the lumpy disease and find the severity of the disease. The noisy dataset is one of the concerns. So proper techniques can be followed to remove noisy data to enhance the efficiency of the model.

ACKNOWLEDGMENTS Funding information: This investigation acknowledges no precise support from public, commercial, or non-profit funding organizations. Conflict of interest: The researchers state that they do not have any competing interests. Ethical approval: This publication does not contain any human or animal research done by any of the authors.

REFERENCES 1. E. R. Tulman, C. L. Afonso, Z. Lu, L. Zsak, G. F. Kutish, and D. L. Rock, “Genome of Lumpy Skin Disease Virus,” J. Virol., vol. 75, no. 15, pp. 7122–7130, 2001. doi: 10.1128/JVI.75.15.7122-7130.2001. 2. S. Hansen, R. Pessôa, A. Nascimento, M. El-Tholoth, A. Abd El Wahed, and S. S. S. Sanabani, “Dataset of the Microbiome Composition in Skin lesions Caused by Lumpy Skin Disease Virus via 16s rRNA Massive Parallel Sequencing,” Data Brief., vol. 27, Dec. 2019. Accessed: Sep. 20, 2022. [Online]. Available: https://pubmed.ncbi.nlm.nih. gov/31763412/. 3. P. Hunter, and D. Wallace, “Lumpy Skin Disease in Southern Africa: A Review of the Disease and Aspects of Control,” J. S. Afr. Vet. Assoc., vol. 72, no. 2, pp. 68–71, 2001. doi: 10.4102/jsava.v72i2.619. 4. J. A. Woods, “Lumpy Skin Disease Virus,” Virus Infect. Ruminants, pp. 53–67, 1990. doi: 10.1016/b978-0-444-87312-5.50018-7. 5. R. Mani, and M. J. Beillard, “Report Name: Outbreak of Lumpy Skin Disease in Cattle Raises Alarm in Cattle-rearing Communities in the State of Gujarat,” pp. 1–2, 2022. 6. A. Rovid Spickler, “Lumpy Skin Disease Neethling, Knopvelsiekte,” no. July, pp. 1–5, 2003. 7. E. Afshari Safavi, “Assessing Machine Learning Techniques in Forecasting Lumpy Skin Disease Occurrence Based on Meteorological and Geospatial Features,” Trop. Anim. Health Prod., vol. 54, no. 1, 2022. doi: 10.1007/s11250-022-03073-2.

64

Multimedia Data Processing and Computing

8. P. R. Kshirsagar, H. Manoharan, S. Shitharth, A. M. Alshareef, N. Albishry, and P. K. Balachandran, “Deep Learning Approaches for Prognosis of Automated Skin Disease,” Life, vol. 12, no. 3, 2022. doi: 10.3390/life12030426. 9. A. Haegeman et al., “Comparative Evaluation of Lumpy Skin Disease Virus-Based Live Attenuated Vaccines,” Vaccines, vol. 9, no. 5, 2021, doi: 10.3390/vaccines9050473. 10. E. S. M. Tuppurainen, and C. A. L. Oura, “Review: Lumpy Skin Disease: An Emerging Threat to Europe, the Middle East and Asia,” Transbound. Emerg. Dis., vol. 59, no. 1, pp. 40–48, 2012. doi: 10.1111/j.1865-1682.2011.01242.x. 11. S. Shastri, P. Kour, S. Kumar, K. Singh, and V. Mansotra, “GBoost: A Novel GradingAdaBoost Ensemble Approach for Automatic Identification of Erythemato-Squamous Disease,” Int. J. Inf. Technol., vol. 13, no. 3, pp. 959–971, 2021. doi: 10.1007/ S41870-020-00589-4. 12. S. Suparyati, E. Utami, and A. H. Muhammad “Applying Different Resampling Strategies in Random Forest Algorithm to Predict Lumpy Skin Disease,” JURNAL RESTI, vol. 5, no. 158, pp. 555–562, 2022. 13. V. Sourabh, P. Mansotra, S. Kour, and Kumar, “Voting-Boosting: A Novel Machine Learning Ensemble for the Prediction of Infants’ Data,” Indian J. Sci. Technol., vol. 13, no. 22, pp. 2189–2202, 2020. doi: 10.17485/ijst/v13i22.468. 14. S. Shastri, P. Kour, S. Kumar, K. Singh, A. Sharma, and V. Mansotra, “A Nested Stacking Ensemble Model for Predicting Districts With High and Low Maternal Mortality Ratio (MMR) in India,” Int. J. Inf. Technol., vol. 13, no. 2, pp. 433–446, 2021. doi: 10.1007/s41870-020-00560-3. 15. S. Shastri, I. Kansal, S. Kumar, K. Singh, R. Popli, and V. Mansotra, “CheXImageNet: a Novel Architecture for Accurate Classification of Covid-19 With Chest x-Ray Digital Images Using Deep Convolutional Neural Networks,” Health Technol. (Berl), vol. 12, no. 1, pp. 193–204, 2022. doi: 10.1007/S12553-021-00630-X. 16. Sachin. Kumar et al., “LiteCovidNet: A Lightweight Deep Neural Network Model for Detection of COVID-19 Using X-Ray Images,” Int. J. Imaging Syst. Technol., vol. 32, no. 5, pp. 1464–1480, 2022. doi: 10.1002/IMA.22770. 17. S. Shastri, S. Kumar, K. Singh, and V. Mansotra, “En-Fuzzy-ClaF: A Machine Learning–Based Stack-Ensembled Fuzzy Classification Framework for Diagnosing Coronavirus,” Society 5.0 and the Future of Emerging Computational Technology, pp. 123–138, Jun. 2022, doi: 10.1201/9781003184140-8. 18. S. Shastri, S. Kumar, K. Singh, and V. Mansotra, “Designing Contactless Automated Systems Using IoT, Sensors, and Artificial Intelligence to Mitigate COVID-19,” Internet of Things, pp. 257–278, Mar. 2022, doi: 10.1201/9781003219620-13. 19. S. Kumar, and S. Shastri, “Lumpy Skin Images Dataset,” vol. 1, 2022, doi: 10.17632/ W36HPF86J2.1.

6

Forest Fire Detection Using a Nine-Layer Deep Convolutional Neural Network Prabira Kumar Sethy, A. Geetha Devi, and Santi Kumari Behera

6.1 INTRODUCTION Forests are essential natural resources. They conserve water and minerals and protect humankind from pollution and other natural calamities. In addition, they provide the materials used to maintain economic stability [1]. Hence, there is an important need to protect the forests. In recent times, there have been many forest fire (FF) cases. These incidents may be due to human-made mistakes, dry environments, and high temperatures due to increased carbon dioxide. It causes extensive disaster to the world’s environmental balance, ecology, and economy. Traditional monitoring systems by humans may lead to delayed alarms. In protecting forests from fire, many governments around the globe are interested in developing strategies for automatic surveillance systems for detecting FFs. Many FF detection systems have been developed, such as satellite imaging systems, optical sensors, and digital imaging methods [2]. However, these methods are not highly efficient as they have drawbacks such as power consumption, latency, accuracy, and implementation cost. These drawbacks can be addressed using artificial intelligence technology-based surveillance systems. Object detection and recognition systems [3–5] using machine learning algorithms are a part of advanced computer vision technology. Due to high accuracy, storage capacity, and fastperforming graphics processing units (GPUs), machine learning plays an important role in this area, even though there is a large requirement to develop better algorithms when larger datasets are involved. Deep convolutional neural networks (deep CNNs) are a class of machine learning algorithms in which more complexity is involved in the network. These networks can handle larger datasets. Deep CNNs have many applications, such as object detection, recognition, image classification, speech recognition, natural language processing, etc. [6]. Transfer learning in deep CNNs helps handle larger datasets with less computational time and lower complexity by reducing the training data size [7]. Using transfer learning, we can incorporate the knowledge from a previously trained model into our own [8]. There are a variety of transfer learning techniques are available in deep CNNs, such as AlexNet, VGG-16, ResNet, etc. [9, 10]. DOI: 10.1201/9781003391272-6

65

66

Multimedia Data Processing and Computing

The remainder of this chapter is organized as given. Section 6.2 presents a literature survey. Section 6.3 describes the methodology, and Section 6.4 explains the results and discussion related to the work. Finally, Section 6.5 concludes the paper and directs toward future research.

6.2  LITERATURE SURVEY Many researchers proposed their work on FF detecting systems based on satellite imaging. For FF detection, Guangmeng and Mei [2] employed intermediate resolution imaging spectro-radiometer data, while Li et al. [11] used advanced veryhigh-resolution radiometer (AVHRR) pictures [12]. The main drawback of satellite imaging systems is that the images will be acquired every one or two days, which is unsuitable for detecting FFs. The climate variations also affect the quality of the images acquired [11]. FF detection can also be achieved through wireless sensor networks (WSNs). In this method, sensors must be implanted in the forest, and the FF can be monitored by various parameters such as humidity and temperature [13]. Continuous analysis of the fire weather index component is used to identify an FF in [14]. WSN methods are more rapid than satellite imaging systems, but they are not economical due to the high-powered sensors that need to be implanted in the system. With the progress in computer vision technology, FFs can be detected by exploring image processing algorithms [15–17]. These techniques utilize image feature information to detect FFs accurately and effectively. Real-time implementation is possible with improved performance, and it is economically cost effective. Several researchers have used color models to classify flame pixels [18–21]. Various researchers have utilized temporal and spatial wavelet analysis to detect FFs [22, 23]. V. Vipin [24] proposed a classification algorithm based on RGB and YCbCr color models. In [25] a tunnel fire-detection method based on camera footage was suggested. This scheme reduces the algorithm complexity, but it has false predictions. It is clear from the literature that machine learning algorithms can handle various closely related objectives very effectively. To introduce expert knowledge into the system and for better decision-making, a powerful and flexible framework can be achieved by machine learning. Various types of research are based on machine learning to detect FFs. P. Foggia et al. proposed an ensemble-based FF detection algorithm using flame movement, shape features and color. Muhammad et al. [26] suggested a structure fine-tuning the convolutional neural network for fire detection.

6.3  MATERIALS AND METHODS This section describes the dataset and proposed methodology.

6.3.1  About the Dataset This dataset [27] has been compiled specifically to solve the FF detection issue. The resolution of each of the photos in the dataset is 250 × 250, and each of the images has three channels. The photographs were located by doing searches using a variety of search phrases across a number of different search engines. After that,

Forest Fire Detection Using a Nine-Layer Convolutional Neural Network

67

FIGURE 6.1  Images serving as examples: (a) a fire in the forest and (b) no fire in the forest.

these photographs are meticulously examined in order to crop and eliminate any unnecessary components, such as people or fire-fighting equipment, so that each image only displays the relevant fire location. The data collection was created for the purpose of solving the binary issue of determining whether or not a forest landscape has been affected by fire. It is a balanced dataset that has a total of 1520 photos, with 760 images belonging to each class [27]. The suggested research is subjected to a rigorous process of 10-fold cross-validation. Figure 6.1 depicts the samples for your viewing pleasure.

6.3.2  Proposed Methodology The proposed methodology comprises two stages. In the first stage, it suggested an FF fighting system with the help of a drone, peer-to-peer communication, and a control station where a designed deep fire-detection system is installed, as seen in the illustration for Figure 6.2. The operational flow of the suggested FF-fighting system is given in Figure 6.3. The second stage of the proposed method is the development of a model for deep learning for distinguishing fire and no-fire landscape images. Here, a nine-layer deep CNN is utilized for FF detection. Figure 6.4 is an illustration of the architecture of the deep CNN model that was suggested. There are various convolutional operations in deep CNN. All these convolution layers will generate multiple features of the data. The features may be common in

68

Multimedia Data Processing and Computing

FIGURE 6.2  Method for fighting forest fires.

FIGURE 6.3  Methodical approach to the operation of the fire-fighting system.

Forest Fire Detection Using a Nine-Layer Convolutional Neural Network

69

FIGURE 6.4  Deep convolutional neural network model with nine layers of architecture.

the initial layers. But more detailed features will be extracted in the deeper layers. The dimensionality will be reduced from one layer to the other as we go deep into the network by the pooling layers. The down-sampling operation carries out the dimensionality reduction in the pooling layers. In the proposed algorithm, the Max pooling operation is utilized, which performs better than the average pooling. In order to prevent the issue of overfitting, the dropout layer has also been included into the proposed CNN. The output of the convolution and pooling layers is transferred to the dense layer, and the classification operation is carried out by the dense layer itself. In order to get the best possible result, the deep neural networks will undergo extensive training using an iterative process. Training the data in tiny batches is accomplished with the help of the gradient descent optimization technique, which is used by the suggested model. As a result of this, the optimization method goes by the name of batch gradient descent. When there is a significant amount of training data, it is extremely difficult to perform the gradient descent technique. A mini-batch of the training data may be used to compute and update the weights of the loss function. Mini-batch training data can be used. The calculation time required by the model will decrease as a result of the employment of smaller training datasets (mini-batches). The effectiveness of the model is enhanced as a result of this as well.

6.4  RESULTS AND DISCUSSION The proposed CNN for the FF detection model is run in MATLAB 2021a on an HP Victus laptop with Windows 11, a core i7 processor of the 12th generation, 16 gigabytes of random access memory (RAM), and a 4 gigabyte RTX050Ti graphics processing unit (GPU). The initial learning rate of the models is set to 0.001; the minibatch size is set to 64; the maximum epoch is set to 4; the validation frequency is set to 30; and stochastic gradient descent with momentum (SGDM) optimization is used. The deep CNN model that was suggested was trained and evaluated with the help of the FF dataset that was obtained from the Kaggle repository. A 10-fold cross-validation was used to verify the accuracy of the suggested approach. Figures 6.5 and 6.6 depict the training progress and performance relative to each epoch of the model, respectively. Figure 6.7 presents a visual representation of the confusion matrix that the suggested model employs.

70

Multimedia Data Processing and Computing

FIGURE 6.5  The nine-layer deep CNN model is making progress in its training.

FIGURE 6.6  Nine-layer model after four epochs.

FIGURE 6.7  Confusion matrix of fire detection model.

Forest Fire Detection Using a Nine-Layer Convolutional Neural Network

71

From Figure 6.7, it is clear that the proposed nine-layer deep convolutional network has the potential to distinguish between fire and no-fire landscape images. In the testing phase, only three images of fire were erroneously detected as no-fire out of 76 images, and two images of no-fire were erroneously detected as fire out of 76 images. This implies the high efficacy of the proposed method.

6.5 CONCLUSION FF fighting is a challenging task utilizing humans as the forest spreads across large areas. It needs daily visiting of each small forest region within a small interval of time. Further, it is labor-intensive to extinguish or control the fire manually. Hence, here a FF fighting system is proposed utilizing a drone. Again, a nine-layer deep convolutional neural network is designed to distinguish between fire and no-fire landscape images. The model achieved 96.71% accuracy. Further, it can implement integration with the Internet of Things (IoT).

REFERENCES 1. G. Winkel, M. Sotirov, and C. Moseley, “Forest environmental frontiers around the globe: old patterns and new trends in forest governance,” Ambio, vol. 50, no. 12, pp. 2129–2137, 2021. 2. G. Guangmeng, and Z. Mei, “Using MODIS land surface temperature to evaluate forest fire risk of Northeast China,” IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 2, pp. 98–100, 2004. 3. H. Alkahtani, and T. H. H. Aldhyani, “Intrusion detection system to advance the Internet of Things infrastructure-based deep learning algorithms,” Complexity, vol. 2021, Article ID 5579851, pp. 18, 2021. 4. S. N. Alsubari, S. N. Deshmukh, M. H. Al-Adhaileh, F. W. Alsaade, and T. H. H. Aldhyani, “Development of integrated neural network model for identification of fake reviews in Ecommerce using multidomain datasets,” Applied Bionics and Biomechanics, vol. 2021, Article ID 5522574, pp. 11, 2021. 5. H. Alkahtani, and T. H. H. Aldhyani, “Botnet attack detection by using CNN-LSTM model for the Internet of Things applications,” Security and Communication Networks, vol. 2021, Article ID 3806459, pp. 23, 2021. 6. R. Wason. Deep learning: evolution and expansion. Cognitive Systems Research, vol. 52, pp. 701–708, 2018. DOI: 10.1016/j.cogsys.2018.08.023. 7. M. Shaha, and M. Pawar. Transfer learning for image classification. In: 2018 Second International Conference of Electronics, Communication and Aerospace Technology; 2018. pp. 656–660. DOI: 10.1109/ICECA.2018.8474802. 8. Y.-D. Zhang, Z. Dong, X. Chen, W. Jia, S. Du, and K. Muhammad, et al. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimedia Tools and Applications, vol. 78, no. 3, pp. 3613–3632, 2019. DOI: 10.1007/s11042-017-5243-3. 9. A. G. Evgin Goceri. On The Importance of Batch Size for Deep Learning. In: Yildirim Kenan, editor. International Conference on Mathematics: An Istanbul Meeting for World Mathematicians Minisymposium on Approximation Theory, Minisymposium on Mathematics Education; 2018. pp. 100–101.

72

Multimedia Data Processing and Computing

10. S.-H. Wang, C. Tang, J. Sun, J. Yang, C. Huang, and P. Phillips, et al. Multiple sclerosis identification by 14-layer convolutional neural network with batch normalization, dropout, and stochastic pooling. Frontiers in Neuroscience, vol. 12, 818, 2018. DOI: 10.3389/fnins.2018.00818. 11. Z. Li, S. Nadon, and J. Cihlar, “Satellite-based detection of Canadian boreal forest fire: development and application of the algorithm,” International Journal of Remote Sensing, vol. 21, no. 16, pp. 3057–3069, 2000. 12. K. Nakau, M. Fukuda, K. Kushida, and H. Hayasaka, “Forest fire detection based on MODIS satellite imagery, and comparison of NOAA satellite imagery with fire fighters’ information,” in IARC/JAXA Terrestrial Team Workshop, pp. 18–23, Fairbanks, Alaska, 2006. 13. L. Yu, N. Wang, and X. Meng, “Real-time forest fire detection with wireless sensor networks,” in Proceedings of IEEE International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1214–1217, Wuhan, China, 2005. 14. M. Hefeeda, and M. Bagheri, “Wireless sensor networks for early detection of forest fires,” in IEEE International Conference on Mobile Ad hoc and Sensor Systems, pp. 1–6, Pisa, Italy, 2007. 15. B. C. Ko, J. Y. Kwak, and J. Y. Nam, “Wild fire smoke detection using temporal, spatial features and random forest classifiers,” Optical Engineering, vol. 51, no. 1, Article ID 017208, 2012. 16. L. Ma, K. Wu, and L. Zhu, “Fire smoke detection in video images using Kalman filter and Gaussian mixture color model,” in IEEE International Conference on Artificial Intelligence and Computational Intelligence, pp. 484–487, Sanya, China, 2010. 17. M. Kandil and M. Salama, “A new hybrid algorithm for fire vision recognition,” in IEEE EUROCON 2009, pp. 1460–1466, St. Petersburg, Russia, 2009. 18. T. H. Chen, P. H. Wu, and Y. C. Chiou, “An early fire detection method based on image processing,” in International conference on Image processing (ICIP), pp. 1707–1710, Singapore, 2004. 19. T. Çelik, and H. Demirel, “Fire detection in video sequences using a generic color model,” Fire Safety Journal, vol. 44, no. 2, pp. 147–158, 2009. 20. W. B. Horng, J. W. Peng, and C. Y. Chen, “A new image-based real-time flame detection method using color analysis,” in Proceedings of IEEE Networking, Sensing and Control, pp. 100–105, Tucson, AZ, USA, 2005. 21. G. Marbach, M. Loepfe, and T. Brupbacher, “An image processing technique for fire detection in video images,” Fire Safety Journal, vol. 41, no. 4, pp. 285–289, 2006. 22. B. U. Töreyin, “Fire detection in infrared video using wavelet analysis,” Optical Engineering, vol. 46, no. 6, Article ID 067204, 2007. 23. S. Ye, Z. Bai, H. Chen, R. Bohush, and S. Ablameyko, “An effective algorithm to detect both smoke and flame using color and wavelet analysis,” Pattern Recognition and Image Analysis, vol. 27, no. 1, pp. 131–138, 2017. 24. V. Vipin, “Image processing based forest fire detection,” International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 2, pp. 87–95, 2012. 25. B. Lee, and D. Han, “Real-time fire detection using camera sequence image in tunnel environment,” in International Conference on Intelligent Computing, pp. 1209–1220, Springer, Berlin, Heidelberg, 2007. 26. K. Muhammad, J. Ahmad, and S. W. Baik, “Early fire detection using convolutional neural networks during surveillance for effective disaster management,” Neurocomputing, vol. 288, pp. 30–42, 2018. 27. A. Khan, B. Hassan, S. Khan, R. Ahmed, and A. Adnan, “deepFire: a novel dataset and deep transfer learning benchmark for forest fire detection,” Mobile Information System, vol. 2022, p. 5358359, 2022.

7

Identification of the Features of a Vehicle Using CNN Neenu Maria Thankachan, Fathima Hanana, Greeshma K V, Hari K, Chavvakula Chandini, and Gifty Sheela V

7.1 INTRODUCTION The extremely sophisticated and developed areas such as cities, the number of vehicles, their models, and other features are more distinguishable, and the need for identification of vehicles that have been involved in a crime is also increasing at an alarming rate. During the course of this study, different features of cars, such as the color, model, logo, etc., are used to identify the vehicle. The access of vehicles to an organization’s property can be automatically authenticated for security measures in accordance with the organization’s policies. DNN and CNN are artificial neural networks (ANNs), which are used in this chapter to identify the characteristic features of vehicle. The majority of the time, they are used to identify patterns in images and videos that can be used as marks. A deep learning (DL) technology is related to a DNN, which has three or four layers (input and output layers included). In image processing, CNNs are often used, as these are one of the most popular neural network architectures. Image processing is performed by using images that come from different types of datasets including VeRi-776, Vehicle ID, VERI Wild, Stanford cars, PLUS, Compcars, and others. VERI-776 is a dataset that can be used for vehicle re-identification. A dataset called Vehicle ID contains car images that have been captured during the daytime via multiple surveillance cameras. The dataset consists of 26,667 vehicles out of the entire dataset, all of which have corresponding ID labels to identify them (Figure 7.1). Process to identify vehicle: • • • • • •

Capture of Image Detection and Identification of the vehicle Recognition of license plate Recognition of logo Recognition of model Re-identification of vehicle

In this chapter, we recommend a method for identifying the characteristic features of vehicle based on different techniques. After capturing the image of the vehicle, it DOI: 10.1201/9781003391272-7

73

74

Multimedia Data Processing and Computing

FIGURE 7.1  Steps used in identification of the features of a vehicle.

should be processed, localized, undergo connected component analysis, and be segmented. It has been demonstrated that DNNs [1] are capable of classifying vehicle images in order to select the camera view position in the image. It makes use of a magnetic sensor [2] as well as a new dataset-based CNN [3] called Vehicle-Rear for the detection and identification of the vehicle. It is possible to perform automatic recognition of license plate characters using a CNN classifier in conjunction with a spatial transformer network (STN) [4]. It is presented in this chapter that automatic recognition of license plates is being done using the following tools: [5] Python, Open computer vision (OpenCV), MATLAB, [6] Optical K-Means with CNN, and [7] improved optical character recognition (OCR) technology. Besides that, other features of the vehicle can be detected using [8] modified HU invariant moments, support vector machines (SVM), [9] overlapping edge magnitudes, and [10] the you only look once (YOLO) V3 model for fast detection of vehicle logos. It is possible to recognize a vehicle model using a CNN (principal component analysis network) based on PCANet [11] and a visual attention model based on the Lightweight Recurrent Attention Unit (LRAU) [12]. The vehicle re-ID function is used to identify a similar vehicle in an image library. There are several directional deep learning features that can be used for re-identification, such as quadruple deep learning (QDL) [13], unmanned aerial vehicle (UAV) [14], context-based ranking method (MRM) [15], structural analysis of attributes (SAA) [16], Joint Feature and Similarity Deep Learning (JFSDL) [17], Machine Learning Scaling Library (MLSL) [18], our proposed Deep Framer Network (DFN)[19], partial attention (PA), and multi-attribute learning (MAL) [20].

7.2  LITERATURE REVIEW In this chapter, different methodologies for image capture, detection and identification of vehicle, automatic recognition of license plates, recognition of license plate, and detection of vehicle model are discussed.

Identification of the Features of a Vehicle Using CNN

75

7.2.1  Image Capture To identify the features of a vehicle, we must first capture an image that is able to identify the features [1]. To create a DNN model in Keras and TensorFlow libraries in the language of Python, we used a machine learning algorithm to classify vehicle images. The best camera position to record vehicles in traffic is determined based on the results of experimentation; these are then used to determine the highway occupancy rates. In addition to monitoring vehicle traffic, accidents, and the access of unauthorized vehicles, these results can also be used to ensure security against vehicular theft or vandalism. A preprocessing step is performed on the captured image by resizing it and converting the color spaces used in the image. By resizing the image captured in raw format or encoded in multi-media standards, it may be possible to avoid the system becoming slow due to the large size of the image. Infrared radiation (IR) or photographic cameras are used to capture the images in a raw format or encoded in multimedia standards. In these images, there will be three channels that are converted to grayscale – red, green, and blue – each of which is in RGB mode. As soon as the images have been preprocessed, the localization process is carried out by highlighting the characters and removing the background from the image. In this case, the threshold method is used as a technique by using a technique called “image processing.” As a result, the image pixels are truncated to two values based on the threshold value set by the user. As part of the connected component algorithm, undesired areas of the image are removed from the image before character analysis begins. In the next step, the connected components (blobs) of the network will be labeled and extracted. By using a process called segmentation, a set of labeled blobs can be cropped out, and they can then be selected for optical character recognition by using machine learning algorithms. It is also possible to carry out a classification and recognition of a vehicle using data augmentation and the CNN model [21] by augmentation of data.

7.2.2  Identification and Detection of Vehicle This chapter discusses an improved robust method for detection and identification of a vehicle using a magnetic sensor. Instead of a loop and video camera, a magnetic sensor measures magnetic field variations caused by changes in vehicle positions. A novel vehicle detection algorithm introduces a short-term difference signal from original magnetic signals. The parking module increases the performance of the detection by making it more robust. A more advanced algorithm for this approach is the gradient tree boosting algorithm, which provides better performance and accuracy. For the purpose of vehicle identification, there is a new dataset called Vehicle-Rear, which contains high-quality videos, accurate and precise information about the basic features of 3,000 plus vehicles. There are also two-stream CNNs that are being developed in conjunction with these datasets for features such as the appearance of vehicle and the license plate number.

7.2.3 Automatic License Plate Recognition In India, where vehicle license plates are of different sizes, fonts, and numbers of lines, automatic recognition of license plate is a necessity. A deep CNN method

76

Multimedia Data Processing and Computing

is used in this research to perform automatic license plate recognition (ALPR). Detecting license plates can be accomplished by a CNN classifier with training for individual characters [4]. In addition to recognizing the license plates of vehicles automatically, it has a wide range of advancements in all areas. Despite the fact that it is facing complex characteristics due to disruptive effects such as speed and light, ALPR makes use of free and proprietary software’s such as Python and the OpenCV library [5]. Another method to perform recognition of license plates is use of various clustering methods using segmentation and a CNN model called OKM-CNN. It mainly focuses on three stages, such as the examination of license plates (LP), the clustering of license plates, and the recognition of license plates using the CNN model, which adheres to the OKM clustering technique. A combination of an improvised algorithm of Bernsen (IBA) and a connection-based component analysis (CCA) model is utilized in order to detect and locate license plates [6]. In addition to the use of optical means of character-based recognition of license plate, ANN can also assist in automatic recognition of number plate via the use of improved OCR-based recognition of license plates. Over the past years, many changes have been done in the world of intelligence transportation. Based on this methodology, the whole system can be divided into three main components, namely license plate localization, character segmentation, and character recognition, which can be used to measure the quality of the data. As a result of using 300 motor vehicle license plate images from both domestic and international markets, we are able to achieve an accuracy of 94.45% within less than 1 second under different angles and environmental conditions in comparison with other methods. Compared to other methods, the proposed method is less time consuming and has high adaptability to inputs from untrained tests. There are several different technologies used for the purpose of identification of license plates, which all depend on the quality of the image captured by CCTV cameras [7].

7.2.4  Vehicle Logo Recognition A vehicle logo recognition system is used by law enforcement agencies for the purpose of identifying vehicles and as more reliable and accurate evidence when it comes to escapes and vehicle tracking. Under low-illumination conditions, it can be very difficult to locate the logo of various vehicles. To locate vehicle logo accurately, we can use a number of steps. A modified Hounsfield unit (HU) invariant moment and SVM have been used in order to identify a vehicle’s logo. In low illumination conditions, the Grey wolf optimizer (GWO) method is used to improve the recognition accuracy of the algorithm as a result of its use in low-light conditions. On the basis of results, it can be concluded that the addition of invariant moments to the logo recognition algorithm increases the precision of recognition. There is a possibility that GWO will be used instead of CV in order to enhance the recognition accuracy and rate [8]. Besides using different technologies, a concept of vehicle logo recognition (VLR) can also be proposed, such as local anisotropy of vehicle logo images, patterns of oriented edge magnitudes (POEM), and an enhanced POEM version specifically designed for describing vehicle logos called enhanced overlapping

Identification of the Features of a Vehicle Using CNN

77

POEM (OE-POEM). The dimensions of a feature can be reduced using whitened principal component analysis (WPCA), and collaborative representation-based classifier (CRC). In addition, this method provides a new dataset of vehicle logos called HFUT-VL. In comparison with existing datasets for logo identification, this dataset is larger and more effective. It is well known that VLR methods based on gradient orientation provide better accuracy than the basic VLR method [9]. Several DL algorithms, including you only look once (YOLO) and CNN, can be used to recognize logos of vehicles in complex scenes. A new dataset called VLD-30 [10] is introduced to enhance YOLOv3 with new features that mimic the original YOLOv3 model.

7.2.5  Vehicle Model Recognition A principal component analysis (PCA) network–based CNN (PCNN) can be used to recognize vehicle models using one discriminatory local feature of a vehicle, its headlamp. By employing the proposed method, local features can be detected and segmented from vehicle images without the need to be precise. The vehicle headlamp image can be used to extract hierarchical features, as well as to reduce the computational complexity of a traditional CNN system [14]. The vehicle make and model recognition (VMMR) system provides accurate and fast information about a vehicle. Lightweight recurrent attention units (LRAUs) are used in a method to enhance the CNN’s ability to extract features. By generating attention masks, LRAUs extract discriminative features such as logos and headlights. As a result of receiving feature maps by the recurrent units and a prior attention state by the preceding of LRAU. By adding these recurrent units to the CNN architecture, multilayered features can be obtained, which facilitates the extraction and combination of discriminative features of different scales [12].

7.2.6 Re-identification of a Vehicle The purpose of repeated identification of a vehicle is to identify a target vehicle without overlapping the views of different cameras. In order to improve these repeated identifications, DL networks based on quadruple features (QD-DLF) are used. In addition to the DL architectures, quadruple directional DL networks also use different layers of feature pooling to combine directional features. A given square vehicle image is processed using a DL architecture and massively connected CNNs to extract basic features from the inserted picture. In order to restrict the feature maps into different directional feature maps, quadruple directional-based DL networks utilize a variety of directional featured pooling layers, including horizontal layers, vertical layers, diagonal layers, and anti-diagonal pooling layers in average. After the feature maps have been spatially normalized by concatenating them, they can be used to re-identify vehicles [13]. Another methodology for the re-identification of a vehicle is the region aware deep models (RAM) approach, which extracts different peculiarities from a series of local areas in addition to the external features. This methodology makes use of two large amounts of Re-ID of vehicle datasets such as VERI and Vehicle ID [22]. Unmanned aerial vehicles (UAVs) are used to take aerial videos for re-identification.

78

Multimedia Data Processing and Computing

We introduce a new dataset called UAV, which contains more than 41,000 images of 4,000-plus vehicles that were captured by unmanned aerial vehicles. This methodology offers a more extreme and realistic approach to re-identification of a vehicle [14]. For re-identification of vehicles, it is also possible to extract several local regions in addition to global features using MRM. An STN-based localization model is designed to localize more distinct visual cues in local regions. To generate a list of ranks based on the similarity of neighbors, context and content are taken into consideration. The multi-scale attention (MSA) framework can also be applied to vehicle identification by taking into account a multi-scale mechanism in conjunction with an attention technique. Our goal is to utilize attention blocks within each scale subnetwork and bilinear interpolation techniques on the backbone of the network as complementary and discriminative information which will generate feature maps that contain local information as well as global information for each scale. There is also the possibility of adding more than one attention block to each subnetwork to get more discriminative information about vehicles [23]. It is also possible to initiate re-identification and repulsion by using structural analysis of attributes which uses the dataset VAC21 which has 7,130 images of different types of vehicles. A hierarchical labeling process with bounding boxes was used to divide 21 structural attributes into 21 classes. A state-of-the-art one-stage detection method and single detection method are used to provide a basic model for detecting attributes here. In this chapter, we also represent a method for re-identification and retrieval of vehicles based on regions of interest (ROIs), where a deep feature of the ROI is used as a discriminative identifier by encoding information about the outline of the vehicle. These deep features are inserted into a developing model in order to improve the accuracy of the model. Additionally, we can enhance the accuracy of the detection of small objects by adding proposals from low layers. For re-identification, a Siamese deep network is utilized in conjunction with deep learning methods to extract DL features from an input image pair of vehicle using joint feature and similarity deep learning methods (JFSDL). Using these joint identification and VERIfication supervision methods, re-identification can be achieved in a short amount of time. The process of doing this can be accomplished by linearly combining two simple functions and one different similarity learning function. As part of the variant similarity learning function, the score of similarity between two input images of vehicle is calculated by showing the unit-wise absolute difference and multiplying the corresponding DL pair of features simultaneously in coordination with a group of learned mass coefficients. In this study, the results of an experiment showed that the JFSDL method is more efficient than the multiple state-of-the-art methods for re-identification of vehicles [17]. The re-identification of vehicles that share the same features presents many challenges due to the small differences between them. In addition to multiple labels similarity learning (MLSL), a DL-based model for improved vehicle representations has been developed using that method. This method employs a Siamese network that uses three different attributes of a vehicle – ID number, color, and type – and a regular CNN-used feature for learning feature representations using the vehicle ID attributes [18]. We recommend a two-layer repeated ranking structure based on fine-grained discriminative networks (DFNs) that are combined with fine-grained

79

Identification of the Features of a Vehicle Using CNN

and Siamese networks to re-identify vehicles. Siamese networks can be used to reidentify general objects by using two ways of the network, while DFN are capable of detecting differences as well [19]. In order to re-identify vehicles that are located in local regions that contain more different information, we can utilize semi-attention and multilayered-attribute learning networks. As a result of this methodology, multiple vehicle keypoint detection models are used to extract multi-attribute features [20]. There have also been proposals to extract a vehicle’s discriminative features using a three-layered adaptive attention network GRMF. To extract useful features from a network, we can divide it into the branches with three perspectives, such as random location, channel information, and localized information. By using two effective modules of global relational attention, we are able to capture the global structural information. By using the global relativity between the point and all other points of nodes, we will get the priority level of the node or point of interest. There is an introduction to a suitable local partition that is able to capture accurate local information and solve the problem of mistakes in alignment and variations of consistency within parts. This method uses a multilayer attributes driven vehicle identification system combined with temporal ranking using a spatial method to extract the different features of appearance of the vehicle. To build the similarly appearing sets, we construct them from the spatial and temporal relationship among vehicles using multiple cameras. Then we use the Jaccard distance between the similarly appearing sets for repeated ranking [24]. In addition to extracting appearance, color, and model features, all of these methodologies are also used to enhance the different representations of original vehicle images and allow them to be used for re-identification of vehicles. Table 7.1 shows an overview of various works using different methodologies in identification of characteristic features of a vehicle. TABLE 7.1 Methodology Used Author [18] Alfasly et al.

Methodology MLSL

[12] Boukerche et al.

VMMR

[3] De Oliveira et al. [2] Dong et al.

CNN Robust vehicle detection CNN CNN, STN

[21] Hicham et al. [4] Jain et al.

Dataset VERI-776, Vehicle ID, VERI-Wild Stanford, CompCars, NTOU-MMR Vehicle-Rear _ _ Dataset from CCTV footages

Result Accuracy: 74.21% Stanford Cars: 93.94% CompCars: 98.31% NTOU-MMR: 99.4% F-score: 98.92% Accuracy: 80.5% Accuracy: 90% Accuracy for single-type license plates: 97% Double-type license plates: 94% (Continued)

80

Multimedia Data Processing and Computing

TABLE 7.1 (Continued) Methodology Used Author [24] Jiang et al [7] Kakani et al. [22] Liu et al. [1] Mariscal-García et al. [15] Peng et al.

Methodology Multilayered attribute driven vehicle re-id ANN Region aware deep model DNN

Dataset VERI776, Vehicle ID.

Result _

_ VERI,VehicleID

Accuracy: 94.45% Accuracy for VERI: 94% Accuracy: 88%

Dataset from Stanford University Vehicle ID, VERI-776

[6] Pustokhina et al.

Multi-region features learning model Context-based ranking OKM-CNN

[5] Sajjad et al.

OCR using Python

[11] Soon et al.

PCA network-based CNN (PCNN) DNN GRMF

PLUS, CompCars UAV-VeID VERI776, Vehicle ID.

Accuracy for License plate localization: 92% Character separation: 95.7% Character recognition: 94.3% PLUS: 99.51% CompCars: 89.83% _ _

Partial attention of multilayer-attribute learning network DFN Modified YOLOv3 OE-POEM Support vector machine, Cross validation, Gray Wolf Optimize ROI vehicle re-id and retrieval Multiscale attention Re-id of vehicle JFSDL Quadruple directional deep learning network

VERI,VehicleID

_

VERI-776, Vehicle ID VLD-30 HFUT-VL _

_ _ _ Accuracy for CV: 89.86% GWO: 96.25

VAC21, Vehicle ID

_

VERI776, Vehicle ID, PKU-VD VehicleID, VERI VERI, VehicleID

_

[14] Teng et al. [25] Tian et al. [20] Tumrani et al.

[19] Wang et.al. [10] Yang et al. [9] Yu et al. [8] Zhao et al.

[16] Zhao et al. [23] Zheng et al. [17] Zhu et al. [13] Zhu et al.

Stanford FZU Cars, HumAIn 2019 Challenge _

_

Accuracy: 98.10%

_ Accuracy: 83.74%

Identification of the Features of a Vehicle Using CNN

81

7.3 CONCLUSION This chapter describes different methodologies used for identifying the characteristics and features of a vehicle. The different methods are explained, such as image capturing, detection, identification, automatic recognition of license plates, logo recognition, model recognitions, and re-identification process of vehicles, using different technologies and algorithms. All these methodologies are based on ANN, as the combination of CNN and the DNN algorithms. In each case, different test results can be obtained by utilizing different datasets, which provide certain accuracy. We should provide an initiative to improve the accuracy of the test results.

7.4  OPEN RESEARCH AREAS In order to identify features of vehicle, different methodologies are proposed, each using a different dataset. Using research and practice, it is possible for existing methodologies to be improved in terms of accuracy by overcoming certain challenges associated with vehicle feature recognition, including low-illumination conditions, viewpoint variations, changing weather conditions, and the presence of the same model of vehicle. Conflicts of Interest: No conflicts exist for the authors of this paper, either financial or non-financial.

REFERENCES 1. Mariscal-García, C., Flores-Fuentes, W., Hernández-Balbuena, D., RodríguezQuiñonez, J. C., Sergiyenko, O., González-Navarro, F. F., & Miranda-Vega, J. E. (2020, June). Classification of vehicle images through deep neural networks for camera view position selection. In 2020 IEEE 29th International Symposium on Industrial Electronics (ISIE) (pp. 1376–1380). IEEE. 2. Dong, H., Wang, X., Zhang, C., He, R., Jia, L., & Qin, Y. (2018). Improved robust vehicle detection and identification based on a single magnetic sensor. IEEE Access, 6, 5247–5255. 3. De Oliveira, I. O., Laroca, R., Menotti, D., Fonseca, K. V. O., & Minetto, R. (2021). Vehicle-rear: A new dataset to explore feature fusion for vehicle identification using convolutional neural networks. IEEE Access, 9, 101065–101077. 4. Jain, V., Sasindran, Z., Rajagopal, A., Biswas, S., Bharadwaj, H. S., & Ramakrishnan, K. R. (2016, December). Deep automatic license plate recognition system. In Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (pp. 1–8). ACM. 5. Sajjad, K. M. (2010). Automatic License Plate Recognition using Python and OpenCV. Department of Computer Science and Engineering MES College of Engineering. 6. Pustokhina, I. V., Pustokhin, D. A., Rodrigues, J. J., Gupta, D., Khanna, A., Shankar, K., … & Joshi, G. P. (2020). Automatic vehicle license plate recognition using optimal k-means with convolutional neural network for intelligent transportation systems. IEEE Access, 8, 92907–92917. 7. Kakani, B. V., Gandhi, D., & Jani, S. (2017, July). Improved OCR based automatic vehicle number plate recognition using features trained neural network. In 2017 8th international conference on computing, communication and networking technologies (ICCCNT) (pp. 1–6). IEEE.

82

Multimedia Data Processing and Computing

8. Zhao, J., & Wang, X. (2019). Vehicle-logo recognition based on modified HU invariant moments and SVM. Multimedia Tools and Applications, 78(1), 75–97. 9. Yu, Y., Wang, J., Lu, J., Xie, Y., & Nie, Z. (2018). Vehicle logo recognition based on overlapping enhanced patterns of oriented edge magnitudes. Computers & Electrical Engineering, 71, 273–283. 10. Yang, S., Zhang, J., Bo, C., Wang, M., & Chen, L. (2019). Fast vehicle logo detection in complex scenes. Optics & Laser Technology, 110, 196–201. 11. Soon, F. C., Khaw, H. Y., Chuah, J. H., & Kanesan, J. (2018). PCANet-based convolutional neural network architecture for a vehicle model recognition system. IEEE Transactions on Intelligent Transportation Systems, 20(2), 749–759. 12. Boukerche, A., & Ma, X. (Aug. 2022). A novel smart lightweight visual attention model for fine-grained vehicle recognition. IEEE Transactions on Intelligent Transportation Systems, 23(8), 13846–13862. 13. Zhu, J., Zeng, H., Huang, J., Liao, S., Lei, Z., Cai, C., & Zheng, L. (2019). Vehicle reidentification using quadruple directional deep learning features. IEEE Transactions on Intelligent Transportation Systems, 21(1), 410–420. 14. Teng, S., Zhang, S., Huang, Q., & Sebe, N. (2021). Viewpoint and scale consistency reinforcement for UAV vehicle re-identification. International Journal of Computer Vision, 129(3), 719–735. 15. Peng, J., Wang, H., Zhao, T., & Fu, X. (2019). Learning multi-region features for vehicle re-identification with context-based ranking methods. Neurocomputing, 359, 427–437. 16. Zhao, Y., Shen, C., Wang, H., & Chen, S. (2019). Structural analysis of attributes for vehicle re-identification and retrieval. IEEE Transactions on Intelligent Transportation Systems, 21(2), 723–734. 17. Zhu, J., Zeng, H., Du, Y., Lei, Z., Zheng, L., & Cai, C. (2018). Joint feature and similarity deep learning for vehicle re-identification. IEEE Access, 6, 43724–43731. 18. Alfasly, S., Hu, Y., Li, H., Liang, T., Jin, X., Liu, B., & Zhao, Q. (2019). Multilabel-based similarity learning for vehicle re-identification. IEEE Access, 7, 162605–162616. 19. Wang, Q., Min, W., He, D., Zou, S., Huang, T., Zhang, Y., & Liu, R. (2020). Discriminative fine-grained network for vehicle re-identification using two-stage reranking. Science China Information Sciences, 63(11), 1–12. 20. Tumrani, S., Deng, Z., Lin, H., & Shao, J. (2020). Partial attention and multi-attribute learning for vehicle re-identification. Pattern Recognition Letters, 138, 290–297. 21. Hicham, B., Ahmed, A., & Mohammed, M. (2018, October). Vehicle type classification using convolutional neural networks. In 2018 IEEE 5th International Congress on Information Science and Technology (CiSt) (pp. 313–316). IEEE. 22. Liu, X., Zhang, S., Huang, Q., & Gao, W. (2018, July). Ram: a region-aware deep model for vehicle re-identification. In 2018 IEEE International Conference on Multimedia and Expo (ICME) (pp. 1–6). IEEE. 23. Zheng, A., Lin, X., Dong, J., Wang, W., Tang, J., & Luo, B. (2020). Multi-scale attention vehicle re-identification. Neural Computing and Applications, 32(23), 17489–17503. 24. Jiang, N., Xu, Y., Zhou, Z., & Wu, W. (2018, October). Multi-attribute driven vehicle re-identification with spatial-temporal re-ranking. In 2018 25th IEEE international conference on image processing (ICIP) (pp. 858–862). IEEE. 25. Tian, X., Pang, X., Jiang, G., Meng, Q., & Zheng, Y. (2022). Vehicle re-identification based on global relational attention and multi-granularity feature learning. IEEE Access, 10, 17674–17682.

8

Plant Leaf Disease Detection Using Supervised Machine Learning Algorithm Prasannavenkatesan Theerthagiri

8.1 INTRODUCTION In spite of the fact that agriculture accounts for more than 70% of India’s labor force, India is considered to be a developed nation. When it comes to selecting the appropriate varieties of crops and pesticides for their plants, farmers have a number of alternatives available to them. Because it might be challenging, the diagnosis of plant diseases has to be completed as quickly as possible. In the beginning, a field expert carried out manual checks and examinations of the plant diseases. The processing of this requires a significant amount of time and a substantial amount of labor. The visual assessment of plant diseases is a subjective activity that is susceptible to psychological and cognitive processes that can lead to prejudice, optical illusions, and, ultimately, mistake [1]. Despite the fact that human vision and cognition are remarkable at finding and interpreting patterns, the visual assessment of plant diseases is a task that is prone to error because it is a subjective activity. Expert observation with the naked eye is the method that is used the most often in the process of identifying plant diseases [2]. On the other hand, this requires constant monitoring by trained professionals, which, on large farms, may be prohibitively costly. The automatic detection of plant diseases is an important area of study because it has the potential to assist in the monitoring of huge fields of crops and, as a result, the identification of disease signals on plant leaves as soon as they occur on plant leaves. Because of this, researchers are looking for a method that can accurately diagnose illnesses while also being fast, automated, and cost effective [3]. Monitoring the leaf area is a useful technique for examining the physiological elements of plant growth, including photosynthesis and transpiration processes. It also helps in estimating the amount of damage caused by leaf diseases and pastes, determining the amount of stress produced by water and the environment, and selecting the amount of fertilizer that is required for optimal management and treatment. According to FAO world agricultural data from 2014, India is the top producer of a range of fresh fruits and vegetables. India was one of the top five agricultural producers in the world in 2010, producing over 80% of all agricultural goods, including cash commodities like coffee and cotton [4]. In 2011, India was among the top five global producers of animal and poultry meat, having one of the quickest growth rates. DOI: 10.1201/9781003391272-8

83

84

Multimedia Data Processing and Computing

8.2  LITERATURE SURVEY Savita N. Ghaiwat and colleagues [5] describe the results of an investigation into different classification methods for the categorization of plant leaf diseases. The k-nearest neighbors technique is appropriate for the present testing environment and for making predictions about the students’ classes. It is difficult to use SVM for classification if the training data cannot be linearly separated. The technique of leaf segmentation that was presented by Sanjay B. Dhaygude and colleagues [6] is one in which the red, green, blue (RGB) picture is transformed into the hue, saturations, and intensity (HSI) color format. This strategy is broken down into four phases. The first is to change the format of the RGB into HSI. In the second step of the process, green pixels are masked using a threshold value. The third step involves applying the mask to the primary picture and extracting a segmented portion of the primary image. The last stage, often known as the fourth major stage, is when the segmentation is finally finished. Mrunalini R. Badnakhe et al. [7] provide a description of a method for classifying and diagnosing the many different illnesses that might affect plants. A method of identification that is based on machine learning will be of tremendous use to the Indian economy since it will help save time, money, and effort. The approach of color co-occurrence is used in the aforementioned article in order to extract feature sets. Neural networks are applied in order to automatically detect illnesses present in leaf tissue. The proposed approach may be of considerable assistance in accurate identification of the leaf, and it seems to be an important technique in the event of stem and root illness, all while needing less computing labor to implement. Disease detection is said to include a few different steps, as stated in S. Arivazhagan et al.’s research [8]. The following are the four most significant of these factors: First, the input picture is converted into RGB format. Next, the thresholding segmentation approach is used to mask the green pixels that have been individually segmented. After that, the features of the sickness that were collected in order to classify it are given to the person who is doing the classifying. The usefulness of the proposed algorithm is shown by the fact that it is capable of accurately recognizing and categorizing the illnesses with a rate of 94%. The durability of the suggested method was shown by utilizing experimental results from around 500 plant leaves that were stored in a database. Anand H. Kulkarni et al. [9] propose the use of artificial neural networks (ANN) in conjunction with a variety of image processing techniques for the purpose of accurately identifying plant diseases at an early stage. For both the Gabor feature extraction approach and the ANN classifier, our system obtained an accuracy of 91%. A classifier that is based on an artificial neural network (ANN) will integrate a disease’s texture, color, and features in order to categorize and identify it. Sabah Bashir et al. [10] explain how illness detection in the Malus domestic apple may be accomplished via the use of k-mean clustering, texture, and color analysis. It does this by using similar textural and color aspects in both normal and affected parts of the leaf in order to detect and differentiate between the various leaf characteristics. It is possible that in the not-too-distant future, data categorization will entail the use of classifiers such as Bayes, principal component, and k-means clustering.

Plant Leaf Disease Detection

85

According to Smita Naikwadi and colleagues’ [11] research, histogram matching can be utilized in the diagnosis of plant diseases. Histogram matching is performed using edge detection and color characteristics because the disease manifests itself only in the plant’s leaves. Using an edge-detection algorithm and a layer separation approach, the training process separates the layers of an RGB picture into red, green, and blue channels. These channels represent the individual layers of the image. The color co-occurrence texture analysis approach is one that uses spatial gray-level dependency matrices as its foundation. Both the triangle threshold and the simple threshold technique are presented in [12] by Sanjay B. Patil and colleagues. These procedures separate the area around the lesion from the part of the leaf that is affected. The last phase in the process involves using the lesion area as well as the leaf area fraction to categorize the sickness. According to the findings of the study, the suggested approach for evaluating leaf sickness is both fast and accurate, and threshold segmentation can measure the leaf’s surface area. Piyush Chaudhary and colleagues [13] offer a system that segments disease spots on plant leaves by using image processing methods. In this research, the HSI, CIELAB, and YCbCr color spaces are compared with one another in terms of the methodology used to diagnose illness spots. By using the median filter, the picture was given a calmer appearance. In the tenth and last stage, known as stage 1030, the Otsu method may be used for the color component in order to establish the detection threshold for the hazardous site. In their work, Arti N. Rathod and colleagues [14] present a comprehensive explanation of numerous ways for recognizing leaf disease using image processing techniques. These methods may be applied to a variety of leaves. In the current study on methods, the goal is to increase throughput while simultaneously reducing the amount of subjectivity that comes from using naked-eye observation to detect and diagnose plant diseases.

8.3  PROPOSED SYSTEM Figure 8.1 shows a block schematic of the proposed system.

FIGURE 8.1  Block diagram of the plant leaf disease detection.

86

Multimedia Data Processing and Computing

8.3.1  Leaf Disease Image Database The collection includes disease-free and diseased apple, grape, tomato, and corn leaf images of PlantVillage dataset [15]. Seventy-five percent of the data is put to use for instructional reasons, while the remaining 25% is used for diagnostic functions. After that, the image is downsized to 256 × 256 pixels. This database is meticulously developed because it determines the classifier’s efficiency and the suggested system’s performance. Table 8.1 depicts the distribution of the dataset for training and testing.

8.3.2 Image Preprocessing Image preprocessing is used to improve the quality of images before they are processed and analyzed further. The RGB format of the supplied images is used initially. The RGB photos are first transformed to gray scale. The images that were obtained are a little noisy. The color transformation is used to determine an image’s color and brightness. The quality of a picture may be improved with the use of a median filter.

8.3.3 Feature Extraction The feature is computed as a consequence of one or more measurements, each of which identifies a measurable attribute of an item, and it assesses some of the object’s most important characteristics while doing so. Both low-level and high-level traits may be used in the classification of all qualities. It is possible to obtain low-level features directly from the source pictures; however, high-level feature extraction necessitates the extraction of low-level features first. One of the characteristics of the surface is the texture. The geographical distribution of different shades of gray inside a neighborhood is what characterizes that neighborhood. Because a texture shows its properties via both the positions of its pixels and the values of those pixels, there are many different methods to classify textures. The size or resolution at which a picture TABLE 8.1 Dataset Distribution Dataset Plants Apple, black rot Apple, cedar apple rust Apple, healthy Corn (maize), gray leaf spot Corn (maize), common rust Corn (maize), healthy Grape, black rot Grape, esca (black measles) Grape, healthy Tomato, early blight Tomato, healthy Tomato, late blight

Total Images 621 275 1628 513 1192 1157 1180 1383 423 1000 1588 1902

Training Images 497 220 1299 411 954 925 944 1107 339 800 1270 1521

Testing Images 124 55 329 102 238 232 236 276 84 200 318 381

87

Plant Leaf Disease Detection

is shown may have an effect on the appearance of the image’s texture. A texture that seems to have distinct characteristics when seen on a smaller scale might change into a more uniform appearance as the scale is increased [16]. 8.3.3.1  Shape Features The geometry of the leaves determines a great deal of the plant’s properties. The width of the leaf is defined by the length of the minor axis, while its length is determined by the Euclidean distance between two locations located on each side of the long axis.

Aspect   Ratio   =  

Length   of   the   leaf (8.1) The   breadth   of   the   leaf

The length and width of the leaf are used to compute the leaf aspect ratio. To calculate the area, first determine the size of a single pixel.

Area = Area   of   a   pixel * Total   no.  of   pixels   present   in   the   leaf (8.2)

The count of pixels with the leaf margin determines the leaf’s perimeter. Rectangularity depicts the resemblance of a leaf to a rectangle.

Rectangularity =

L *W (8.3) A

where L is the length, W is the width, and A is the area of the leaf. 8.3.3.2  Shape Texture The feature is computed as a consequence of one or more measurements, each of which identifies a measurable attribute of an item, and it quantifies a number of the object’s most important characteristics in the process. The use of low-level and high-level characteristics allows for the categorization of all qualities. While it is possible to extract low-level features directly from the source pictures, high-level feature extraction always requires the previous step of extracting low-level features first. One of the characteristics of the surface is the texture. The geographical distribution of different shades of gray inside a neighborhood is what characterizes that neighborhood. Because a texture shows its properties via both the positions of its pixels and the values of those pixels, there are many different methods to classify textures. The size or resolution at which a picture is shown may have an effect on the appearance of the image’s texture. The size or resolution at which a picture is shown may have an effect on the appearance of the image’s texture. A texture that seems to have distinct characteristics when seen on a smaller scale might change into a more uniform appearance as the scale is increased [17]. In statistical texture analysis, a distribution of pixel intensity at a particular point is what represents the properties of the texture. It includes first-order statistics, second-order statistics, and higher-order statistics, all of which are determined by the number of pixels or dots that make up each combination. It is possible to do an analysis of a picture as a texture by using second-order statistics of extracting features based on gray-level co-occurrence matrix (GLCM) [18].

88

Multimedia Data Processing and Computing

FIGURE 8.2  (a) Example of an image with four gray-level images. (b) GLCM for distance 1 and direction 0°.

A database called the GLCM may be used to determine the frequency with which a certain combination of pixel brightness values occurs in a picture. The GLCM of a four-level gray-scale picture is created in the manner seen in Figure 8.2 when the distance is 1 and the direction is 0 degrees. The image’s statistical data are referred to as features. GLCM is a technique for extracting distinct characteristics from grayscale and binary images. The following GLCM characteristics are retrieved using the suggested method. 8.3.3.2.1 Contrast The local differences in the gray-level co-occurrence matrix are measured using contrast.

∑ i −  j

Contrast   =

2

p ( i,  j ) (8.4)

i, j

8.3.3.2.2 Homogeneity The closeness of the element distribution in GLCM to the GLCM diagonals is measured by homogeneity.

Homogeneity   =

∑ i, j  1 +1(i − j )   p (i,  j ) (8.5) 2

8.3.3.2.3 Energy It measures the uniformity among the pixels.

Energy =

∑ i, j   p (i,  j ) 2 (8.6)

8.3.3.2.4 Entropy It measures the statistical measurement of the randomness of each pixel.

Entropy   =   −  

∑ p (i,  j ) log ( p (i,  j ))  (8.7) i, j

89

Plant Leaf Disease Detection

8.3.3.2.5 Dissimilarity Dissimilarity is a metric that describes how different gray-level pairings in an image vary.

Dissimilarity   =  

∑ i  −   j p (i,  j ) (8.8) i, j

where p(i, j) = image pixel to be processed.

8.3.4 Classification The machine learning technique is used to assign categories to various pictures in this suggested study. When the classifiers have been properly trained using the training set, they are next applied to the testing set. After that, the performance is judged based on a comparison of the predicted labels and the actual labels that were produced. During the training and evaluation phases of this method, a decision tree and a gradient-boosting algorithm are put to use in order to classify leaf pictures according to whether they are healthy or affected by a certain illness. 8.3.4.1  Decision Tree Algorithm A decision tree is an important framework for classifying different kinds of scenarios. It is comparable to supervised machine learning in the sense that data is continuously segmented in accordance with a parameter. A tree structure known as a decision tree may be described as having nodes such as the root, intermediate, and leaf nodes. Both classification and regression may be done with the help of a decision tree (Figure 8.3).

FIGURE 8.3  Decision tree.

90

Multimedia Data Processing and Computing

The decision tree (DT) algorithm is given below. Input: S, where S = set of classification instances Output: decision tree 1: Procedure build tree 2: Repeat 3: Gain_max ← 0 4: Split A ← null 5: e ← Entropy (Attributes) 6: for all Attributes α in S do 7: gain ← Gain_Information (a, e) 8: if gain > Gain_max then 9: Gain_max ← gain 10: SplitA ← α 11: end if 12: end for 13: Partition (S, splitA) 14: Until all partition processed 15: End procedure 8.3.4.2  Gradient Boosting Algorithm Gradient boosting (GB) is a technique of machine learning (ML) that is used for classification and regression. It generates a prediction model by using an ensemble of usually binary trees [19]. The gradient-boosting technique is at the heart of XGBoost, an ensemble approach that combines fundamental weak ML algorithms into a more extended model, as seen in Figure 8.4.

FIGURE 8.4  Schematic representation of the gradient-boosting machine learning technique.

91

Plant Leaf Disease Detection

The gradient-boosting algorithm supports both binary and multi-class classification. Compared to existing categorization techniques, the suggested approach provides improved accuracy [20]. After evaluating the results, it was discovered that the proposed method had a higher accuracy of 80.02%. The prediction model’s accuracy may be improved in one of two ways: by adopting feature engineering or by employing the boosting algorithm right away. Input: Training images 1: Initialize f0(x)argminp ∑N  L(yi, p) 2: For m=1 to M, do 3: Step1: compute negative Gradient

(

 ∂ L yi , F ( xi ) yi   =   −    ∂ Fx   i 

)  

4: Step 2: Fit the model N



α m = argminα ,β

∑  y − β h ( x ;α i

m

)

2

5: Step 3: Choose a gradient descent step size as N



ρm   =   argmin p  

∑ L ( y ,  F  −  1( x ) +   ρh (( x ; α ))) i

m

i

i

m

i =1

6: Step 4: Update the estimation of F(x)

Fm ( x )  =   Fm −1 ( x )  +   ρ mh ( x , α m ) 7: end for 8: Output: the final regression function Fm(x)

8.4 RESULTS The Python programming language, OpenCV for image processing, and scikit-learn for classification purposes are all used in the proposed system. This system has been trained and tested on Windows 10, and it has an Intel i5 core and 8 gigabytes of RAM. A qualitative and quantitative examination of the data are used to assess the performance of the system. Accurate categorization is used in the process of doing quantitative analysis.

8.4.1 Analysis of the Qualitative Data The objective of the qualitative analysis is to provide a description of the circumstance that is exhaustive in scope. There is no effort made to ascribe frequencies to

92

Multimedia Data Processing and Computing

FIGURE 8.5  Qualitative analysis on apple black rot leaf: (a) input image, (b) HSV image, (c) mask, (d) processed mask, (e) extracted leaf, (f) classified image.

the linguistic characteristics found in the data, and exceptional events are handled the same as those that are more prevalent. Figure 8.5 depicts the qualitative research that was conducted on the approach that was suggested for identifying leaf diseases. Table 8.2 shows the results of qualitative analysis using different machine learning classifiers. Figures 8.6 and 8.7 depicts a graphical study of the decision tree and gradient boosting method, showing the accuracy for each plant and the combined dataset.

TABLE 8.2 Quantitative Analysis of the Proposed System in Terms of Accuracy Classifier Decision Tree Gradient Boosting

Apple 87.53% 94.59%

Corn 95.76% 98.54%

Grape 74.9% 85.17%

Tomato 82.88% 88.02%

Combined 69.89% 80.02%

Plant Leaf Disease Detection

93

FIGURE 8.6  Qualitative analysis on apple black rot leaf: (a) input image, (b) HSV image, (c) mask, (d) processed mask, (e) extracted leaf, (f) classified image.

According to a qualitative examination, the gradient-boosting strategy outperforms the option for both individual and combined datasets. The maize dataset and the Apple leaf disease picture dataset yield the best results for the classifier. It has also been discovered that as the number of classes rises, the classifier’s performance decreases.

FIGURE 8.7  Graphical analysis of the accuracy of decision tree and gradient-boosting algorithm on individual plant leaf disease dataset and combined leaf disease dataset.

94

Multimedia Data Processing and Computing

8.5 CONCLUSION The classification of plant leaf diseases is the topic of this research, which provides the machine learning techniques known as decision trees and gradient boosting. The form and texture traits are retrieved in order to differentiate between healthy plants and the many illnesses that might affect plants. The assessment of the suggested system is done with the help of the PlantVillage dataset. The suggested technique was subjected to both qualitative and quantitative examination, which revealed that the system is capable of properly classifying plant leaf diseases. According to the results of the provided methodology, the gradient-boosting classifier achieves an accuracy rate of 80.05% for the PlantVillage database. In the future, the accuracy of the system will increase thanks to the implementation of a variety of feature extraction strategies and classification algorithms, by merging different classification algorithms using the fusion classification method, to achieve the goal of increasing the detection rate of the process of classification. In response to the identification of disease, the farmer will get the appropriate mixture of fungicides for continuing application to their crops.

REFERENCES









1. Vijai Singh, Namita Sharma, Shikha Singh. A Review of Imaging Techniques for Plant Disease Detection. Artificial Intelligence in Agriculture, Volume 4, 2020, pp. 229–242. doi: 10.1016/j.aiia.2020.10.002. 2. Ms Gavhale, Ujwalla Gawande. An Overview of the Research on Plant Leaves Disease Detection Using Image Processing Techniques. IOSR Journal of Computer Engineering. Volume 16, 2014, pp. 10–16. doi: 10.9790/0661-16151016. 3. Rashedul Islam, Md. Rafiqul. An Image Processing Technique to Calculate Percentage of Disease Affected Pixels of Paddy Leaf. International Journal of Computer Applications. Volume 123. 2015, pp. 28–34. doi: 10.5120/ijca2015905495. 4. Horticultural Statistics at a Glance 2018, Department of Agriculture, Cooperation & Farmers’ Welfare Ministry of Agriculture & Farmers’ Welfare Government of India. https://agricoop.nic.in/sites/default/files/Horticulture%20Statistics%20at%20a%20 Glance-2018.pdf 5. Savita N. Ghaiwat, Parul Arora. Detection and Classification of Plant Leaf Diseases Using Image Processing Techniques: A Review. International Journal of Recent Advances in Engineering & Technology, Volume 2, Issue 3, 2014, pp. 2347–2812. 6. Sanjay B. Dhaygude, Nitin P. Kumbhar. Agricultural Plant Leaf Disease Detection Using Image Processing. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Volume 2, Issue 1, January 2013, pp. 2022–2033. 7. Mrunalini R. Badnakhe, Prashant R. Deshmukh. An Application of K-Means Clustering and Artificial Intelligence in Pattern Recognition for Crop Diseases. International Conference on Advancements in Information Technology 2011 IPCSIT, Volume 20, 2011. 8. S. Arivazhagan, R. Newlin Shebiah, S. Ananthi, S. Vishnu Varthini. Detection of Unhealthy Region of Plant Leaves and Classification of Plant Leaf Diseases Using Texture Features. Agricultural Engineering International: CIGR Journal, Volume 15, Issue 1, 2013, pp. 211–217.

Plant Leaf Disease Detection

95

9. Anand. H. Kulkarni, Ashwin Patil R. K. Applying Image Processing Technique to Detect Plant Diseases. International Journal of Modern Engineering Research, Volume 2, Issue 5, 2012, pp. 3661–3664. 10. Sabah Bashir, Navdeep Sharma. Remote Area Plant Disease Detection Using Image Processing. IOSR Journal of Electronics and Communication Engineering, Volume 2, Issue 6, 2012, pp. 31–34. 11. Smita Naikwadi, Niket Amoda. Advances in Image Processing for Detection of Plant Diseases. International Journal of Application or Innovation in Engineering & Management, Volume 2, Issue 11, November 2013, pp. 135–141. 12. Sanjay B. Patil. Leaf Disease Severity Measurement Using Image Processing. International Journal of Engineering and Technology, Volume 3, Issue 5, 2011, pp. 297–301. 13. Piyush Chaudhary. Color Transform Based Approach for Disease Spot Detection on Plant Leaf. International Journal of Computer Science and Telecommunications, Volume 3, Issue 6, 2012, pp. 65–70. 14. Arti N. Rathod, Bhavesh Tanawal, Vatsal Shah. Image Processing Techniques for Detection of Leaf Disease. International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 11, November 2012, pp. 340–351. 15. PlantVillage Dataset, https://plantvillage.psu.edu/posts/6948-plantvillage-datasetdownload. 16. Neil W. Scanlan “Comparative performance analysis of texture characterization models in DIRSIG.” 2003. 17. Xin Zhang, Jintian Cui, Weisheng Wang, Chao Lin. A Study for Texture Feature Extraction of High-Resolution Satellite Images Based on a Direction Measure and Gray Level Co-Occurrence Matrix Fusion Algorithm. Sensors, Volume 17, Issue 7, 2017, pp. 1474. 18. Abdul Rasak Zubair, Seun Alo. (2019). Grey Level Co-occurrence Matrix (GLCM) Based Second Order Statistics for Image Texture Analysis. International Journal of Science and Engineering Investigations, Volume 8, Issue 93, October 2019, pp. 64–73. 19. Peng Nie, Michele Roccotelli, Maria Pia Fanti, Zhengfeng Ming, Zhiwu Li. Prediction of Home Energy Consumption Based on Gradient Boosting Regression Tree. Energy Reports, Volume 7, 2021, pp. 1246–1255. 20. Hamid Jafarzadeh, Masoud Mahdianpari, Eric Gill, Fariba Mohammadimanesh, Saeid Homayouni. 2021. Bagging and Boosting Ensemble Classifiers for Classification of Multispectral, Hyperspectral, and PolSAR Data: A Comparative Evaluation. Remote Sensing, Volume 13, Issue 21, 4405. https://doi.org/10.3390/rs13214405.

9

Smart Scholarship Registration Platform Using RPA Technology Jalaj Mishra and Shivani Dubey

9.1 INTRODUCTION As the scope of education is growing continuously, the number of students is also growing in every university/college/school. And there is a provision of government scholarships for students studying in any university/college/school. If any student wants to receive the benefit of this provision of government scholarships, they will have to apply for the scholarship registration procedure according to the underlying scholarship provision criteria. The registration process for a government scholarship is a long process. In this procedure, students fill out a long digital form to be registered on the government scholarship portal. But, as we are seeing in the current scenario, the educational cost is going up continuously, so there is also a growing need for a scholarship for every student for their graduate-level and further studies in universities and colleges. So, every student wants to have a government scholarship today. The scholarship registration of a university/college-level student is the responsibility of the university/college administration (i.e., each student of the university/ college should be registered successfully for the scholarship on the government scholarship portal). To fulfill this responsibility, the university/college administration appoints a member or group of members to register all students for scholarship on the government portal. This is a big task for the team, and it is also complex. The members perform the tasks manually for each student. This takes too much effort and time, with a great margin of manual errors. To remedy this issue, robotics process automation (RPA) can be introduced as the solution. By using RPA, the university/college administration can automate the task of filling out the scholarship forms for students, thereby getting a big relief in the scholarship registration process.

9.2  ROBOTIC PROCESS AUTOMATION Automation of robotic processes is that making creation simple, implementing, and take care of software robots that copy as humans connect to the cloud-based applications and systems. Instead of using people to perform repetitive and low-value tasks such as simply logging into systems and applications, relocating documents and folders, retrieving, duplicating, and compounding by the fact, and filling out forms, software robots can now perform these tasks. RPA is a software that is created and used to perform types of administrative tasks that otherwise require human work simplification for instance, transferring information from various sources of input, such as email messages and 96

DOI: 10.1201/9781003391272-9

Smart Scholarship Registration Platform Using RPA Technology

97

spreadsheets, to registration systems, including enterprise resource planning (ERP) [1]. Robotic automation refers to the utilization of specific technologies and methods to utilize a machine or “virtual machine” rather than an individual to control ongoing software applications, such as ERP, Asserts apps, and datasets, and to acquire knowledge management in the same way that humans currently practice [2]. RPA, also known as automated preparation computerization, is the use of computer program bots to automate incredibly tedious tasks that are frequently completed by information workers [3]. In spite of the fact that the phrase “robotic process automation” promotes the notion of robots carrying out human tasks, it may actually be a software configuration. RPA is the technology that executes common process tasks in accordance with straightforward rules. Its range of capabilities includes filling out forms and responding to emails. It can also read and extract data from corporate planning applications and perform simple calculations [4]. The robot used in RPA is programmed. Users are unlikely to see a physical computer having arms, legs, and wheels and tapping away on a keyboard. A fully automated robot instructor captures keystrokes and mouse clicks with the aid of a piece of software. To mimic a person’s actions, a computer (the robot) actually replaces these actions. RPA, or rule-based automated process, is a process that automates processes that are based on rules, using software that attempts to simulate actions after having a conversation with several other applications on a device. This typically entails reading, writing, or possibly tapping already existing systems employed for completion of the required tasks. In addition, such robots make decisions based on the information and predetermined rules and carry out complex calculations [5]. RPA usage in businesses has grown significantly in recent years and is expected to increase by 20% to 30% annually, or $3.97 billion, by 2025. RPA growth is anticipated to occur between 2021 and 2028 at a percentage of 32.8%. Organizations are implementing RPA in an effort to cut costs and boost productivity, efficiency, and service quality (Figure 9.1) [6].

FIGURE 9.1  Growth and efficiency of RPA.

98

Multimedia Data Processing and Computing

The implementation of this software is anticipated to improve productivity by approximately 86%, improve performance by approximately 90%, and lower costs of the workplace by approximately 59%. Because of these major effectiveness improvements, robotization has altered various organizations’ priorities, including those in the banking industry. As a result, 30% of banks globally and 45% in Poland have designated it as a priority [7]. The next innovation wave will change outsourcing: automation. The race to emerge as the leading automation-enabled network operator in the sector is already beginning to take shape. In due course, we’ll probably witness an armed conflict for the most innovative automation tools, which will result in new products and delivery paradigms. Even though the phrase “automation” evokes thoughts of exact robotic systems doing human efforts in offices, it refers specifically to the automation of fully performed efficient delivery.

9.3  BENEFITS OF RPA As you all know, we can’t ignore the economics of RPA and its uses in different industries. By the use of RPA, human employees are free from mundane, repetitive work, and it empowers employees to focus on other core objectives of business, which brings various benefits to the organization. RPA has a successful effect on the operations of the business and its outcome. Robotic Process automation brings corporate advantages like cost reduction, greater accuracy, delivery speed, and other benefits. And it continuously adds value because it spreads and takes up momentum across the organization. RPA enhances the outcomes of the business process, such as customer satisfaction. The benefits of RPA are the following: a. Cost saving and fast ROI: According to estimates, RPA can cut processing costs by 60% to 80%. Anyone can quickly become an expert at creating RPA bots in a non-coding environment where no coding knowledge is necessary, and they can then start generating ROI. An employee squanders 40% of their time on administrative tasks that can be automated. Automation can help to recover the cost in a very short time span. In less than 12 months most of the enterprises see positive results. It’s all about gains. b. Increased speed and employee productivity: All of us are aware of RPA and how it controls numerous repetitive processes and tasks in business. Employees can engage in more activities that add value to the company. RPA enables staff to carry out more important tasks for the business and its clients. Employees appreciate RPA because it eases their workload. Employee productivity rises when they believe their job is highly valued and noteworthy, which can aid in raising productivity. c. Higher accuracy: We all make mistakes at work because we are human, but robotic software never does. Robots are dependable and consistent, and they can eliminate processing errors. RPA has a perfect accuracy rate if processes and sub-processes are properly optimized and accurately mapped. d. Enhanced efficiency: Software robots can be employed 365 days a year, 24 hours per day. They never need a vacation or a holiday. We can replace

Smart Scholarship Registration Platform Using RPA Technology

99

the work of three or four employees with a single robot. More and more work can be processed in the same amount of time, with less effort. e. Super scalability: RPA performs a large quantity of processes from the cloud to the desktop in a parallel manner. RPA easily handles any workload or pressure, whether preplanned or not. f. Improvement of management capabilities: This benefit improves management and administrative capabilities such as attendance management (automatic dealing with participants, sending robotized reminders and warnings and final report to understudies), timetable gathering, and other tedious tasks in HR, finance, and admin departments such as workers onboarding/ offboarding, stock administration, etc. Due to the increasing benefits and the versatility of RPA, this technology is used in various industries, some of are as follows: • Business process outsourcing (BPO) sector: With RPA, the BPO sector can spend less on external labor. • Financial sector: RPA helps to handle huge amounts of transactional data and other of information. • Healthcare sector: RPA improves patient-capture processes in the healthcare industry by sending computer-controlled reminders for important appointments and removing human error. • Telecommunications sector: In this sector, RPA helps with uploading and extracting data and collecting information from client phone systems. • Government sector: RPA reduces the time of populating subcontractors’ forms and various verification processes by automating reports and new systems.

9.4 BACKGROUND In 2021, Tanya Nandwani et al. proposed RPA-related academic area remedies. The enquiry of the understudy’s assessment results is mechanized in this paper. In this paper, the given description of the project is that the student information is predefined in the XLS format, which is read by the RPA service, and RPA will upload the information in the university portal. For the implementation of this described methodology, they used HTML, JavaScript, Selenium Scripting, and Java technology. They draw the conclusion that the research demonstrates a successful robotic system software in order to make a college’s investigation easier. They mentioned in the paper that their results demonstrate that no work was done with mistakes. Likewise, compared to a manual examination by humans, this investigation took approximately 94.44% less time [8]. In 2020, Nandwani et al. proposed a solution in the educational sector that shows how RPA is being used to automate many of the procedures involved in running a student system, as well as how it integrates routine tasks inside of an ERP system. A chatbot is also used here within the solution, allowing users to extract the required information. With the aid of RPA technology, all of the processes in this work are automated. The RPA bot automates the daily repetitive processes in the ERP system

100

Multimedia Data Processing and Computing

to maintain ongoing maintenance. The proposed system says that by using RPA and chatbot, the student information within an ERP system can be managed efficiently without errors because it allows efficient management of functions and procedures. Here a stack of technologies has been used to implement this proposed solution. For developing the chatbot, they used Dialogflow and a webhook, and to automate the process flow, RPA’s UI path was integrated with the chatbot and MySQL database. The result of this study is that this automated student management system helps institutes keep track of data related to the users of this system, so faculty can track students’ performance easily. They conclude that these technologies together provide a user-friendly interface, an ERP, and a chatbot. It will help the firm to maintain the data in an efficient way and control the data also [9]. The advantages of this new RPA technology and implementations of RPA in different domains are provided by applying it to organizational processes of public administration. Introducing the robotization process lowers the cost and automates the processes and functions. The newest technologies are quickly implemented in all spheres of activity. Technologies are designed to enhance or optimize the management and growth of organizations by introducing new levels of service quality and efficiency. Various manufacturing tasks and operations are carried out by smart robots and smart devices. To complete a business process, robotization is employed as an automatic application which copies human personal actions to the data. The amount of data processed and how effectively the procedure can be algorithmized are presented as different requirements for RPA repetitive tasks. This paper also clarifies how RPA can be used by both private businesses and the public sector [10]. A thorough examination of the top platforms, namely UiPath, Automation 360, and Blue Prism, is presented in “Delineated Analysis of Robotic Process Automation Tools”. When promptness is expected across all industries, the rate at which various processes are carried out becomes crucial. The authors compared these tools while taking into account a number of factors, including front-office automation, back-office automation, security, and how user-friendly and code free they were. With UiPath’s adaptive algorithms pushing it beyond all reasonable limits, this paper focuses on its potential future applications [11]. The worldwide market of RPA had a value of $1.29 billion in 2020, according to a Fortune Business Insights report. The global impact of COVID-19 on the RPA over the past two years has been dizzying, a positive impact that has increased demand in all regions. Compared to the average annual growth between 2017 and 2020, the worldwide economy showed a significant expansion of 21% in 2020, according to Fortune Business Insights analysis [12]. This was a case study. In 2020, during COVID, the University of Tirana implemented RPA in the university for the administration process. This paper explains that there were many difficulties in schools/universities that school administrative boards faced to manage their administrative tasks during COVID. One of the processes carried out every year in each study cycle is the application for scholarships. Considering that last year the schools (universities) to respect the social rules of COVID realized the application for scholarships by filling out the form, they realize a program with RPA that automates the process of calculating the lists of applications for scholarships. Every year scholarship application is a very

Smart Scholarship Registration Platform Using RPA Technology

101

big and important task for the administrative board in any school/college/university, which was the same during COVID. During the pandemic 2020 study cycle, the University of Tirana implemented the RPA to count the list of scholarship applications using UiPATH, which is an administrative task. According to them, an RPA bot handles all the processes that read the email (in the specific case of the teaching secretary) and, depending on the subject of the email, downloads the scholarship application forms, reads the data needed with OCR technology, and throws them all into Excel, thus avoiding working hours of a person who opens emails, checks data, determines the type of scholarship, puts the data into Excel, thus repeating a long cycle. By this we get results that are sent by email, with the students’ ID, type of scholarship, and the status of whether it is approved or not. The whole process described above is executed in very few seconds. One of the advantages of using RPA is the realization of voluminous and repetitive work in a very short time and very precisely [1].

9.5  ISSUES IN ROBOTIZATION OF A PROCESS Since RPA is a relatively increasing automation technique, its current systems face a number of difficulties. Various articles demonstrate the RPA challenges. The most crucial thing to know is whether the process can be automated or not, because process automation depends greatly on this understanding, A recurring problem seems to be selection of a task for automation. Companies struggle to apply automation on extremely difficult and fractionable tasks involving parties due to the lack of knowledge about RPA automation. Companies struggle to find the best solutions for their problems, and internal resistance to embracing new cultures is another issue. One difficulty is that staff members are unsure of how this adoption will affect their jobs (systems, document structure, and other changes) [13]. RPA is a new technology, so knowledge and experience in implementation are lacking [14]. Among the major things for RPA implementations are access and security. Humans have always controlled who has access to what resources. However, new regulations must take into account software robots’ access to information [15]. Although the majority of businesses have initiated the usage of digitalization as an adaptable, cutting-edge method of information storage, others are still lagging behind. The use of unordered documents continues to be a major barrier to organizations’ adoption of RPA [15]. The total absence of experience and knowledge with this software contributes to the emergence of contextual resistance to change. First off, some employees refuse to adopt this new technology unless forced to do so because they are afraid of losing their jobs, which reduces compliance. Second, because they are satisfied with their current work cultures, some entities fail to support and prioritize this adoption [16]. Similar to how current security procedures do not take into account the establishment of digital workers, organizations face a great challenge in successfully adopting a new security framework [17]. Companies find it difficult to adopt RPA due to the software’s novelty and the resulting dearth of documentation because there are no standards or methodologies in place [18, 19] as shown in Table 9.1.

102

Multimedia Data Processing and Computing

TABLE 9.1 Issues in RPA S. No. 1.

Concerns Absence of expertise and knowledge

2.

Resistance from workers and investors

3.

Faulty procedures

4.

Incompatible information

Dates & Writers Kokina et al. 2021 [17]; Saukkonen et al. 2019 [20]; Wewerka et al. 2020 [14]; Gotthardt et al. 2020 [21]; Kokina & Blanchette 2019 [22]; Marciniak & Stanisławski 2021 [13]; Hegde et al. 2018 [23]; Lacity et al. 2015 [24]; Flechsig et al. 2021 [25] Saukkonen et al. 2019 [20]; Gotthardt et al. 2020 [21]; Viale & Zouari 2020 [17]; Marciniak &Stanisławski 2021 [13]; Fernandez & Aman 2018 [25]; Willcocks et al. 2017 [26]; Flechsig et al. 2021 [25] Viale & Zouari 2020 [16]; Hegde et al. 2018 [23]; Siderska 2020 [27] Wewerka J et al 2020 Jan 22 [14]; Januszewski et al. 2021 [28]; Gotthardt et al. 2020 [21]; Hegde et al. 2018 [23]

9.6  TOOLS AND TECHNOLOGIES 9.6.1  RPA As a Rising Technology RPA is used to automate tasks in corporations. This technology provides the capacity to process large amounts of data, minimize complexities and mistakes brought on by human error, increase the motion of the process, and continue operations without the need for rest like human workers need [22]. With the help of this technology, employees can be freed up from boring and repetitive business tasks, allowing them to focus their skills on more creative or complex thinking tasks. With the use of this rising technology, intelligent programs are called “software robots.” Their crucial tasks include gathering data available on the platforms, making changes in the database, controlling many tasks on the basis of requirements of an IT company, automating the sending of emails, and migrating extremely large amounts of data in various systems as well as the validation of that. Only those business processes that have clearly predefined rules and regulations of the business can use the technology. It primarily entails interacting with the interfaces of IT systems while mapping the steps that employees take to complete the entire business process [23]. It is important to note that RPA technology accepts non-encroaching methods to find specific properties of developing software when examining its structure and functionality. This technology can be recognized as follows (Table 9.2): • Routine tasks—Routine tasks are those that are performed on a regular basis. Any process can go for automation, the condition is that a problem will be selected for automation if that executes again and again within the constant frequency. • Principle-based analysis of processes—In this analysis, execution depends on a set of steps.

Smart Scholarship Registration Platform Using RPA Technology

103

TABLE 9.2 RPA Research Techniques Protocol Elements of SLR Availability Acronyms Search technique

Associability criteria Exemption standards

Analysis Details for RPA Internet research, research gate. RPA, robotics in corporate, uses robotic process automation in daily-life tasks, in government organizations, and in administration. Prior writings, articles, document files, and papers released in scientific journals receive preference, scientific journals, examples from businesses that have implemented RPA, and reviews of conferences and chapters. Search RPA functional chart, RPA operating model, and the robotic process automation Articles with partial access, articles with few or no links to many other papers, articles lacking complete PDF files, and articles having access to simply the abstract

• Specified structure of data—Structuring of data is used to describe how documents used in a specific business process have been standardized. It is important to note that processes with inconsistent data structures or those where a particular format has the power to convert frequently shouldn’t be automated cause of the excessive number of exceptions. • Non-encroaching tasks—This phrase refers to how RPA technology interacts with the systems it works with, not the tasks that are carried out. Without requiring deep access to services, a robotized and automated solution interacts only with the systems of end users to retrieve and process visible data.

9.6.2  Automation 360 People are empowered by Automation Anywhere to contribute their ideas, thoughts, and solution to any problems. We provide the most advanced digital platform in the world that enables humans to work more by automating business or other processes. In our proposed system, we have used RPA as a technology and “Automation 360 Community Edition,” a cloud-based RPA tool by Automation Anywhere. This tool provides various benefits like AI bots, scalable bots, and many other automation techniques that encourage users to practice automation on the various processes with less effort and cost [24] as shown in Figure 9.2. This is an easily available application platform that is full of various functional operations; with the use of this platform, we can easily implement any RPA process with less effort. In our research, we have used the Community Edition platform because anyone can freely enjoy the benefits of this freely available platform and can start our RPA journey with Community Edition of Automation Anywhere [25]. • In the cloud version, users can gain access to cutting-edge enterprise-class technology.

104

Multimedia Data Processing and Computing

FIGURE 9.2  Automation 360 web page.

• It provides drag-and-drop activities. • Bot creation is allowed for each level of skill that you have. • In-product learning is level-wise. We made a demo scholarship registration form using HTML, CSS, and JavaScript in this project. By using this demo form we can demonstrate the solution idea of our project easily in front of our target audience. These technologies are described as follows: • Hyper-Text Markup Language: If we talk about markup language, then HTML is the first response of anyone; it is a basic and well-known markup language. With the use of HTML, we design web pages, documents, and others things. To enhance representation, we can use various technologies with it, such as cascading style sheets and JavaScript to design web pages and documents fully designed and stylish. Web pages and documents designed with HTML can be easily displayed on web browsers like Chrome and various others. The structure of the web pages is defined by HTML, and all the semantics of the web pages are also defined by HTML. • Cascading Style Sheets (CSS): Cascading style sheets are used to display HTML elements on the screen. CSS has the power to control multiple web pages at one time. CSS was developed in 1996 by the World Wide Web Consortium. “Cascading” in CSS means that we can apply different types of style sheets to one web page. We are using CSS to separate the contents of the web page including designs, formats, styles, and fonts in the demo form which we created for demonstration.

Smart Scholarship Registration Platform Using RPA Technology

105

• JavaScript: In the list of widely used programming languages JavaScript is one that is used for the back end as well as the front end. With the help of JavaScript, we can create websites that are really amazing and stylish, that can provide your clients with the best user interface experience. You can build a website with good graphic design and feel. In every area of software development, JavaScript is already presented. There are various areas in which JavaScript is used: • Validation of client inputs: If the user inputs anything, then it is really important to verify that input before saving it to the server. JavaScript is used for the verification of input by the user. • Manipulation of the HTML page: With the help of JavaScript we can easily add, remove, and update new tags or already coded tags of the HTML page. This helps to change previously coded HTML on the page and change its feel and look on the basis of the requirements of the devices • Notification with JS: We can use JavaScript if we are required to notify users on the web page, then we can generate different types of pop-ups on the screen. • Back-end data loading: In order to load back-end data while performing other tasks, JS offers an Ajax library which helps us to do this. Your website visitors will have an amazing experience with this amazing library. • Application of server: Node JS is designed to build much faster and more scalable applications. This event-driven library helps with the creation of extremely complex server applications such as web servers [34].

9.7 METHODOLOGY 9.7.1 Implications in Existing System The scholarship registration process is traditionally handled manually, resulting in a lot of effort and problems. Manual data processing and data entry is also a problem in today’s scenario because when we speak of any type of registration, it is our responsibility to ensure that our data are entered correctly in the application form. Due to many challenges, it can be difficult for students to successfully register for scholarships [26]. 1. In scholarship registration, there is a deadline for the application form to be filled by the applicant. But with their busy routines, college students face a challenge to complete their application forms. Thus, some students fail to fill out their application forms on time. 2. The server site may be a big challenge for the students, because many students who complete their application form after some time face server problems on the scholarship portal. 3. It is important to discuss that in the manual tradition of scholarship registration it is not necessary that every student has their own laptop

106

Multimedia Data Processing and Computing

or desktop system. But with online registration, students who do not have their own system need to pay a service for filling out their scholarship form. 4. As we all know, scholarship form registration is a lengthy process due to the different pages of entries that are required for the completion of the form. This takes longer when filling out the application form manually. 5. In the traditional way, after registration, when students submit their final print to the administration, it is a big job to handle a large number of document sets. And administration has the responsibility to ensure that no documents are lost. But by using an automated smart scholarship platform, we can overcome all the challenges of the traditional scholarship system [27].

9.7.2  Proposed System Using robotic process automation, it can be demonstrated that a smart scholarship registration platform can be efficiently managed with minimum effort and without errors. It not only automates repeatable tasks or processes but handles large amounts of data easily. The workflow designed for the proposed system describes the flow of the process of all the tasks performed by the RPA bot in order to give efficient and correct output in less time [28] as shown in Figure 9.3.

FIGURE 9.3  Flowchart of proposed system.

Smart Scholarship Registration Platform Using RPA Technology

107

9.7.3 Data Collection and Source File Creation In the first phase – data collection and source file creation – it is the responsibility of the administration of the organization first to check all the required details asked by the scholarship application form and then to creates a Google form that will ask for all required information from students. Send this Google form to all the students as possible. After the students fill out the Google form, close that Google form and then create a spreadsheet of the responses provided by the students. This spreadsheet of responses is the source file, which is called scholarship data of students. This spreadsheet is an unsorted file to provide more clarification. Sort this spreadsheet by any method. (e.g., by class, by section, and roll numberwise, etc.) This sorted spreadsheet is the final source file in the correct format. This phase consists of all the above processes and is termed “data collection and source file creation [29].

9.7.4 Creation and Structure of the Task Bot After the source file is created, in this phase we will work on the creation of a task bot. This phase can be considered the main phase of the complete proposed system because the task bot is the main entity that will automate this task and process the results. The creation of the task bot consists of three sub-phases which are the following: • Device registration • Setting of user credentials • Structure the RPA bot

9.8 IMPLEMENTATION In order to create the task bot, first we need access to the Automation 360 Community Edition. We have used RPA as a technology and the “Automation 360 Community Edition.” Access the Automation Anywhere Community Edition and register. If you have already registered, connect with the control room of Community Edition by logging into the control room [30]. To register your device, following these steps: • In the Navigation pane, click Devices under the Manage section. • Click the Connect local device icon from the action menu. • Once the Automation Anywhere bot agent has been downloaded click Connect to my computer, and proceed to install it. • To install the bot agent, adhere to the wizard’s instructions. Note: You are prompted for proxy server authentication information if your device is managed by a proxy server that authenticates users. The device cannot communicate with the control room without these credentials. Register the device using a browser,

108

Multimedia Data Processing and Computing

FIGURE 9.4  Access the Automation Anywhere Community Edition.

usually a Chrome browser, with the Automation Anywhere extension enabled in order to enable the authenticated proxy [31] as shown in Figure 9.4. i. Enable the Chrome extension. ii. Click Done.

9.8.1 Setting User Credentials To set user credentials in the Automation Anywhere Community Edition, the following steps must be followed (Figure 9.5): • Click the Username in the Navigation pane. • Click My settings in the flyout pane. • Navigate to Devices and enter the Device username in the field. Note: If your username is part of a domain, include the domain within the format domain\username. Note: whoami is the command of the command prompt that provides the device username. It provides network-dependent credentials [32]. Great! Your device has now registered successfully and the login information required for the execution of the bot is already provided to you.

9.8.2 Designing the Task Bot The process of structuring the bot in the Automation Anywhere Community Edition platform has the following steps (Figure 9.6): • In the Explore page, click the Create a bot link. • Enter a name for the bot and then click Create & edit.

Smart Scholarship Registration Platform Using RPA Technology

109

FIGURE 9.5  User credentials.

9.8.3  Running the Task Bot After complete implementation of the task bot, we will run our task bot (Figure 9.7): • Ensure that there are no indications of incomplete actions. • In the Bot Editor toolbar, Click Run. When you click Run the bot will start and do the task given (Figure 9.8).

FIGURE 9.6  Create a bot.

110

Multimedia Data Processing and Computing

FIGURE 9.7  Bot editor tool bar.

9.8.4 Implementing the Task Bot Now we are ready to implement our task bot including all the required steps: • From the Actions palette, search for the Step action and drag and drop the Step action to the canvas. • In the Action details pane, enter an appropriate title for the step. • Click Save. • Repeat steps I to III to add the remaining steps that the bot must perform.

9.8.5 Opening the CSV File • Open the CSV file (Figure 9.9) by using the open action of the Excel advanced package of the action palette and set all the parameters of this related to this action.

FIGURE 9.8  Bot processing status.

Smart Scholarship Registration Platform Using RPA Technology

111

FIGURE 9.9  CSV file.

• Position the cursor in the required cell to update the status of the registration by using the “Go to cell option” of the excel advanced package of the action palette and select the specific cell name in the cell option in the properties section.

9.8.6 Launching the Scholarship Form • Select the open action of the browser package in the action palette to open the scholarship web form in the browser. Select the name of the browser in which you want to open the scholarship web form in the Browser drop-down list. • Enter the link of the scholarship web form or scholarship web portal in the link to open the field (Figure 9.10).

FIGURE 9.10  Student scholarship web form.

112

Multimedia Data Processing and Computing

9.8.7  Populating the Web Form • Extract the information from the CSV file. • We assign each row of the CSV file to a variable by looping. • Input the information from the CSV file into each field of the CRM web form. • After registering each Tuple update, its status in the designated cell into the CSV file.

9.8.8 Sending the E-mail After completing this registration process for all the records of the source file, it is time to send a verification email to all the students saying that their scholarship registration has been successfully done; the message will also contain the registration ID generated after a successful registration [33, 34]. After completion of the task, save the source file and then close this source file by using the Close action of the excel advanced package of the action palette.

9.9  RESULTS ANALYSIS We are improving various factors like cost, time, and effort by providing this automated solution for the administration problem. The difference between this technological solution and the old system of scholarship registration can be easily compared. Table 9.3 provides a comparison that demonstrates the clear differences and benefits between the smart system and the traditional system [35, 36, 37].

TABLE 9.3 Results Analysis of Comparison between Traditional System and Proposed System S. No. 1.

Factors TIME

Traditional System In the traditional system, the form-filling process takes a very long time and requires much effort.

2.

COST

3.

MANUAL EFFORTS

In the traditional system, the shopkeeper charged money to students for form filling, which was very costly. There are many manual tasks in the traditional system of registration. Organizations select a person for the scholarship registration process of students and pay a salary.

Proposed Smart System In this proposed system this problem has been resolved because the bot fills the application form within minutes instead of days. In this smart system, there is no charge for the form-filling process, and the student is free from any type of headache. There is no requirement for manual tasks in the smart registration system. This proposed system is free from human intervention. And it saves money also. (Continued)

Smart Scholarship Registration Platform Using RPA Technology

113

TABLE 9.3 (Continued) Results Analysis of Comparison between Traditional System and Proposed System S. No. 4.

Factors CORRECTNESS OF DATA

5.

PAPER WORK

Traditional System In the manual form-filling process, there is a big possibility of errors; mistakes can be done by a student or any data entry operator who fills the registration form. In the traditional system, organizations have a lot of paperwork to securely collect all the documents of each student. There are lots of bundles of student documents and it is a challenge to safely manage all these documents. There is a possibility of loss of a document.

Proposed Smart System In this smart registration platform, there is no chance of mistakes because software robots never make mistakes like humans. In the smart system, there is no paperwork for the organization or students. And no probability of loss of any document.

9.10 CONCLUSION In this research chapter, a new solution technique is introduced and implemented that could help the administration of institutions/colleges/universities. Basically, the objective of this research work was to solve a big real-life scholarship form registration problem that the administration of every institution/college/university faces. In this chapter, a solution for the scholarship problem is implemented by using robotic process automation and developing a smart scholarship registration system with the help of a software robot or bot which automates this process of form registration. In this system, there is no human involvement and it works very efficiently without errors. It is well known that humans must leave work after a certain amount of time for rest, but robots work 24/7 continuously without taking any rest. For this reason, this system has more benefits than the traditional system described in detail in this paper. To implement this technique, we have used RPA as a technology and the “Automation 360 Community Edition,” which is a cloud-based RPA tool by Automation Anywhere. This tool provides various benefits like AI bots, scalable bots, and many other automation techniques which encourage you to practice automation on the various processes with less effort and cost. Our technological solution represents a significant improvement over today’s system of scholarship registration and provides a smart platform with automation to the management of institutions/colleges/universities.

REFERENCES 1. Greca, S., Zymeri, D., & Kosta, A. (2022). “The implementation of RPA in case of scholarship in University of Tirana during the Covid time”, Twenty-First International Conference on: “Social and Natural Sciences – Global Challenge 2022” (ICSNS XXI2022), 18 April 2022.

114

Multimedia Data Processing and Computing

2. Sutherland, C. (2013). Framing a Constitution for Robotistan. Hfs Research, ottobre. 3. Shajahan A., Supekar N.T., Chapla, D., Heiss, C., Moremen, K.W., & Azadi, P. (2020) Simplifying glycan profiling through a high-throughput micropermethylation strategy. SLAS TECHNOLOGY: Translating Life Sciences Innovation, 25(4), 367–379. 4. Hartley, J. L., & Sawaya, W. J. (2019). Tortoise, not the hare: Digital transformation of supply chain business processes. Business Horizons, 62(6), 707–715. 5. Saini, D. (September 2019). A review of Robotics Process Automation – An Emerging Technology. Think India Journal, 22(17), 4091–4099. 6. Choi, D., R’bigui, H., & Cho, C. (2021). Candidate digital tasks selection methodology for automation with robotic process automation. Sustainability, 13(16), 8980. 7. Harmoko, H. (2021). The five dimensions of digital technology assessment with the focus on robotic process automation (RPA). TehnickiGlasnik, 15(2), 267–274. 8. Nandwani, T., Sharma, M., & Verma, T. (2021) “Robotic process automation – automation of data entry for student information in university portal”, Proceedings of the International Conference on Innovative Computing & Communication (ICICC) 2021, 28 April 2021. 9. Bhanushali, R., & Patil, S. (2020). Automating student management system using chatbot and RPA technology”, SSRN Electronic Journal. DOI: 10.2139/ssrn.3565321 10. Uskenbayeva, R., Kalpeyeva, Z., Satybaldiyeva, R., Moldagulova, A., & Kassymova, A. (2019). “Applying of RPA in administrative processes of public administration.” In 2019 IEEE 21st Conference on Business Informatics (CBI), vol. 2, pp. 9–12. IEEE. 11. Issac, R., Muni, R., & Desai, K. (2018) “Delineated analysis of robotic process automation tools.” In 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1–5. IEEE. 12. Fortune Business Insights (2021). RPA Market Research Report. Fortune Business Insights. 13. Marciniak, P., & Stanislawski, R. (2021). Internal determinants in the field of RPA technology implementation on the example of selected companies in the context of Industry 4.0 assumptions. Information, 12(6), 222. 14. Wewerka, J., Dax, S., & Reichert, M. (2020) “A user acceptance model for robotic process automation”. 2020 IEEE 24th International Enterprise Distributed Object Computing Conference (EDOC), 97–106. 15. Raza, H., Baptista, J., & Constantinides, P. (2019). Conceptualizing the Role of IS Security Compliance in Projects of Digital Transformation: Tensions and Shifts Between Prevention and Response Modes. ICIS. 16. Viale, L., & Zouari, D. (2020). Impact of digitalization on procurement: The case of robotic process automation. Supply Chain Forum: An International Journal, 21(3), 185–195. 17. Kokina, J., Gilleran, R., Blanchette, S., & Stoddard, D. (2021). Accountant as digital innovator: Roles and competencies in the age of automation. Accounting Horizons, 35(1), 153–184. 18. Vokoun, M., & Zelenka, M. (2021). Information and communication technology capabilities and business performance: The case of differences in the Czech financial sector and lessons from robotic process automation between 2015 and 2020. Review of Innovation and Competitiveness: A Journal of Economic and Social Research, 7(1), 99–116. 19. Tiwari, A., Sharma, N., Kaushik, I., & Tiwari, R. (2019) “Privacy Issues & Security Techniques in Big Data”, 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). DOI: 10.1109/icccis48478.2019.8974511 20. Saukkonen, J., Kreus, P., Obermayer, N., Ruiz, Ó. R., & Haaranen, M. (2019, October). AI, RPA, ML and other emerging technologies: anticipating adoption in the HRM field. ECIAIR 2019 European Conference on the Impact of Artificial Intelligence and Robotics, 287.

Smart Scholarship Registration Platform Using RPA Technology

115

21. Gotthardt, M., Koivulaakso, D., Paksoy, O., Saramo, C., Martikainen, M., & Lehner, O. (2020). Current state and challenges in the implementation of smart robotic process automation in accounting and auditing. ACRN Journal of Finance and Risk Perspective, 1(2), 90–102. 22. Kokina, J., & Blanchette, S. (2019). Early evidence of digital labor in accounting: Innovation with robotic process automation. International Journal of Accounting Information Systems, 35, 100431. 23. Hegde, S., Gopalakrishnan, S., & Wade, M. (2018). Robotics in securities operations. Journal of Securities Operations & Custody, 10(1), 29–37. 24. Lacity, M., Willcocks, L. P., & Craig, A. (2015). Robotic process automation at Telefonica O2. MIS Quarterly Executive, 15(1), 4. 25. Flechsig, C., Anslinger, F., & Lasch, R. (2022). Robotic Process Automation in purchasing and supply management: A multiple case study on potentials, barriers, and implementation. Journal of Purchasing and Supply Management, 28(1), 100718. 26. Fernandez, D., & Aman. (2021). Planning for a successful robotic process automation (RPA) project: A case study. Journal of Information & Knowledge Management, 11, 103–117. 27. Willcocks, L., Lacity, M., & Craig, A. (2017). Robotic process automation: Strategic transformation lever for global business services? Journal of Information Technology Teaching Cases, 7(1), 17–28. 28. Siderska, J. (2020). Robotic process automation – A driver of digital transformation? Engineering Management in Production and Services, 12(2), 21–31. 29. Januszewski, A., Kujawski, J., & Buchalska-Sugajska, N. (2021). Benefits of and obstacles to RPA implementation in accounting firms. Procedia Computer Science, 192, 4672–4680. 30. Rujis, A., & Kumar, A. “Ambient Intelligence In day to day life: A survey”, 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, Kerala, India, 2019, pp. 1695–1699. DOI: 10.1109/ICICICT46008.2019.8993193. Electronic copy available at: https://ssrn.com/abstract=3834281 31. Jadon, S., Choudhary, A., Saini, H., Dua, U., Sharma, N., & Kaushik, I. (2020). Comfy smart home using IoT. SSRN Electronic Journal. DOI: 10.2139/ssrn.3565908 32. Kalra, G. S., Kathuria, R. S., & Kumar, A. “YouTube video classification based on title and description text”, 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 2019, pp. 74–79. DOI: 10.1109/ ICCCIS48478.2019.8974514. 33. Hitanshu, Kalia, P., Garg, A., & Kumar, A. “Fruit quality evaluation using Machine Learning: A review”, 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, Kerala, India, 2019, pp. 952–956. DOI: 10.1109/ICICICT46008.2019.8993240. 34. Chauhan, U., Kumar, V., Chauhan, V., Tiwari, S., & Kumar, A. “Cardiac Arrest Prediction using Machine Learning Algorithms,” 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, Kerala, India, 2019, pp. 886–890. DOI: 10.1109/ICICICT46008.2019.8993296. 35. Reddy, K.P., Harichandana, U., Alekhya, T., & Rajesh, S.M. (2019). A study of robotic process automation among artificial intelligence. International Journal of Scientific and Research Publications, 9(2), 392–397. 36. Burgess, Anderw. “Robotic Process Automation & Artificial Intelligence”, handbook posted on 7th June, 2017. 37. Jovanovic, S.Z., Ðuric, J.S., & Šibalija, T.V. (2018). Robotic process automation: Overview and opportunities. International Journal of Advanced Quality, 46, 3–4.

10

Data Processing Methodologies and a Serverless Approach to Solar Data Analytics Parul Dubey, Ashish V Mahalle, Ritesh V Deshmukh, and Rupali S. Sawant

10.1 INTRODUCTION The expansion of the data market has caused a similar expansion in the market for analytics services. For such data processing and decision making, artificial intelligence (AI) is playing a critical role. Machine learning is a subfield of artificial intelligence that enables computers to “learn” and “improve” themselves even when they are not provided with any explicit instructions or guidance. Integration of AI into the area of solar energy analytics may be of assistance in the process of creating predictions about the use of solar energy. Solar energy reaches the Earth at a rate roughly equal to the world’s use of fossil fuels per month. As a result, solar energy’s worldwide potential is several times greater than the world’s current energy consumption. Technology and economic challenges must be overcome before widespread solar energy use is possible. How we solve scientific and technological difficulties, marketing and financial issues, as well as political and legislative concerns, such as renewable energy tariffs, will determine the future of solar power deployments. It is estimated that the atmosphere of the planet reflects about one-third of the Sun’s radiant radiation back into space. After the Earth and its atmosphere have absorbed the remaining 70%, we will have 120,000 terawatts of energy available. The ability of the Sun’s beams to reach the Earth is getting more difficult. Some of the radiation is instantly absorbed by the Earth’s atmosphere, seas, and soil. Fear of the future is a top worry for many. Water evaporates, circulates, and precipitates due to the presence of a third element. Green plants, on the other hand, need only a small portion of the total for photosynthesis. The fundamentals of AI and data processing methods will be covered in this chapter. For solar energy analytics, we will also conduct a comprehensive literature research on solar thermal energy. The next step will be to include the algorithms’ functionality into a cloud platform for serverless solar energy analytics. An infrastructure proposal for the solution’s incorporation into Amazon Web Services (AWS) is anticipated. 116

DOI: 10.1201/9781003391272-10

Data Processing Methodologies and a Serverless Approach

117

10.1.1  Solar Thermal Energy Solar thermal energy is not the same thing as photovoltaic energy. Solar thermal energy is a one-of-a-kind source of energy. In solar thermal energy production, sunlight is condensed into heat, which is used to power a heat engine, which in turn turns a generator to produce electricity. It is possible to heat a working fluid that is either liquid or gaseous. Water, oil, salts, air, nitrogen, and even helium are examples of working fluids, as are a variety of other substances. Among the many kinds of engines are steam engines, gas turbines, and Stirling engines, to name a few. These engines, which have an average efficiency of 30% to 40%, have the potential to produce anywhere from tens to hundreds of megawatts. When it comes to photovoltaic energy conversion, direct conversion of sunlight into electricity is the most common kind. Solar panels can be used only during the daytime due to their poor energy storage. Solar thermal energy storage is a straightforward and cost-effective method of storing heat for use in large-scale energy generation. Heat that has been saved during the day is used to generate electricity in dark. Storage-capable solar thermal systems have the potential to enhance the financial ability of solar energy, according to the International Energy Agency.

10.2  LITERATURE REVIEW When it comes to water in dry and semi-arid environments, groundwater is the most essential source. The availability of these resources continues to be a significant concern. To solve this problem, a new solar desalination technology is being researched; an active and passive single-slope solar still with thermal energy storage was numerically developed and modelled. A parabolic trough collector serves as an extra heat source in the active system, which also utilizes passive solar heating (PTC). Passive and active desalination systems were used to evaluate the performance of flat absorbers with corrugated absorbers (rectangle shaped, triangle shaped, and sphere shaped). COMSOL Multiphysics was used to solve the two models’ conservation equations [1]. Compared to a conventional distiller and an alternative distiller with a platform absorber and storage system, the corrugated surface with rectangular ripples increased pure water output by 109% and 42 %, respectively, according to the simulation results of the corrugated surface with rectangular ripples. In contrast, the passive still with rectangular and spherical absorbers produced just 2.37 and 2.11 kg/m2/day of freshwater, respectively, whereas the solar still with PTC and variable absorber shape generated a total of 2.37 and 2.11 kg/m2/day of freshwater. They developed a modified passive solar still with a rectangular absorber structure with the assistance of PTC for industrial applications such as health facilities and farm conservatories in dry and semi-arid regions. New solar designs developed by Mousa and Taylor in 2017 [2] sought to improve energy production per square foot. To enhance energy efficiency, various researchers offered innovative styles that improve the system output heating, can be conveniently mounted on factory roofs, and combine both technologies into a unique mixed system. The research explored several designs for a moderate industrial application

118

Multimedia Data Processing and Computing

using the simulated program TRNSYS. A PVT collector may obtain a solar contribution of 36.8% when there is no cooling fluid running through the integrated channel underneath the photovoltaic (PV) panel, while a typical PV panel can get a solar contribution of 38.1%. Many bandpass filter bandwidths were evaluated against PVT collection beam split. PVT beam-split collectors, like traditional solar thermal collectors, may capture up to 49% of sunlight. The experiment also compared solar thermal and photovoltaic collectors with varying rooftop percentages. Individual systems may achieve a solar fraction 6% lower than a beam-split collector. China’s concentrated solar power (CSP) sector is gaining momentum. In order to advance the energy revolution and reduce emissions, it must be further developed [3]. The study examined the CSP industry in China. The market, policy and regulatory changes, and new technology all influence the status quo. The difficulties and underlying cause of China’s CSP sector are thoroughly investigated by examining policy, market, and power generation technologies at three levels. Last but not least, practical solutions are provided to encourage the growth of the CSP sector and conversion of the energy infrastructure. Temperatures between 40°C and 260°C are required for most of the solar thermal collector systems. Low to medium high temperature solar collectors are described in detail, as are the characteristics of these collectors, including the relationship between efficiency and operating temperature. The research used a variety of thermal collectors, ranging from a fixed flat plate to a mobile parabolic trough [4]. Various solar technologies were mapped to different climatic zones and heat needs on the basis of a theoretical solar system analysis. Three criteria were employed in the study. Different places solar potential and STC efficiency may be compared. Viability is dependent on these characteristics. In order to improve the operating temperature range adaptability and reliability of solar heat storage systems, a novel solid-gas thermochemical multilayer sorption thermal battery is being developed [5]. Solid-gas thermochemical multilayer sorption systems can store solar thermal energy in the form of sorption potential at different temperature levels. A thermochemical multilayer sorption thermal battery’s operating principle and performance are studied. According to thermodynamic research, cascaded thermal energy storage technology can manage solar heat storage at low and high isolation. It has a better energy density and a broader range of solar collecting temperature than other heat storage technologies. In big industrial processes, it may be utilized for energy management and waste-heat recovery. Sun distillation may be able to solve water and energy shortages in the African region by using solar energy (thermal process). A cylindrical reflector collector transfers heat to a heat exchanger, which is composed of sand and is put into the solar still to increase the efficiency of water production [6]. It’s a useful and practical method. A rise in sand and saltwater temperatures is directly linked to an increase in solar collector flux. The solar collector is the major element of solar thermal systems, since it is capable of converting radiation from the Sun into thermal energy. The solar collector is also the most expensive. In addition to operating temperature (low, moderate, and high), solar collectors may be categorized according to the working fluid used in their operation (gas or liquid). The working fluid in direct absorption collectors

Data Processing Methodologies and a Serverless Approach

119

allows solar energy to be absorbed directly by the collector, as opposed to collectors that employ a surface absorber or that indirectly absorb solar radiation [7]. Solar common fluids are used in a direct absorption solar collector (DASC) to suspend metal, metal oxide, or carbon nanomaterial suspensions. In a study [8], a low-cost solar collector was used to pre-heat ventilation air in commercial broiler plants. Six 36m2 black fabric-based solar collectors were installed on the rooftops of a commercial broiler house. The solar collectors delivered fresh air during the ON cycle of reduced ventilation. The inability of the solar collectors to work at full capacity due to the current minimal ventilation design further reduces heating fuel savings. Recent research indicates that renewable energy is assuming an increasing role in power generation as a means of reducing fossil fuel CO2 emissions [9]. Concentrated solar power (CSP) may be used to gather solar energy from the Sun. The solar power tower (SPT) is one of the most successful CSP topologies for gathering direct sunbeams because it utilizes hundreds of heliostats to reflect sunlight onto a central collector. Control solutions to enhance tracking and SPT efficiency have been suggested in a slew of published studies. The work at the component level, on the other hand, has been scarce. This work goes into great detail on the various SPT drives. The research and development gaps for SPT drives have been broken down into more than a hundred individual research and development projects. The driving mechanisms have been categorized and thoroughly investigated, with attention given to both the power source and the mechanical transmission systems in the process. The multiple electrical motors and electrical and mechanical converters used in heliostat units are also subjected to in-depth examination, evaluation, and comparison. The advantages and disadvantages of various electrical driving technologies are evaluated and contrasted. The tracking of azimuth and elevation is selected, explained, and shown using a dual-axis two linear actuator system. When it comes to achieving net-zero energy consumption, efficient solar energy use is a critical topic in the field of energy sustainability [10]. The most widely used solar cell technologies are capable of operating only on a single layer of a two-dimensional surface, which is why they are so widely employed. A multilayer system has been created that can collect solar light in a cuboid using transparent photothermal thin films of iron oxide and a porphyrin compound to enhance the effectiveness of solar light harvesting. It is similar to the way multilayer capacitors operate in that they enable white light to penetrate and gather photon energy in three-dimensional space while also creating adequate heat on each layer with a greater overall surface area. A high-efficiency solar collector, energy converters, and a generator are used to create thermal energy via the usage of a multilayer photothermal system. This system provides a high density of thermal energy production due to the use of a multilayer photothermal system.

10.3  DATA PROCESSING METHODOLOGIES Throughout the past decade, a flurry of new tools has emerged due to time and technical improvements. Systems and processes benefit from the use of these instruments. Listed below is a more in-depth explanation of some of the areas and technologies that may be used to enhance performance. Figure 10.1 shows the popular new tools for system designing. It shows the relationship between

120

Multimedia Data Processing and Computing

FIGURE 10.1  Relation between AI, ML, DL.

AI, ML, and DL. AI is the superset that includes ML as a subset, whereas DL is the subset of ML.

10.3.1 Artificial Intelligence The topic of artificial intelligence (AI) is divided into numerous subfields that are concerned with the ability of computers to create logical behavior in response to external inputs. The ultimate objective of AI research is to create self-sufficient machines capable of doing jobs that previously required human intellect. The goods and services we use on a regular basis may be affected by artificial intelligence in a variety of ways. Expert system development is a primary focus, since expert systems are the kind of programs that may imitate human intelligence by actions like mimicking speech and learning from user input. The purpose of trying to simulate human intelligence in machines is to build computer programs with cognitive abilities that are on par with human beings [11]. The AI types shown in Figure 10.2 are diverse. They may be divided into four categories: reactive, restricted memory, theory of mind, and self-awareness. • Reactive machines: Reactive machines are those that take into consideration the present condition of the environment and operate accordingly. The machines are assigned particular tasks and are only capable of

FIGURE 10.2  AI and its types.

Data Processing Methodologies and a Serverless Approach

121

comprehending the job at hand at any one moment. The behaviors of the machines are predictable when faced with a comparable situation. • Limited memory: Machines with limited memory may make better decisions based on the information available at the time. The machines analyze observational data in the context of a pre-established conceptual framework that they have developed. The information acquired from the inspections is only maintained for a limited time before being permanently wiped from the system. • Theory of mind: Mental arithmetic in order for robots to participate in social relationships, they must be able to reason and make decisions based on emotional context. Even though the robots are still in development, some of them already display qualities that mirror those of humans. They can gain a basic comprehension of essential speech instructions using voice assistant software, but they can’t carry on a conversation. • Self-awareness: Self-aware robots exhibit traits such as ideation, desire formulation, and internal state awareness, among others. Developed by Alan Turing in 1950, the Turing Test is a method of identifying computers that may behave in a human-like manner. The area of artificial intelligence has risen in popularity and relevance in recent years, thanks to advancements achieved in machine learning over the past two decades. Artificial intelligence has been more popular during the 21st century. In this way, machine learning is possible to design systems that are self-improving and always improving.

10.3.2 Machine Learning Algorithm-based procedures, in which no margin for mistake exists, are those performed by computers. Computers may make decisions based on current sample data rather than written instructions that have a result that is dependent on the input data they receive. The same as people, computers are capable of making mistakes while making judgments in certain instances. As a consequence, machine learning is said to be the process of enabling computer(s) to learn in the same way that a human brain does via the use of data and previous experiences. The basic objective of ML is to develop prototypes that are capable of self-improvement, pattern recognition, and the identification of solutions to new problems based on the past data they have collected [12]. 10.3.2.1  Learning Methods Machine learning may be broken down into four categories as follows: • • • •

Supervised learning Unsupervised learning Semi-supervised learning Reinforced learning

122

Multimedia Data Processing and Computing

FIGURE 10.3  Categories of ML.

Figure 10.3 shows the different categories of ML. There are three layers: the top layer reveals the system’s inputs; the middle layer explains how the system learns; and the bottom layer shows what the system produces. 10.3.2.1.1  Supervised Learning For the supervised learning family of models, there must be a dataset that contains some observations and their labels/classes. Images of animals, for instance, might serve as observations, with the labels indicating the animal’s name as the label. These models initially learn from the labelled datasets before being able to predict future events. As a consequence of this procedure, the learning algorithm develops an inferred function, which is then applied to new, unlabeled data that are fed into the model to anticipate the outcome of the new observations. Upon completion of sufficient training, the model is capable of providing goals for any incoming data. By comparing its output to the correct intended output (the ground truth label), the learning algorithm may also discover errors and alter its output as needed to get the desired result (e.g., via backpropagation). Modeling techniques that have been supervised may be classified into two categories: regression modeling and classification modeling. • Regression models: Predicting or drawing conclusions about the other aspects of the data based on the information that is now accessible. • Classification models: The process of dividing data into categories based on the features of the dataset that was used to make the determination of the categories. 10.3.2.1.2  Unsupervised Learning It is important to have a dataset containing some observations in order to use this family of models, but it is not necessary to know the labels or classes of the observations in advance. For systems to infer a function from unlabeled data in

Data Processing Methodologies and a Serverless Approach

123

order to explain a hidden structure, it is important to label the data prior to processing the information. A data exploration approach and inferences from datasets may be used to describe hidden structures in unlabeled data, and the system may apply data exploration and inferences from datasets in order to characterize hidden structures in unlabeled data. A subcategory of unsupervised learning is clustering, which includes association models, which are both examples of unsupervised learning. • Clustering: This kind of challenge arises when one attempts to find the underlying categories in data, such as when classifying animals according to the number of legs on each leg. • Association: People who purchase X also buy Y, and so on. This is known as association rule learning. PCA, K-means, DBSCAN, mixture models, and other models that fall into this category are examples of what this family has to offer. 10.3.2.1.3  Semi-supervised Machine Learning This topic is conveniently located in the heart of the supervised and unsupervised learning groups. Semi-supervised models are trained using both labeled and unlabeled data, which is utilized in the training process. When the quantity of labeled data is less than the quantity of unlabeled data, both supervised and unsupervised learning are insufficient for effective learning. In many instances, data that have not been adequately categorized are utilized to infer conclusions about the individuals in question. This strategy is referred to as “semi-supervised learning” in generic terms. Semi-supervised learning differs from traditional supervised learning in that it uses a labeled dataset rather than a random dataset. As is the case with supervised learning, the labeled data contain more information than the data that must be predicted. Compared to projected datasets, semisupervised learning makes use of smaller labeled datasets than the projected datasets. 10.3.2.1.4  Reinforced Learning In order to reward or penalize the users, these algorithms employ estimated errors as incentives or punishments. Punishment will be harsh, and the prize will be little, if the error is serious enough. As long as the error is little, punishment will be light and the prize will be large. • Trial-and-error search and delayed reward: The hunt for mistakes and the delay in rewarding the learner are two of the most crucial consequences of reinforcement learning. This group of prototypes can automatically calculate the appropriate behavior to achieve the desired performance when used in combination with a certain environment. • The reinforcement signal: To learn which behavior is the most effective, the model requires positive reinforcement input, often known as “the reinforcement signal.” Models like the Q-learning model are examples of this kind of learning model.

124

Multimedia Data Processing and Computing

10.3.2.2  Frequently Used Algorithms for Machine Learning 10.3.2.2.1  Linear Regression Estimating real values (such as property prices, phone calls, and sales) from a set of continuous variables is one of its many applications. This stage entails fitting the best line feasible to determine the relationship between the independent and dependent variables. The best-fit line, known as the regression line, is represented by the linear equation.

Y = a*X+ b

In this equation, Y is a dependent variable, X is an independent variable, a represents the slope, and b represents the intercept of the line. In order to compute a and b, the sum of squared distances between data points and the regression line must be kept as low as possible. It is shown in the graph below where X and Y can be any two parameters and random values were chosen to represent the training data (Figure 10.4). 10.3.2.2.2  Logistic Regression It is not a regression approach but rather a categorization algorithm that has been developed. A discrete value such as 0 or 1 is estimated using a set of independent factors that have been submitted to the user(s). Instead of predicting the probability of an event happening, it predicts the likelihood of occurrence by fitting data to a logit function. As a consequence, it is referred to as logit regression in many instances. Its

FIGURE 10.4  Linear regression graph.

Data Processing Methodologies and a Serverless Approach

125

FIGURE 10.5  Decision tree.

output values are in the range of 0 and 1, which is acceptable given that it anticipates the possibility of something happening (as expected). 10.3.2.2.3  Decision Tree Problems involving classification are the most often encountered application for this specific kind of supervised learning technology. Despite its simplicity, the strategy is effective with both categorical data-dependent variables, which is unexpected considering how simple it is. We divide the population into two or more homogenous groups, depending on the features of each group, using the approach we just described. When constructing these categories, a variety of criteria are taken into account. Figure 10.5 shows the graphical representation of a decision tree. 10.3.2.2.4  Support Vector Machine A support vector machine (SVM) is a method of categorizing things. As a result of using this method, it is possible to represent each data item as one of n-dimensional points in an n-dimensional space, where each feature’s value corresponds to one of the points’ positions. If a data item includes more than one feature, one may use this technique to represent each data item as an n-dimensional position in an n-dimensional space, which is useful when there are several features. The following is one method of doing this: plotting a person’s height and hair length in a two-dimensional space, where each point is represented by a pair of two coordinates (called support vectors). Figure 10.6 shows the support vector graph for the case study of height and hair length. 10.3.2.2.5  k-Nearest Neighbor k-nearest neighbor (kNN) may be used to tackle problems requiring classification and regression, among other things. In the industry, it is more often used in classification problems that are more sophisticated in their nature. When it comes to algorithms, k-nearest neighbors is a basic one that stores all of the instances that

126

Multimedia Data Processing and Computing

FIGURE 10.6  Support vector graph.

are currently available and categorizes new cases based on the majority vote of their k neighbors. In classification, the case given to a class is the one that occurs most often among its k-nearest neighbors, as determined by a distance function, and is therefore the most common instance in the class. The Figure 10.7 shown below is the example how to generate the kNN list. Yellow dots represent class A and purple shows B. Then within the first circle k=3 which changes to k=6 in the second trial. 10.3.2.2.6 k-Means k-means is a kind of unsupervised strategy that is used to deal with the clustering issue in general. Its approach classifies a given data set into a specific number of clusters (say, k clusters) in a straightforward and easy way (considering k clusters). Data points within a cluster are homogenous and diversified in comparison to data points among peer groups.

FIGURE 10.7  k-nearest neighbors.

Data Processing Methodologies and a Serverless Approach

127

FIGURE 10.8  k-means clustering.

Steps to construct K-means cluster a. k-means finds k points for each cluster, which are referred to as centroids, and allocates them to each cluster using a random number generator. b. k clusters are formed by the nearest centroids for each data point. c. Determines the centroid of each cluster based on the number of individuals in the cluster. Steps b and c must be repeated when new centroids are discovered. Each data point should be linked with a new k-cluster based on its proximity to new centroids. In order to achieve convergence, keep repeating this procedure until the centroids do not change. For calculation, k-means makes use of clusters, and each cluster has a unique centrifugal point. The sum of square values for each cluster is equal to the sum of squared differences between the cluster’s centroid and all of its data points multiplied by count of data points in the cluster. Adding up the square values of all of the clusters results in a total that is within the total square value of the solution. There’s no denying that the sum of squared distance decreases as the number of clusters increases, but if we look at a graph of the findings, we’ll see that the decrease is quick until a certain value of k is reached, and then much slower after that. It is possible to find the optimum number of clusters in this section. Figure 10.8 is the diagrammatic representation of the K-means. It shows three clusters at after the clustering process.

10.3.3  Deep Learning Deep learning is the only kind of machine learning that focuses on training the computer to replicate human behavior. When a computer system learns to perform classification tasks directly on intricate input in the form of images, text, or sound, this is referred to as deep learning. This kind of algorithm can reach accuracy that is comparable to that of the state-of-the-art (SOTA), and in certain situations, it can surpass humans on overall performance. In order to train a system, a large quantity of labeled data is gathered, as well as neural network topologies

128

Multimedia Data Processing and Computing

with several layers that are used in conjunction with it. Some important features of DL are listed below. • Deep learning is a key technique in the development of technologies such as virtual assistants, face recognition, autonomous automobiles, and other similar technologies. • It is an approach that entails first training data and then benefiting from the outcomes of the training. • The name “deep learning” is used to characterize the learning approach because, with each passing minute, the neural networks rapidly learn about additional layers of data that have been added to the dataset. Every time data is trained, the focus is on enhancing the overall performance of the system. • As the depth of the data has expanded, the training efficiency and deep learning powers of this system have improved considerably as a result of its widespread acceptance among data professionals. 10.3.3.1  Deep Learning in Action To train the outputs on the basis of the inputs that are supplied, deep learning algorithms make use of both supervised and unsupervised machine learning methods. Neurons that are linked are shown by circles. The neurons are divided into three separate hierarchies of layers, which are referred to as the input layers, hidden layers, and output layers. • It is the first neuron layer, also termed the input layer, that receives the input data and transmits it to the earliest hidden layer of neurons, which is the first hidden layer of neurons. • The computations are performed on the incoming data by the hidden layers, which are not visible. Determining the number of neurons and the number of hidden layers to utilize in the building of neural networks is the most challenging aspect in the field. • Finally, the output layer is responsible for generating the output that is required. A weight is a value that indicates the relevance of the values that are supplied to a connection between neurons, and it is present in every connection between neurons. An activation function is used in order to ensure that the outputs are consistent. Table 10.1 gives the comparison of deep learning and machine learning in detail. Two essential measures are being examined for the purpose of training the network. To begin, a huge data collection must be generated; to continue, substantial computing capacity must be available. It is the quantity of hidden layers that the model is using to train on the data set that is indicated by the prefix “deep,” which is used in deep learning. A brief summary of how deep learning works may be summarized in four concluding points as shown in Figure 10.9: 1. An artificial neural network (ANN) asks a series of binary true/false questions in succession.

Data Processing Methodologies and a Serverless Approach

129

TABLE 10.1 Comparison between DL and ML Parameter Data

Deep Learning Requires large dataset

Hardware need Specificities in engineering Training period Processing time Count of algorithms Data interpretation

Requires machine with GPU Understands basic data functionality Long Few hours to weeks Few Difficult

Machine Learning Works well with small to medium dataset Can perform in low-end machines Understands features and representation of data Short Few seconds to hours Many Varies from easy to impossible

2. Obtaining numeric numbers from blocks of data is known as data extraction. 3. Sorting the information based on the responses obtained. 4. The last element to mention is the marking/labeling of the data. Some of the popular deep learning algorithms are convolution neural networks (CNNs), long short-term memory networks (LSTMs), recurrent neural networks (RNNs), generative adversarial networks (GANs), radial basis function networks (RBFNs), multilayer perceptrons (MLPs), self-organizing maps (SOMs), deep belief networks (DBNs), restricted Boltzmann machines (RBMs), and autoencoders. Table 10.1 shows the comparison between deep learning and machine learning.

10.4  SERVERLESS SOLAR DATA ANALYTICS Raw data may be generated by a wide range of smart devices and sensors, which can then be examined in greater depth by analytics software. The management and optimization of the internet of things systems need smart analytics solutions [13]. An IoT system that has been properly created and deployed allows engineers to find anomalies in the data collected by the system and take immediate action to prevent

FIGURE 10.9  Neurons and layers.

130

Multimedia Data Processing and Computing

FIGURE 10.10  Architecture for serverless analytics of solar energy intensity.

a bad outcome that could otherwise have happened. Service providers may plan for the next step if data is collected in the correct way and at the right time. Massive amounts of data from internet of things devices are being used by large organizations to get new insights and open doors to new business possibilities. Using market research and analytical methodologies, it’s possible to anticipate market trends and plan ahead for a successful rollout. As a major component in any business model, predictive analysis has the potential to greatly enhance a company’s ability to succeed in its most critical areas of business. Serverless data analytics will be completed using Amazon Web Service (AWS), a well-known cloud service provider [14]. Figure 10.10 depicts the infrastructure utilized to collect the analytics. An internet of things (IoT) device or sensor may broadcast a real-time measurement of sunlight’s intensity to the cloud. This architecture, which is serverless in nature, will take care of the remainder of the analytics.

10.4.1  Kinesis Data Streams Amazon Kinesis Data Streams is an Amazon-controlled streaming data service. Kinesis streams may accept data from thousands of sources simultaneously, including click streams and application logs [15]. Within a few seconds of the stream being formed, Kinesis applications will have access to the data and may begin processing it. There is no need for us to worry about anything other than deciding how fast we want to stream the data. Amazon Kinesis Data Streams takes care of everything else. Data streams are provisioned and deployed on the user’s behalf; ongoing maintenance and other services are also provided. For high availability and long-term data durability, Amazon Kinesis Data Streams provides data replication across three AWS regions.

Data Processing Methodologies and a Serverless Approach

131

10.4.2  Kinesis Data Firehose Amazon Kinesis Data Firehose is the most efficient means of transmitting streaming data into data repositories and analytics tools, according to the business. As a streaming data storage service, Amazon S3 may be used to gather, convert, and store transmission of information, which can then be processed in near-real time using the analytics tools and dashboards. This is a completely managed service that adapts automatically to the flow of data and truly doesn’t require more administration on the user’s side [15]. It may also batch, compress, and encrypt the data before loading it, which minimizes the quantity of storage space required at the destination while boosting security at the same time. Customers may capture and load their data into Amazon S3, Amazon Redshift, or Amazon Open Search Service utilizing Amazon Kinesis Data Firehose without having to worry about either of the underlying infrastructure, storage, networking, or configuration needs. We don’t have to bother about putting up hardware or software, or about establishing any new programs in order to keep abreast of this procedure as it is automated. Firehose is also scalable, and it does so without the requirement for any developer involvement or costs to be engaged. This results in the synchronous replication of data across three Amazon Web Services regions, providing excellent availability and durability of the data as it travels to its ultimate destinations.

10.4.3 Amazon S3 S3 is an abbreviation that stands for “simple storage.” Amazon S3 was built as an object storage system so that any quantity of data could be stored and retrieved from any location. It is one of the most cost-effective, simple, and scalable solutions on the market today. A simple web service interface allows users to save and access unlimited amounts of data at any time and from any place. Cloud-native storage-aware apps may be developed more quickly and easily with the use of this service [16]. The scalability of Amazon S3 means that one may start small and grow the application as needed, all while maintaining uncompromised performance and reliability. Additionally, Amazon S3 is meant to be very flexible. Using a basic FTP program or a complex online platform like Amazon.com, anybody can back up a little quantity of data or a large amount for disaster recovery. By relieving developers of the burden of managing data storage, Amazon S3 allows them to focus on creating new products. Users pay only for the storage space they actually utilize with Amazon S3. There is no set price. The AWS pricing calculator will help the user figure out how much they’ll spend each month. The fee may be lower fee if administrative costs are lower. Depending on the Amazon S3 area, the user may find varying prices for the same things. S3 bucket placement affects the monthly subscription fee. Requests to transport data across Amazon S3 regions are not subject to data transfer fees. Amazon S3 charges a fee for each gigabyte of data transported over the service, with the amount varying depending on which AWS region the data is being sent to. There is no data transfer cost when moving data from one AWS service to another in the same region, such as the US East (Northern Virginia) region.

132

Multimedia Data Processing and Computing

10.4.4 Amazon Athena Amazon Athena is a data processing tool that facilitates the process of evaluating data stored in Amazon S3 using conventional SQL queries. It is accessible for Windows, Mac, and Linux operating systems users. It is not essential to set up or maintain any infrastructure, as Athena is a serverless platform, and users may start analyzing data immediately after installing it. There is no need to enter the data into Athena as it works directly with data stored on Amazon S3. The user begins by logging into the Athena Management Console and defining the schema they will be utilizing. After that, they query the data contained in the database by using the Athena API [16]. It is possible to utilize Athena as a database on the Presto platform, and it contains all of the typical SQL functionality. Other important data formats supported by the program include Oracle Relational Catalog, Apache Parquet, and Avro. CSV and JSON are merely two of the formats supported by the program. Sophisticated analysis is achievable, despite the fact that Amazon Athena is designed for rapid, ad-hoc querying and that it interfaces well with Amazon QuickSight for easy visualization. This contains massive joins as well as window functions and arrays, among other things.

10.4.5  Quicksight It is straightforward for everyone in the business to acquire a better understanding of the data owing to the interactive dashboards, machine learning–powered patterns, and outlier detection in Amazon QuickSight, which is a cloud-based business intelligence (BI) platform for big enterprises. Using Amazon QuickSight, users can effortlessly share insights with their team, no matter where they are situated in the globe. With Amazon QuickSight, users may access and combine the cloud-based data in a single spot. All of this information is available in a single QuickSight dashboard, which incorporates data from AWS, spreadsheets, SaaS, and business-to-business (B2B) transactions. Amazon QuickSight delivers enterprise-grade security, global availability, and built-in redundancy. It is accessible on a subscription basis. Users may also expand from 10 to 10,000 members utilizing the user management tools offered by this system, all without the need to invest in or maintain any infrastructure.

10.4.6 F. Lambda This service makes it possible for the user to view their code without having to set up or manage servers on their own PC. During non-execution periods, the user is charged only for the amount of compute time that was used by the code. No additional administration is required when using Lambda to run existing applications and backend services. To take advantage of Lambda’s high availability, all users need to do is submit the code once and it will take care of the rest. An AWS service, a web page, or a mobile app may activate the code on demand, or the user can call it directly.

Data Processing Methodologies and a Serverless Approach

133

10.4.7 Amazon Simple Queue Service (SQS) In addition to serverless applications and distributed systems, Amazon Simple Queue Service (SQS) provides a fully managed message queuing service. Allowing developers to focus on building new and creative features, SQS frees them from managing and administering message-oriented middleware. When a user utilizes SQS, they don’t have to be worried about messages being lost or relying on other services to be available. Message queues in SQS come in two flavors. Due to the best-effort ordering and at least once delivery provided by standard queues, standard queues allow the largest possible amount of work to be completed. To guarantee that messages are handled in the order they were received, SQS FIFO technology is used in the building of queues.

10.5 CONCLUSION Trending technology may be utilized to develop new smart systems. These systems must have new and better capabilities by making use of artificial intelligence, machine learning, and deep learning. IoT may be applied on top of these strategies to further make the system even more interactive and real-time functional. This chapter explored these current tools in particular. Ideas may be adopted from these conversations and can be employed for the construction of any smarter system dreamed of. An architecture proposal for the solution’s incorporation into AWS is anticipated in this chapter. Solar thermal conversion technologies for industrial process heating will be a tremendous success with the aid of the approaches mentioned in this chapter.

REFERENCES

1. H. Hafs et al., “Numerical simulation of the performance of passive and active solar still with corrugated absorber surface as heat storage medium for sustainable solar desalination technology,” Groundwater for Sustainable Development, vol. 14, p. 100610, Aug. 2021. doi: 10.1016/j.gsd.2021.100610. 2. O. B. Mousa, and R. Taylor, “Photovoltaic Thermal Technologies for Medium Temperature Industrial Application A Global TRNSYS Performance Comparison,” 2017 International Renewable and Sustainable Energy Conference (IRSEC), Dec. 2017. doi: 10.1109/irsec.2017.8477319. 3. J. Zou, “Review of concentrating solar thermal power industry in China: Status quo, problems, trend and countermeasures,” IOP Conference Series: Earth and Environmental Science, vol. 108, p. 052119, 2018. doi: 10.1088/1755-1315/108/5/ 052119. 4. M. Ghazouani, M. Bouya, and M. Benaissa, “A New Methodology to Select the Thermal Solar Collectors by Localizations and Applications,” 2015 3rd International Renewable and Sustainable Energy Conference (IRSEC), Dec. 2015. doi: 10.1109/ irsec.2015.7455058. 5. T. X. Li, S. Wu, T. Yan, J. X. Xu, and R. Z. Wang, “A novel solid–gas thermochemical multilevel sorption thermal battery for cascaded solar thermal energy storage,” Applied Energy, vol. 161, pp. 1–10, 2016. doi: 10.1016/j.apenergy.2015.09.084.

134

Multimedia Data Processing and Computing

6. R. Bellatreche, M. Ouali, M. Balistrou, and D. Tassalit, “Thermal efficiency improvement of a solar desalination process by parabolic trough collector,” Water Supply, vol. 21, no. 7, pp. 3698–3709, 2021. doi: 10.2166/ws.2021.131. 7. M. Karami, M. Bozorgi, and S. Delfani, “Effect of design and operating parameters on thermal performance of low-temperature direct absorption solar collectors: A review,” Journal of Thermal Analysis and Calorimetry, vol. 146, no. 3, pp. 993–1013, 2020. doi: 10.1007/s10973-020-10043-z. 8. Y. Liang, M. Janorschke, and C. E. Hayes, “Low-Cost Solar Collectors to Pre-heat Ventilation Air in Broiler Buildings,” 2021 ASABE Annual International Virtual Meeting, 2021. doi: 10.13031/aim.202100619. 9. W. M. Hamanah, A. Salem, M. A. Abido, A. M. Qwbaiban, and T. G. Habetler, “Solar power tower drives: A comprehensive survey,” IEEE Access, pp. 1–1, 2021. doi: 10.1109/ access.2021.3066799. 10. M. Lyu, J. Lin, J. Krupczak, and D. Shi, “Solar harvesting through multilayer spectral selective iron oxide and porphyrin transparent thin films for photothermal energy generation,” Advanced Sustainable Systems, vol. 5, no. 6, p. 2100006, 2021. doi: 10.1002/ adsu.202100006. 11. S. Akshay, and T. K. Ramesh, “Efficient Machine Learning Algorithm for Smart Irrigation,” 2020 International Conference on Communication and Signal Processing (ICCSP), Jul. 2020, Published, doi: 10.1109/iccsp48568.2020.9182215. 12. “Expand Patient Care with AWS Cloud for Remote Medical Monitoring | 10,” Taylor & Francis. https://www.taylorfrancis.com/chapters/edit/10.1201/9781003217091-10/ expand-patient-care-aws-cloud-remote-medical-monitoring-parul-dubey-pushkardubey 13. “Big Data, Cloud Computing and IoT,” Google Books. https://books.google.com/books/ about/Big_Data_Cloud_Computing_and_IoT.html?id=gDaxEAAAQBAJ 14. “Free Cloud Computing Services - AWS Free Tier,” Amazon Web Services, Inc. https:// aws.amazon.com/free/ 15. “Amazon Kinesis Data Streams FAQs | Amazon Web Services,” Amazon Web Services, Inc. https://www.amazonaws.cn/en/kinesis/data-streams/faqs/ 16. “Amazon Simple Storage Service (S3) — Cloud Storage — AWS,” Amazon Web Services, Inc. https://aws.amazon.com/s3/faqs/

11

A Discussion with Illustrations on World Changing ChatGPT – An Open AI Tool Parul Dubey, Shilpa Ghode, Pallavi Sambhare, and Rupali Vairagade

11.1 INTRODUCTION The development of chatting machines, also known as chatbots, has emerged as one of the most astonishing applications of artificial intelligence technology in recent years. The use of chatbots, which are conversational pieces of software, can help the experience of communicating with computers feel natural and less mechanical [1]. The advancement of machine learning and natural language processing techniques has led to improvements in the precision and intelligence of chatbots. Within this sphere, the ChatGPT, developed by OpenAI, stands out as a oneof-a-kind automaton. This is accomplished through the utilization of a deep neural network architecture known as generative pre-trained transformer (GPT), which is taught on extensive amounts of text data in order to understand the nuances of human language. Because of its flexibility, ChatGPT can be helpful in a variety of disciplines, including customer support and medicine, for example. Because ChatGPT is openly accessible as open source, programmers have the option of utilizing it in its original form or modifying it to better suit their needs. Thus, ChatGPT has the potential to completely transform our lives by making our relationships with technology more streamlined and straightforward.

11.2  LITERATURE REVIEW Thinking about the potential roles that ChatGPT could play in shaping the future of academics and librarianship [2] is both nerve-wracking and thrilling. However, in the race to create new scholarly knowledge and educate future professionals, it is crucial to consider how to use this technology in a responsible and ethical manner. Additionally, it is important to investigate how we, as professionals, can collaborate with this technology to improve our work, rather than abusing it or allowing it to abuse us. This is because there is a race going on to educate future professionals. Another research report [3] discussed a ChatGPT linguistic model developed by OpenAI applications in the health sector. The ability of ChatGPT to generate writing that sounds natural from enormous quantities of data has the potential to assist both DOI: 10.1201/9781003391272-11

135

136

Multimedia Data Processing and Computing

individuals and groups in making decisions that are more beneficial to their health. There are a few limitations and challenges that come along with using ChatGPT, but it does have the potential to make a positive impact on public health. This analysis focused on the potential implementations of ChatGPT in the field of public health, as well as the advantages and downsides associated with using this technology. Irish limericks were written with the help of Chat GPT in another study [4]. A trend emerged during production that seemed to produce upbeat limericks about liberal leaders and downbeat ones about conservative ones. Following the discovery of this trend, the number of participants in the study was increased to 80, and statistical computations were performed to see if the observed data diverged from what would have been expected based on chance theory. It was discovered that the AI had a liberal prejudice, favoring liberal leaders and disfavoring conservatism. To effectively address the multifaceted challenge posed by climate change, interdisciplinary approaches from a wide range of disciplines, including atmospheric science, geology, and ecology, are essential. Because of the complexity and breadth of the problem, gaining an understanding of, analyzing, and forecasting future climate conditions requires the use of cutting-edge tools and methodologies. ChatGPT is an example of a technology that combines artificial intelligence and natural language processing, and it has the potential to play a significant role in improving both our understanding of climate change and the accuracy of our ability to forecast future climate conditions. In the field of climate research, ChatGPT’s many applications include, among other things, the construction of models, the analysis and interpretation of data, the development of scenarios, and the evaluation of models. Academics and policymakers now have access to a powerful tool that will allow them to generate and evaluate a variety of climate scenarios based on a wide range of data sources, as well as improve the accuracy of climate projections. The author freely confesses to having posed a question on ChatGPT concerning the possibility of applying its findings to the investigation of climate change. The article [5] provides a listing of a number of potential applications, both for the present and the future. The author performed an analysis on the GPT conversation responses and made some adjustments to them.

11.3  AI AND CHATGPT The term “AI-powered chat assistant” refers to a tool that is built on machine learning and is capable of replicating human-like interactions. ChatGPT falls into this category. The ability of computers to perform tasks that would typically require the cognition of a human being, such as understanding common language and generating appropriate responses, is referred to as AI, which stands for “artificial intelligence.” Using advanced techniques of machine learning such as deep neural networks, ChatGPT is able to comprehend natural language questions and generate responses that are human-like in nature. Deep neural networks are a specific type of machine learning algorithm that are designed to simulate the functioning of the human brain. This is accomplished by creating multiple connected levels of artificial neurons that are able to process data in a hierarchical fashion. The use of deep neural networks

A Discussion with Illustrations on World Changing ChatGPT

137

has enabled ChatGPT to learn from a large corpus of text data and generate responses that are contextually appropriate and linguistically sophisticated [6]. Therefore, ChatGPT represents an impressive application of AI technology. By using machine learning algorithms to process natural language inputs, ChatGPT is capable of mimicking human-like conversations and generating responses that are relevant and coherent. This makes ChatGPT a potent tool for a wide range of applications. Overall, the relationship between ChatGPT and AI highlights the transformative potential of machine learning-based applications in our lives. A few important discussions relating AI and ChatGPT are listed below:

11.3.1 Code, Chat and Career: Pros and Cons of Using AI Language Models for Coding Industry Using AI language models such as GPT for coding industry has several potential advantages and disadvantages. Here are a few pros and cons to consider: Pros 1. Increased productivity: AI language models can assist developers in writing code faster and more accurately, leading to increased productivity. 2. Improved code quality: AI language models can help identify potential errors in the code, leading to improved code quality and reduced debugging time. 3. Enhanced collaboration: AI language models can help facilitate collaboration between developers, making it easier for team members to understand each other’s code and work together more efficiently. 4. Accessible to all levels: AI language models can be used by developers of all levels, from beginners to experts, making it easier for new developers to learn and improve their skills. Cons 1. Dependence on AI: Overreliance on AI language models could lead to developers relying too heavily on automated suggestions, leading to a decline in their own coding skills. 2. Limited context awareness: AI language models may not always be able to take into account the full context of a coding problem or project, potentially leading to inaccurate or incomplete solutions. 3. Bias and errors: AI language models can sometimes produce biased or incorrect output due to the data they were trained on or limitations in the algorithms used. 4. Privacy and security risks: Storing code in AI language models raises concerns about the security and privacy of sensitive information. These are just a few examples of the pros and cons of using AI language models such as GPT for the coding industry. As with any technology, it’s important to carefully consider the potential benefits and drawbacks before deciding to use it in a particular context.

138

Multimedia Data Processing and Computing

11.3.2  Jobs of Future: Will AI Displace or Augment Human Workers? A lot of people are curious about the impact that artificial intelligence will have on the world of employment in the future. While some people are concerned that artificial intelligence will make humans obsolete in employment, others see it as a means to complement human capabilities and open up new areas of study. The level of industrialization, the complexity of the work, and the number of competent workers who are available to work alongside AI systems are just a few of the many factors that will play a role in determining the impact that AI will have on employment. Inputting data and working as workers on production lines are two instances of the kinds of jobs that could be automated away by artificial intelligence. However, in other cases, AI may augment human workers by assisting them in complex decisionmaking or providing real-time insights. It’s also important to note that the development of AI technologies may create entirely new jobs and industries, such as AI programming and development, data analysis and interpretation, and AI ethics and regulation. Overall, it’s difficult to predict exactly how AI will impact the job market in the future. However, it’s clear that AI has the potential to both displace and augment human workers, and it will be important for individuals and organizations to adapt to these changes and embrace new opportunities as they arise.

11.4  IMPACT OF CHATGPT ChatGPT has the potential to make a significant impact in various fields due to its ability to simulate human-like conversations and generate relevant responses. It can be explained with Figure 11.1. Here are some of the ways ChatGPT can make an impact: 1. Customer service: It is possible to use it to provide customer service assistance 24 hours a day, 7 days a week without the participation of a human being, which can reduce the need for human customer service

FIGURE 11.1  Impact of ChatGPT.

A Discussion with Illustrations on World Changing ChatGPT

139

representatives. This has the potential to improve consumer satisfaction while also saving time and money for enterprises. 2. Healthcare: It can also be used as a virtual assistant for patients, providing personalized health advice and answering questions about medical conditions. This can improve access to healthcare services, especially in areas where healthcare providers are limited. 3. Education: ChatGPT can be used as a virtual tutor, helping students with homework assignments and providing personalized feedback. This can improve the quality of education and make it more accessible to students in remote areas. 4. Language translation: It is possible to use ChatGPT to transcribe writing from one specific language to another, thereby removing linguistic obstacles and enhancing collaboration between individuals who speak a variety of languages. 5. Mental health: ChatGPT can be used to help individuals who are experiencing mental health issues by mimicking a discussion with a therapist. This can make mental health facilities more available to people who are reluctant to use them, and it can help lessen the shame connected with doing so. Overall, ChatGPT has the potential to transform the way we interact with machines and make our lives more efficient and accessible. By providing personalized and contextually appropriate responses, ChatGPT can improve customer satisfaction, access to healthcare services, education, language translation, and mental health support. Further ChatGPT has different impacts on different age groups and users. A few are discussed below.

11.4.1 Impact that ChatGPT Creates on Students ChatGPT can have a positive impact on students by providing access to a vast amount of information and resources in a conversational format. Students can use ChatGPT to ask questions and receive instant answers on a wide range of topics, including science, history, literature, and more. ChatGPT can also help students to improve their critical thinking and problem-solving skills by encouraging them to ask thoughtful questions and evaluate the accuracy and relevance of the responses. Furthermore, ChatGPT can be used as a tool to supplement classroom learning and enhance self-directed learning. Students can use ChatGPT to access information and resources outside of the classroom, and to get instant feedback on their understanding of the material. However, it’s important to note that ChatGPT is not a substitute for traditional classroom learning and human interaction. While ChatGPT can provide instant answers to factual questions, it cannot provide the same level of guidance and support that human teachers and mentors can offer. Additionally, students need to be aware of the potential limitations and biases of AI models and should use critical thinking and verify information from multiple sources before making decisions or taking action based on its responses.

140

Multimedia Data Processing and Computing

11.4.2 Impact that ChatGPT Creates on Teachers/Academicians ChatGPT can have a positive impact on teachers and academicians by providing access to a vast amount of information and resources that can be used to enhance teaching and research activities. Teachers and academicians can use ChatGPT to ask questions and receive instant answers on a wide range of topics, including science, technology, history, culture, and more. ChatGPT can also help teachers and academicians to stay updated on the latest developments and trends in their respective fields. Furthermore, ChatGPT can be used as a tool to supplement classroom teaching and enhance self-directed learning. Teachers can use ChatGPT to provide students with instant feedback on their understanding of the material, and to provide additional resources and support outside of the classroom. In addition, ChatGPT can be used by researchers to access and analyze large amounts of data in a timely and efficient manner. ChatGPT can also be used to generate new research ideas and hypotheses, and to support the development of research proposals and grant applications. However, it’s important to note that ChatGPT is not a substitute for traditional teaching and research methods and should be used in conjunction with them. While ChatGPT can provide instant answers to factual questions, it cannot provide the same level of guidance and support that human teachers and mentors can offer. Additionally, researchers need to be aware of the potential limitations and biases of AI models and should use critical thinking and verify information from multiple sources before making decisions or taking action based on its responses. Overall, ChatGPT can be a useful tool for teachers and academicians, but it should be used in conjunction with traditional teaching and research methods and with guidance from human teachers and mentors.

11.4.3 Impact that ChatGPT Creates on Parents ChatGPT can have a positive impact on parents by providing access to a vast amount of information and resources that can help them support their children’s learning and development. Parents can use ChatGPT to ask questions and receive instant answers on a wide range of topics, including education, child development, health, and more. ChatGPT can also help parents to stay updated on the latest parenting trends and research. Furthermore, ChatGPT can be used as a tool to supplement traditional parenting methods and enhance self-directed learning. Parents can use ChatGPT to provide their children with additional resources and support outside of the classroom, and to help them with homework and assignments. In addition, ChatGPT can help parents to engage with their children in meaningful conversations and encourage them to ask thoughtful questions. This can help to foster their children’s curiosity and critical thinking skills, and support their overall learning and development. However, it’s important to note that ChatGPT is not a substitute for traditional parenting methods and should be used in conjunction with them. While ChatGPT can provide instant answers to factual questions, it cannot replace the guidance and

A Discussion with Illustrations on World Changing ChatGPT

141

support that human parents can offer. Additionally, parents need to be aware of the potential limitations and biases of AI models and should use critical thinking and verify information from multiple sources before making decisions or taking action based on its responses. Overall, ChatGPT can be a useful tool for parents, but it should be used in conjunction with traditional parenting methods and with guidance from human parents and experts.

11.5  APPLICATIONS OF CHATGPT ChatGPT has a wide range of applications across various industries and fields. Here are some of the common applications of ChatGPT: a. Customer service: ChatGPT can be a valuable tool for customer service, as it can provide an automated way to interact with customers and respond to their queries. Some ways ChatGPT can be used for customer service include the following: • Answering FAQs: ChatGPT can provide quick and accurate responses to frequently asked questions, such as product information, pricing, and delivery times. This can help to reduce the workload of customer service agents and improve the efficiency of customer support. • 24/7 Availability: ChatGPT can provide 24/7 availability to customers, enabling them to get assistance at any time. This can improve customer satisfaction and loyalty, as customers appreciate the convenience of being able to get help whenever they need it. • Personalized Responses: ChatGPT can use data about a customer’s previous interactions and purchase history to provide personalized responses and recommendations. This can help to build a stronger relationship with customers and increase sales. • Multilingual Support: ChatGPT can provide multilingual support to customers, enabling them to get help in their preferred language. This can improve the customer experience for non-native speakers and help to expand a company’s customer base. b. Education: ChatGPT can play several roles in education, including the following: • Personalized Learning: ChatGPT can help to personalize the learning experience for students. By analyzing students’ learning patterns and preferences, it can provide customized learning material and recommend learning resources to help students achieve their goals. • Student Support: ChatGPT can act as a virtual teaching assistant and provide support to students by answering their questions, providing feedback, and offering guidance on assignments and projects. This can help to alleviate the workload of teachers and enable them to focus on more complex tasks. • Language Learning: ChatGPT can assist students in learning a new language by providing them with conversation practice and feedback.

142

Multimedia Data Processing and Computing

It can also help students improve their writing skills by providing suggestions and corrections. • Accessibility: ChatGPT can help to make education more accessible to students with disabilities or learning difficulties. By providing a more interactive and flexible learning experience, it can enable students to learn at their own pace and in their preferred format. c. Healthcare: ChatGPT can be used in healthcare to provide personalized medical advice to patients in several ways. Here are some examples: • Symptom checking: ChatGPT can be used to help patients check their symptoms and provide an initial diagnosis based on their symptoms. Patients can input their symptoms into the chatbot, and ChatGPT can use its knowledge base to provide an accurate diagnosis and suggest next steps. • Medication information: ChatGPT can be used to help patients understand their medications, including the dosage, side effects, and interactions with other medications. • Appointment scheduling: ChatGPT can be used to help patients schedule appointments with their healthcare providers, including primary care physicians, specialists, and other healthcare professionals. • Health monitoring: ChatGPT can be used to monitor patients’ health status, including tracking their vital signs, medication adherence, and other health metrics. • Chronic disease management: ChatGPT can be used to help patients manage chronic diseases, such as diabetes, asthma, and heart disease, by providing information about the disease, helping patients monitor their symptoms, and suggesting lifestyle changes. d. E-commerce: ChatGPT can be used in e-commerce to provide personalized product recommendations to customers based on their preferences and behaviors. Here are some ways ChatGPT can be used for product recommendations: • Chatbot-based product recommendations: E-commerce companies can integrate chatbots powered by ChatGPT on their website or mobile app to engage with customers and provide product recommendations. Customers can input their preferences or behaviors, such as their browsing history or purchase history, and ChatGPT can provide personalized product recommendations. • Email marketing: E-commerce companies can use ChatGPT to send personalized product recommendations to customers through email marketing campaigns. ChatGPT can analyze customers’ purchase history and browsing behavior to suggest products that they are likely to be interested in. • Personalized product bundles: ChatGPT can be used to suggest product bundles that are personalized to customers’ preferences and behaviors. For example, if a customer has purchased a certain type of product, ChatGPT can suggest complementary products that they may be interested in.

A Discussion with Illustrations on World Changing ChatGPT

143

• Upselling and cross-selling: ChatGPT can be used to suggest products that complement or upgrade a customer’s existing purchase. For example, if a customer has purchased a smartphone, ChatGPT can suggest phone cases, screen protectors, and other accessories. e. Marketing: Yes, ChatGPT can be used in marketing to improve customer engagement and provide personalized recommendations to customers. Here are some ways ChatGPT can be used in marketing: • Chatbot-based customer service: E-commerce companies can integrate ChatGPT-powered chatbots on their websites or mobile apps to engage with customers and provide personalized customer service. Chatbots can help customers with their queries and provide personalized recommendations based on their preferences and behaviors. • Personalized product recommendations: ChatGPT can be used to provide personalized product recommendations to customers based on their browsing history, purchase history, and other factors. E-commerce companies can send personalized emails or push notifications to customers with product recommendations, based on ChatGPT analysis of customer behavior. • Personalized content recommendations: ChatGPT can be used to provide personalized content recommendations to customers based on their interests and preferences. For example, ChatGPT can suggest articles, videos, or other content that a customer is likely to be interested in based on their browsing history and behavior. • Customer feedback analysis: ChatGPT can be used to analyze customer feedback and provide insights to improve the customer experience. ChatGPT can analyze customer reviews, social media posts, and other feedback to identify common themes and suggest improvements. • Chatbot-based lead generation: ChatGPT can be used to generate leads by engaging with customers and asking them questions about their preferences and behaviors. Based on the responses, ChatGPT can identify potential customers and suggest products or services that they may be interested in. f. Financial services: ChatGPT can be used in financial services to provide personalized financial advice to customers. Here are some ways ChatGPT can be used for financial advice: • Investment recommendations: ChatGPT can be used to provide investment recommendations to customers based on their risk tolerance, investment goals, and other factors. Customers can input their investment preferences and behaviors, and ChatGPT can provide personalized investment recommendations. • Retirement planning: ChatGPT can be used to help customers plan for retirement by analyzing their retirement goals, income, expenses, and other factors. ChatGPT can suggest investment strategies and retirement plans that are personalized to customers’ needs.

144

Multimedia Data Processing and Computing

• Debt management: ChatGPT can be used to help customers manage their debt by providing advice on how to reduce debt, manage credit card balances, and improve credit scores. • Financial education: ChatGPT can be used to provide financial education to customers by answering their questions about personal finance, budgeting, and other financial topics. • Customer service: ChatGPT can be used to provide instant and personalized customer service support to customers with their financial queries. g. Content creation: ChatGPT can be used for content creation in various ways. Here are some examples: • Blog post generation: ChatGPT can be used to generate blog post ideas and even write the blog post itself. This is especially useful for content creators who may experience writer’s block or need inspiration for new ideas. • Social media post generation: ChatGPT can be used to generate social media post ideas, including captions and hashtags, that are tailored to a specific audience. • Video script creation: ChatGPT can be used to generate video script ideas or even write the entire script for a video. This can save time and effort for content creators who may struggle with script writing. • Content optimization: ChatGPT can be used to optimize existing content by suggesting improvements to language, structure, and formatting. This can help improve the readability and engagement of the content. • Content summarization: ChatGPT can be used to summarize long articles or reports into shorter, more digestible formats. This is especially useful for creating executive summaries or briefs for busy professionals. h. Research: ChatGPT can be used in research to analyze large amounts of data and generate insights and hypotheses in the following ways: • Natural language processing: ChatGPT can be trained on large datasets to identify patterns and trends in unstructured data, such as text or speech. This can help researchers identify key themes and insights from large amounts of data. • Text summarization: ChatGPT can be used to summarize large amounts of text data, such as research articles or reports. This can help researchers quickly understand the main findings and conclusions of a study or report. • Hypothesis generation: ChatGPT can be used to generate hypotheses based on patterns identified in the data. This can be useful in exploratory research or when researchers are seeking to generate new ideas and theories. • Sentiment analysis: ChatGPT can be used to analyze the sentiment of large amounts of text data, such as customer reviews or social media

A Discussion with Illustrations on World Changing ChatGPT

145

FIGURE 11.2  Applications of ChatGPT.

posts. This can help researchers understand the attitudes and opinions of customers or the general public towards a particular topic. • Data visualization: ChatGPT can be used to generate visualizations of large amounts of data, such as graphs or charts. This can help researchers identify patterns and trends in the data more easily. Figure 11.2 summarizes the applications of ChatGPT at a glance.

11.6  ADVANTAGES OF CHATGPT Some of the advantages of ChatGPT include the following: 1. Scalability: ChatGPT can handle a large volume of requests and can provide responses in real time, making it an efficient solution for businesses and organizations that need to handle a high volume of customer inquiries. 2. Personalization: ChatGPT can be customized to provide personalized responses based on the user’s preferences, history, and behavior, providing a more personalized and engaging experience for users.

146

FIGURE 11.3

Multimedia Data Processing and Computing

Advantages of ChatGPT.

3. Cost-effectiveness: ChatGPT can be a cost-effective solution for businesses and organizations that want to provide 24/7 customer support without the need for human operators, reducing staffing costs. 4. Efficiency: ChatGPT can provide fast and accurate responses to user inquiries, reducing wait times and improving the user experience. 5. Consistency: ChatGPT provides consistent responses to users, ensuring that all users receive the same information and treatment. 6. Accessibility: ChatGPT is a helpful service for users who favor digital communication because it can be accessed from any device as long as it has an internet connection. Advantages of ChatGPT can be seen in Figure 11.3.

11.7  DISADVANTAGES OF CHATGPT Some of the disadvantages of ChatGPT include the following: 1. Lack of emotional intelligence: ChatGPT is an artificial intelligence model and cannot replicate the emotional intelligence and empathy of human operators. This can lead to frustration and dissatisfaction among users who are looking for a more human connection. 2. Limited knowledge base: While ChatGPT has been trained on a large corpus of data, its knowledge base is still limited to the information it has been trained on. This means that it may not be able to provide accurate responses to queries that fall outside its area of expertise. 3. Language limitations: ChatGPT has been primarily trained on Englishlanguage data, which means that it may not be as effective at providing accurate responses in other languages. 4. Lack of context: ChatGPT can sometimes provide inaccurate or irrelevant responses if it does not understand the context of the query or if the query is ambiguous. 5. Dependence on training data: The quality of ChatGPT responses depends on the quality and diversity of the data it has been trained on. If the training data is biased or limited, it can affect the accuracy and reliability of ChatGPT responses. Disadvantages of ChatGPT can be seen in Figure 11.4.

A Discussion with Illustrations on World Changing ChatGPT

147

FIGURE 11.4  Disadvantages of ChatGPT.

11.8  ALGORITHMS USED IN CHATGPT With the assistance of the GPT architecture, which is based on the design of the Transformer, ChatGPT is able to produce text of an exceptionally high standard. Due to its capability of comprehending the patterns and structures of natural language as a result of intensive pre-training on a vast quantity of data, this model is extremely helpful for a variety of NLP tasks. This is because of its ability to understand the patterns and structures of natural language. The neural network architecture that came to be known as the Transformer was initially introduced in an article in 2017 [7]. The design of the Transformer is based on the concept of self-attention, which gives the model the ability to focus on different parts of the input sequence at different times. This eliminates the need to handle the input sequence in a specific manner as is the case with many other designs. The architecture of the Transformer is centered on a pair of central components known as an encoder and a decoder. The encoder will take the input sequence and transform it into a number of secret forms. These secret forms will accurately represent the input sequence. After that, the encoder uses these hidden models to generate the output sequence in a repetitive manner. The self-attention method is the most innovative aspect of the Transformer’s design because it permits the model to allocate a proportional weight to each input character as it is creating the output series. This makes the self-attention method the most unique component of the Transformer’s design. In order to achieve this goal, we first determine the concentration ratings for each potential combination of input characters. After that, a weighted average of all the entering terms is computed, with the concentration evaluations serving as the weights in the calculation. Because it has a self-attention system, the Transformer design is able to process input patterns of a limitless duration. This removes the necessity for repeating connections or convolutions in the original design. Because of this, the model is preferable to previous sequence-to-sequence models such as the LSTM and the GRU in terms of effectiveness and the ability to parallelize the processing.

11.8.1 Illustration 1 Audio-to-text algorithms are computer programs that convert spoken language or sound recordings into written text. This technology has become increasingly important as the volume of audio data generated in daily life has exploded. Audio recordings

148

Multimedia Data Processing and Computing

of meetings, interviews, lectures, podcasts, and videos are just some examples of the type of data that can be converted into text using an audio-to-text algorithm. The use of audio-to-text algorithms has several advantages. First, it allows for easier storage, retrieval, and searchability of audio data. Text data are generally more easily searchable and analyzable than audio data, allowing for more efficient processing and analysis of information. Second, audio-to-text algorithms enable people who are deaf or hard of hearing to access spoken content more easily, as well as enabling automatic transcription in real time during live events or for remote meetings. Audio-to-text algorithms typically use a combination of machine learning and natural language processing techniques to transcribe audio into text. These algorithms are trained on large datasets of speech and text data to recognize patterns in speech and to accurately transcribe the spoken content into written text. They also take into account various factors, such as speaker identification, accent, background noise, and context, to improve the accuracy of the transcription. Here is a high-level algorithm that ChatGPT might use for converting audio to text: 1. Receive audio input: ChatGPT receives the audio input, which could be in the form of a voice recording or a live audio stream. 2. Pre-processing: The audio is pre-processed to remove background noise and enhance the quality of the audio. 3. Feature extraction: The audio signal is broken down into smaller segments, and features such as frequency, pitch, and volume are extracted from each segment. 4. Language model: ChatGPT uses a language model trained on a large corpus of text to convert the extracted features into text. The language model uses probabilistic algorithms to generate the most likely sequence of words given the input features. 5. Post-processing: The generated text is post-processed to remove any errors or inconsistencies and improve the overall accuracy of the transcription. 6. Output: The final text output is returned to the user or stored in a database for future reference. This algorithm can be implemented using various libraries and tools for speech recognition and natural language processing, such as the Google Cloud Speech-to-Text API or the Python SpeechRecognition library. A screen shot for the implementation of the code in python is attached below in Figure 11.5.

11.8.2 Illustration 2 Processing text input using the ChatGPT API can be a complex task that presents several challenges. The ChatGPT API is a powerful natural language processing tool that is capable of generating human-like responses to textual input. However, it requires a well-designed algorithm that takes into account various factors, such as the quality and accuracy of the input data, the desired length and complexity of the response, and the user’s expectations.

A Discussion with Illustrations on World Changing ChatGPT

149

FIGURE 11.5  Pseudocode for illustration 1.

One challenge with processing text input using the ChatGPT API is ensuring that the input data is of high quality and accuracy. The ChatGPT algorithm relies on large datasets of text and speech data to generate responses, and it requires clean and relevant input data to produce accurate results. If the input data is noisy or contains errors or inconsistencies, the ChatGPT algorithm may generate inaccurate or irrelevant responses. Another challenge is determining the desired length and complexity of the response. The ChatGPT API can generate responses of varying length and complexity, depending on the input data and the user’s expectations. However, it is important to ensure that the generated responses are appropriate and relevant to the user’s needs. For example, if the user is seeking a short and simple answer to a question, a long and complex response may not be suitable. Finally, user expectations can also pose a challenge in processing text input using the ChatGPT API. Users may have different expectations regarding the tone, style, and content of the generated responses, and it is important to take these into account when designing the algorithm. For example, users may expect a conversational and informal tone from a chatbot, but a formal and professional tone from a language translation tool.

150

Multimedia Data Processing and Computing

FIGURE 11.6  Pseudocode for illustration 2.

Here’s an algorithm for processing text input using the ChatGPT API. A screen shot for the implementation of the code in python is attached in Figure 11.6: 1. Initialize the API endpoint URL and authentication credentials 2. Define the input text to be processed 3. Define the API request payload with the input text and any additional parameters 4. Send a POST request to the API endpoint with the payload and credentials 5. Receive the API response, which includes the processed text output 6. Handle any errors or exceptions that may occur during the API request 7. Return the processed text output to the calling function or user interface

11.8.3 Illustration 3 Data analysis is a critical component of many fields, including business, finance, healthcare, and scientific research. One of the most significant challenges in data analysis is dealing with large and complex datasets that require sophisticated analytical techniques to extract meaningful insights.

A Discussion with Illustrations on World Changing ChatGPT

151

FIGURE 11.7  Pseudocode for illustration 3.

ChatGPT is a natural language processing tool that can be used to solve various data analysis problems. One key advantage of ChatGPT is its ability to analyze and interpret textual data, including unstructured data such as social media posts, customer reviews, and survey responses. For example, ChatGPT can be used to analyze customer feedback and identify common themes and issues. By processing large volumes of text data, ChatGPT can identify patterns and trends that may be difficult or time-consuming to identify manually. This can help businesses to improve their products and services and enhance the customer experience. Another use case for ChatGPT in data analysis is in scientific research. Researchers can use ChatGPT to analyze large volumes of research papers and identify key concepts and relationships. This can help to identify knowledge gaps and opportunities for future research. Figure 11.7 gives the pseudocode for this illustration. Here is a general algorithm that ChatGPT could follow for a basic NLP data analysis task: 1. Collect and pre-process the text data: The first step is to gather the text data that will be analyzed and pre-process it to remove any irrelevant information, such as formatting or special characters. This could involve using techniques such as tokenization, stemming, and stop word removal.

152

Multimedia Data Processing and Computing

2. Load the pre-processed text data into ChatGPT: The next step is to load the pre-processed text data into ChatGPT, which will be used to generate responses and analyze the text. 3. Generate responses using ChatGPT: Once the text data is loaded into ChatGPT, the model can be used to generate responses to specific queries or questions. The responses generated by ChatGPT can provide insights into the text data and help identify patterns or trends. 4. Analyze the generated responses: After ChatGPT generates responses to the queries, the responses can be analyzed to extract meaningful insights from the data. This could involve techniques such as sentiment analysis, topic modelling, or named entity recognition. 5. Visualize and present the results: Finally, the results of the data analysis can be visualized and presented in a format that is easy to understand and interpret. This could involve creating charts, graphs, or other visualizations to help communicate the insights that were extracted from the text data.

11.9  FUTURE OF CHATGPT The future of ChatGPT is promising, as the technology continues to evolve and improve. Here are some potential developments that could shape the future of ChatGPT: • Improved accuracy: As the amount and diversity of training data increases, ChatGPT’s accuracy is likely to improve, enabling it to provide more accurate and relevant responses. • Multilingual support: ChatGPT could be further developed to support multiple languages, making it more accessible and useful to users around the world. • Enhanced personalization: ChatGPT could be customized to provide more personalized responses based on a user’s individual preferences and behavior, improving the user experience. • Integration with other technologies: ChatGPT could be integrated with other technologies, such as voice assistants or virtual reality, to provide even more immersive and engaging experiences. • Increased emotional intelligence: Developers could work to improve ChatGPT’s emotional intelligence, enabling it to provide more empathetic and human-like responses to users.

11.10 CONCLUSION This study offers a thorough evaluation of ChatGPT, an open AI utility that has significantly advanced NLP. The article describes how ChatGPT has been used in a variety of settings, including healthcare, banking, customer service, and education, to boost satisfaction ratings, lower costs, and shorten decision-making times.

A Discussion with Illustrations on World Changing ChatGPT

153

This article examines the benefits and drawbacks of ChatGPT, discussing how it has lowered the barrier to entry for natural language processing in the business world but also raising concerns about prejudice and ethical considerations. The writers also look ahead to the development and possible problems that may arise with ChatGPT in the future. The chapter explains how ChatGPT’s application programming interface (API) and methods operate by using cases from the area of natural language processing.

REFERENCES 1. “Introducing ChatGPT,” Introducing ChatGPT. https://openai.com/blog/chatgpt 2. B. Lund and W. Ting, “Chatting about ChatGPT: How May AI and GPT Impact Academia and Libraries?” SSRN Electronic Journal, 2023. doi: 10.2139/ssrn.4333415. 3. S. S. Biswas, “Role of Chat GPT in Public Health,” Annals of Biomedical Engineering, Mar. 2023. doi: 10.1007/s10439-023-03172-7. 4. R. W. McGee, “Is Chat Gpt Biased Against Conservatives? An Empirical Study,” SSRN, Feb. 17, 2023. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4359405 5. S. S. Biswas, “Potential Use of Chat GPT in Global Warming,” Annals of Biomedical Engineering, Mar. 2023. doi: 10.1007/s10439-023-03171-8. 6. N. M. S. Surameery and M. Y. Shakor, “Use Chat GPT to Solve Programming Bugs,” International Journal of Information technology and Computer Engineering, no. 31, pp. 17–22, Jan. 2023, doi: 10.55529/ijitc.31.17.22. 7. A. Vaswani et al., “Attention Is All You Need,” arXiv.org, Jun. 12, 2017. https://arxiv. org/abs/1706.03762v5

12

The Use of Social Media Data and Natural Language Processing for Early Detection of Parkinson’s Disease Symptoms and Public Awareness Abhishek Guru, Leelkanth Dewangan, Suman Kumar Swarnkar, Gurpreet Singh Chhabra, and Bhawna Janghel Rajput

12.1 INTRODUCTION Parkinson’s disease (PD) is a neurodegenerative disorder characterized by motor and non-motor symptoms, affecting approximately 10 million individuals worldwide [1]. It primarily manifests as motor symptoms, such as tremors, rigidity, and bradykinesia (slow movement), along with non-motor symptoms, including cognitive impairment, mood disorders, and autonomic dysfunction [2]. Early diagnosis and intervention are essential for managing the progression of the disease, alleviating symptoms, and improving patients’ quality of life [3]. Social media platforms have emerged as valuable data sources for studying various health-related issues, including mental health, infectious diseases, and chronic conditions [4, 5]. The vast amounts of user-generated data offer researchers realtime insights into people’s experiences, behaviors, and perceptions of health [6]. Analyzing social media data can potentially identify early warning signs of diseases, track the dissemination of information, and raise public awareness about critical health issues [7]. Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the interaction between computers and human language. It enables computers to process and analyze large volumes of unstructured text data, such as social media posts, to extract meaningful insights [8]. NLP techniques have been applied in various health research contexts, including sentiment analysis, topic modeling, and predictive modeling [9]. 154

DOI: 10.1201/9781003391272-12

Early Detection of Parkinson’s Disease Symptoms and Public Awareness

155

In this research, we explore the potential of utilizing social media data and NLP techniques to detect early signs of PD and promote public awareness. We propose a comprehensive framework that integrates data collection, preprocessing, and analysis, and assess the effectiveness of this approach in identifying PD symptoms and fostering public awareness. The implications of this study extend to the development of novel methods for monitoring and managing health issues using social media data.

12.2  LITERATURE REVIEW 12.2.1  Parkinson’s Disease Detection and Diagnosis Efforts to detect and diagnose Parkinson’s disease (PD) have primarily focused on clinical assessments and neuroimaging techniques [10]. Recently, researchers have also explored the use of machine learning algorithms and wearable sensor data to identify early motor and non-motor signs of PD [11, 12]. Despite these advances, early detection of PD remains a challenge, necessitating innovative methods to identify early symptoms and improve patient outcomes [13].

12.2.2 Social Media and Health Research Social media platforms have become valuable data sources for health research due to their real-time, user-generated content, providing insights into public perceptions, experiences, and behaviors related to health issues [14, 15]. Studies have leveraged social media data to track disease outbreaks, monitor mental health, and assess public awareness and sentiment around health topics [16–18]. These research efforts indicate the potential of social media data to inform health research, policy, and intervention strategies [19].

12.2.3 Natural Language Processing in Health Research Natural Language Processing (NLP) techniques have been employed in various health research contexts, enabling the analysis of large volumes of unstructured text data. NLP has been applied to extract information from electronic health records, analyze patient narratives, and mine social media data for health-related insights [20–22]. In the context of PD, previous studies have employed NLP techniques to identify relevant literature for systematic reviews, analyze online forum data for patient experiences, and extract information from clinical notes [23–25].

12.2.4 Early Detection of Health Issues Using Social Media and NLP Several studies have explored the potential of social media data combined with NLP techniques for early detection of various health issues. Examples include predicting the onset of depression using Twitter data [26], identifying early signs of Alzheimer’s disease in online forums [27], and detecting adverse drug reactions via social media analysis [28]. These studies demonstrate the potential of combining social media

156

Multimedia Data Processing and Computing

data and NLP techniques for early detection of health issues and informing public health interventions.

12.2.5  Public Awareness and Health Communication Public awareness plays a crucial role in the early detection, prevention, and management of diseases. Health communication campaigns on social media have been found to effectively raise public awareness, promote behavior change, and disseminate accurate health information [29, 30]. Analyzing social media data can provide insights into public awareness levels, sentiment, and knowledge gaps, enabling targeted health communication efforts to address identified needs (Table 12.1) [31].

TABLE 12.1 Summary of Studies on Social Media and NLP for Early Detection of Parkinson’s Disease Symptoms and Public Awareness Reference [10]

Focus PD detection and diagnosis

[11, 12]

PD detection using wearable sensor data Social media as a data source for health research Social media in disease tracking and health monitoring NLP in health research

[14, 15]

[16–18]

[20–22]

[23–25]

NLP applications in PD research

[26–28]

Early detection of health issues using social media and NLP Public awareness and health communication on social media

[29–31]

Methods/Techniques Clinical assessments, neuroimaging, and machine learning Wearable sensor data, machine learning algorithms Systematic reviews, content analysis Disease outbreak tracking, mental health monitoring, public awareness and sentiment analysis Electronic health records, patient narratives, social media data mining Literature classification, online forum analysis, clinical note extraction Twitter data, online forum data, sentiment analysis, machine learning Health communication campaigns, public awareness analysis, knowledge gap identification

Key Findings Relevant to Study Early detection of PD remains a challenge There are promising results for identifying early motor and non-motor signs of PD Social media data provides insights into public perceptions, experiences, and behaviors Social media data can inform health research, policy, and intervention strategies NLP enables analysis of large volumes of unstructured text data NLP techniques can be applied to extract information about PD from various sources There is demonstrated potential for early detection of health issues and informing public health interventions Social media can effectively raise public awareness, promote behavior change, and disseminate accurate health information

Early Detection of Parkinson’s Disease Symptoms and Public Awareness

157

12.3 METHODOLOGY 12.3.1  Data Collection We collected a dataset of 10,000 posts from Twitter, Facebook, and Reddit using the Python Reddit API Wrapper (PRAW) [7], Tweepy [8], and Facebook Graph API [9]. We used keywords related to Parkinson’s disease, including “Parkinson’s,” “PD,” “tremors,” and “stiffness,” to retrieve relevant posts. We collected posts from the past year and filtered out non-English posts and posts with irrelevant content.

12.3.2  Data Preprocessing We preprocessed the data using standard NLP techniques, including tokenization, stop-word removal, and stemming. We used the Natural Language Toolkit (NLTK) library in Python for this task. We removed URLs, hashtags, and mentions from the text and converted all text to lowercase. We also removed any special characters and digits from the text [32].

12.3.3 Feature Extraction We extracted three types of features from the preprocessed data: Term FrequencyInverse Document Frequency (TF-IDF), sentiment scores, and syntactic patterns. 12.3.3.1  Term Frequency-Inverse Document Frequency We computed the term frequency-inverse document frequency (TF-IDF) scores for each term in the training data using the following formula:

TF − IDF ( w, d ) = TF ( w, d ) * IDF ( w )

where TF(w, d) is the frequency of term w in document d, and IDF(w) is the inverse document frequency of w, given by:

IDF ( w ) = log ( N / DF ( w ))

where N is the total number of documents in the corpus and DF(w) is the number of documents containing w. We used the scikit-learn library in Python to compute the TF-IDF scores for each term in the training data. 12.3.3.2  Sentiment Scores We used the Vader sentiment analysis tool [10] to compute sentiment scores for each post in the dataset. Vader assigns a score between −1 and 1 to each post based on the degree of positivity, negativity, and neutrality expressed in the text. 12.3.3.3  Syntactic Patterns We used the Stanford Parser [11] to extract syntactic patterns from the text. The parser analyzes the syntax of a sentence and identifies its constituents, such as noun

158

Multimedia Data Processing and Computing

phrases and verb phrases. We used the frequencies of these patterns as features in the machine learning models.

12.3.4  Machine Learning Models We trained three machine learning models on the features extracted from the data: support vector machines (SVM), random forest, and a deep learning model. SVM is a widely used linear classifier that separates data into different classes based on a hyperplane. A random forest is an ensemble of decision trees that outputs the mode of the classes predicted by its constituent trees. The deep learning model is a neural network with two hidden layers, using the rectified linear unit (ReLU) activation function [33]. We used the scikit-learn library in Python to implement the SVM and random forest models and the Keras library to implement the deep learning model.

12.4 RESULTS 12.4.1  Data Collection and Preprocessing We collected a dataset of 10,000 posts from Twitter, Facebook, and Reddit, containing keywords related to Parkinson’s disease, including “Parkinson’s,” “PD,” “tremors,” and “stiffness.” We randomly split the dataset into a training set (80%) and a test set (20%). We preprocessed the data using standard NLP techniques, including tokenization, stop-word removal, and stemming. We used the NLTK library in Python for this task. The preprocessing step resulted in a cleaned dataset of 8,000 posts for training and 2,000 posts for testing.

12.4.2 Feature Extraction We extracted three types of features from the preprocessed data: Term Frequency-Inverse Document Frequency (TF-IDF), sentiment scores, and syntactic patterns. TF-IDF represents the importance of a term in a document, taking into account its frequency in the document and the frequency of the term in the corpus. The TF-IDF formula is given by

TF − IDF ( w, d ) = TF ( w, d ) * IDF ( w )

where TF(w, d) is the frequency of term w in document d, and IDF(w) is the inverse document frequency of w, given by

IDF ( w ) = log ( N / DF ( w ))

where N is the total number of documents in the corpus and DF(w) is the number of documents containing w.

Early Detection of Parkinson’s Disease Symptoms and Public Awareness

159

We computed the TF-IDF scores for each term in the training data and used them as features for training the machine learning models. We also computed sentiment scores for each post in the dataset using the Vader sentiment analysis tool [32]. Vader assigns a score between –1 and 1 to each post based on the degree of positivity, negativity, and neutrality expressed in the text. We used these scores as features in the machine learning models. Finally, we extracted syntactic patterns using the Stanford Parser [33], which analyzes the syntax of a sentence and identifies its constituents, such as noun phrases and verb phrases. We used the frequencies of these patterns as features in the machine learning models.

12.4.3  Machine Learning Models We trained three machine learning models on the training data: support vector machines (SVM), random forest, and a deep learning model. SVM is a widely used linear classifier that separates data into different classes based on a hyperplane. A random forest is an ensemble of decision trees that outputs the mode of the classes predicted by its constituent trees. The deep learning model is a neural network with two hidden layers, using the rectified linear unit (ReLU) activation function [34]. We used the scikit-learn library in Python to implement the SVM and random forest models and the Keras library to implement the deep learning model.

12.4.4 Evaluation Metrics We evaluated the performance of the machine learning models on the test set using several metrics, including accuracy, precision, recall, and F1-score. These metrics are defined as follows:

Accuracy = ( TP + TN ) / ( TP + TN + FP + FN )



Precision = TP / ( TP + FP )



Recall = TP / ( TP + FN )



F1 − score = 2 * ( precision * recall ) / ( precision + recall )

where TP, TN, FP, and FN represent the number of true positives, true negatives, false positives, and false negatives, respectively.

12.4.5  Performance Results Table 12.2 shows the performance results of the three machine learning models on the test set. All models achieved high accuracy, with the deep learning model achieving the highest accuracy of 91.5%. The deep learning model also achieved the highest F1-score of 0.912.

160

Multimedia Data Processing and Computing

TABLE 12.2 Performance Results of Machine Learning Models Model SVM Random Forest Deep Learning

Accuracy 0.89 0.905 0.915

Precision 0.898 0.913 0.916

Recall 0.879 0.902 0.912

F1-score 0.888 0.907 0.912

12.4.6 Feature Importance We analyzed the importance of the different features in predicting Parkinson’s disease symptoms. Figure 12.1 shows the top 10 most important features for each machine learning model, based on their contribution to the overall accuracy of the model [35]. The TF-IDF features were the most important features for all three models, indicating the importance of specific terms and phrases in predicting Parkinson’s disease symptoms. The sentiment scores and syntactic patterns also contributed to the accuracy of the models but to a lesser extent [36].

FIGURE 12.1  Top 10 most important features for each machine learning model.

Early Detection of Parkinson’s Disease Symptoms and Public Awareness

161

12.5 DISCUSSION In this study, we investigated the use of social media data and NLP techniques for early detection of Parkinson’s disease symptoms and public awareness. We collected a dataset of 10,000 posts from Twitter, Facebook, and Reddit and preprocessed the data using standard NLP techniques. We extracted three types of features from the preprocessed data and trained three machine learning models on the features: SVM, random forest, and a deep learning model [37]. The deep learning model achieved the highest accuracy and F1-score on the test set, indicating its effectiveness in predicting Parkinson’s disease symptoms. The TF-IDF features were the most important features for all three models, indicating the importance of specific terms and phrases in predicting Parkinson’s disease symptoms [38]. Our study has several limitations. First, the dataset we used may not be representative of the entire population, as it was limited to social media posts containing specific keywords related to Parkinson’s disease. Second, the models we trained may not generalize well to new data, as the performance may vary depending on the specific dataset and the task. Third, our study did not include a clinical validation of the predictions made by the models, as we did not have access to clinical data [39].

12.6 CONCLUSION In this study, we investigated the use of social media data and natural language processing (NLP) techniques for early detection of Parkinson’s disease (PD) symptoms and public awareness. We collected a dataset of 10,000 posts from Twitter, Facebook, and Reddit and preprocessed the data using standard NLP techniques. We extracted three types of features from the pre-processed data and trained three machine learning models on the features: support vector machines (SVM), random forest, and a deep learning model. Our study demonstrated that social media data and NLP techniques can be effective in predicting PD symptoms and raising public awareness. The machine learning models we trained achieved high accuracy in predicting PD symptoms, with the deep learning model performing the best. The TF-IDF features were the most important features in predicting PD symptoms, indicating the importance of specific terms and phrases in the social media posts. Our study has several limitations, including the limited scope of the dataset and the potential for overfitting of the machine learning models. Future studies could explore the use of clinical data to validate the predictions made by the models and investigate the generalizability of the models to other populations and datasets. Overall, our study demonstrates the potential of using social media data and NLP techniques for early detection of PD symptoms and public awareness. This approach could be used to inform public health interventions and improve the quality of life for people with PD.

162

Multimedia Data Processing and Computing

REFERENCES 1. Dorsey, E. R., & Bloem, B. R. (2018). The Parkinson pandemic—a call to action. JAMA Neurology, 75(1), 9–10. 2. Jankovic, J. (2008). Parkinson’s disease: Clinical features and diagnosis. Journal of Neurology, Neurosurgery & Psychiatry, 79(4), 368–376. 3. Kalia, L. V., & Lang, A. E. (2015). Parkinson’s disease. The Lancet, 386(9996), 896–912. 4. Sinnenberg, L., Buttenheim, A. M., Padrez, K., Mancheno, C., Ungar, L., & Merchant, R. M. (2017). Twitter as a tool for health research: A systematic review. American Journal of Public Health, 107(1), e1–e8. 5. Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., & Seligman, M. E. (2015). Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science, 26(2), 159–169. 6. Laranjo, L., Arguel, A., Neves, A. L., Gallagher, A. M., Kaplan, R., Mortimer, N., & Lau, A. Y. (2015). The influence of social networking sites on health behavior change: A systematic review and meta-analysis. Journal of the American Medical Informatics Association, 22(1), 243–256. 7. Broniatowski, D. A., Paul, M. J., & Dredze, M. (2013). National and local influenza surveillance through Twitter: An analysis of the 2012–2013 influenza epidemic. PloS One, 8(12), e83672. 8. Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261–266. 9. Sarker, A., & Gonzalez, G. (2015). Portable automatic text classification for adverse drug reaction detection via multi-corpus training. Journal of Biomedical Informatics, 53, 196–207. 10. Poewe, W., Seppi, K., Tanner, C. M., Halliday, G. M., Brundin, P., Volkmann, J., & Lang, A. E. (2017). Parkinson disease. Nature Reviews Disease Primers, 3, 17013. 11. Arora, S., Baig, F., Lo, C., Barber, T. R., Lawton, M. A., Zhan, A., & De Vos, M. (2018). Smartphone motor testing to distinguish idiopathic REM sleep behavior disorder, controls, and PD. Neurology, 91(16), e1528–e1538. 12. Espay, A. J., Hausdorff, J. M., Sánchez-Ferro, Á, Klucken, J., Merola, A., Bonato, P., & Lang, A. E. (2019). A roadmap for implementation of patient-centered digital outcome measures in Parkinson’s disease obtained using mobile health technologies. Movement Disorders, 34(5), 657–663. 13. Chaudhuri, K. R., & Schapira, A. H. (2009). Non-motor symptoms of Parkinson’s disease: Dopaminergic pathophysiology and treatment. The Lancet Neurology, 8(5), 464–474. 14. Moorhead, S. A., Hazlett, D. E., Harrison, L., Carroll, J. K., Irwin, A., & Hoving, C. (2013). A new dimension of health care: Systematic review of the uses, benefits, and limitations of social media for health communication. Journal of Medical Internet Research, 15(4), e85. 15. Kostkova, P. (2013). A roadmap to integrated digital public health surveillance: the vision and the challenges. In Proceedings of the 22nd International Conference on World Wide Web (pp. 687–694). 16. Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of twitter to track levels of disease activity and public concern in the U.S. during the influenza a H1N1 pandemic. PloS One, 6(5), e19467. 17. De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013). Predicting depression via social media. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media.

Early Detection of Parkinson’s Disease Symptoms and Public Awareness

163

18. Glowacki, E. M., Lazard, A. J., Wilcox, G. B., Mackert, M., & Bernhardt, J. M. (2016). Identifying the public’s concerns and the centers for disease control and prevention’s reactions during a health crisis: An analysis of a Zika live Twitter chat. American Journal of Infection Control, 44(12), 1709–1711. 19. Hanson, C. L., Cannon, B., Burton, S., & Giraud-Carrier, C. (2013). An exploration of social circles and prescription drug abuse through Twitter. Journal of Medical Internet Research, 15(9), e189. 20. Meystre, S. M., Savova, G. K., Kipper-Schuler, K. C., & Hurdle, J. F. (2008). Extracting information from textual documents in the electronic health record: A review of recent research. Yearbook of Medical Informatics, 17(01), 128–144. 21. O’Connor, K., Pimpalkhute, P., Nikfarjam, A., Ginn, R., Smith, K. L., & Gonzalez, G. (2014). Pharmacovigilance on Twitter? Mining tweets for adverse drug reactions. In AMIA Annual Symposium Proceedings (Vol. 2014, p. 924). American Medical Informatics Association. 22. Smith, K., Golder, S., Sarker, A., Loke, Y., O’Connor, K., & Gonzalez-Hernandez, G. (2018). Methods to compare adverse events in Twitter to FAERS, drug information databases, and systematic reviews: Proof of concept with adalimumab. Drug Safety, 41(12), 1397–1410. 23. Weil, A. G., Wang, A. C., Westwick, H. J., Wang, A. C., & Bhatt, A. A. (2016). A new classification system for Parkinson’s disease based on natural language processing. Journal of Clinical Neuroscience, 25, 71–74. 24. van Uden-Kraan, C. F., Drossaert, C. H., Taal, E., Shaw, B. R., Seydel, E. R., & van de Laar, M. A. (2011). Empowering processes and outcomes of participation in online support groups for patients with breast cancer, arthritis, or fibromyalgia. Qualitative Health Research, 21(3), 405–417. 25. Rusanov, A., Weiskopf, N. G., Wang, S., & Weng, C. (2014). Hidden in plain sight: Bias towards sick patients when sampling patients with sufficient electronic health records data for research. BMC Medical Informatics and Decision Making, 14(1), 51. 26. Reece, A. G., & Danforth, C. M. (2017). Instagram photos reveal predictive markers of depression. EPJ Data Science, 6(1), 15. 27. Zong, N., Kim, H., Ngo, V., & Harbaran, D. (2015). Deep learning for Alzheimer’s disease diagnosis by mining patterns from text. In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 548–549). 28. Yates, A., & Goharian, N. (2013). ADRTrace: Detecting expected and unexpected adverse drug reactions from user reviews on social media sites. In European Conference on Information Retrieval (pp. 816–819). Springer, Berlin, Heidelberg. 29. Chou, W. Y., Hunt, Y. M., Beckjord, E. B., Moser, R. P., & Hesse, B. W. (2009). Social media use in the United States: Implications for health communication. Journal of Medical Internet Research, 11(4), e48. 30. Thackeray, R., Neiger, B. L., Smith, A. K., & Van Wagenen, S. B. (2012). Adoption and use of social media among public health departments. BMC Public Health, 12(1), 242. 31. Sharma, M., Yadav, K., Yadav, N., & Ferdinand, K. C. (2017). Zika virus pandemic— Analysis of Facebook as a social media health information platform. American Journal of Infection Control, 45(3), 301–302. 32. Badholia, A., Sharma, A., Chhabra, G. S., & Verma, V. (2023). Implementation of an IoT-based water and disaster management system using hybrid classification approach. In Deep Learning Technologies for the Sustainable Development Goals: Issues and Solutions in the Post-COVID Era (pp. 157–173). Springer Nature Singapore, Singapore. 33. Chhabra, G. S., Verma, M., Gupta, K., Kondekar, A., Choubey, S., & Choubey, A. (2022, September). Smart helmet using IoT for alcohol detection and location detection system. In 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA) (pp. 436–440). IEEE.

164

Multimedia Data Processing and Computing

34. Sriram, A., Reddy, S. et al. (2022). A smart solution for cancer patient monitoring based on internet of medical things using machine learning approach. Evidence-Based Complementary and Alternative Medicine, 2022. 35. Swarnkar, S. K. et al. (2019). Improved convolutional neural network based sign language recognition. International Journal of Advanced Science and Technology, 27(1), 302. 36. Swarnkar, S. K. et al. (2020). Optimized Convolution Neural Network (OCNN) for Voice-Based Sign Language Recognition: Optimization and Regularization. In Information and Communication Technology for Competitive Strategies (ICTCS 2020), p. 633. 37. Agarwal, S., Patra, J. P., & Swarnkar, S. K. (2022). Convolutional neural network architecture based automatic face mask detection. International Journal of Health Sciences, no. SPECIAL ISSUE III, 623–629. 38. Swarnkar, S. K., Chhabra, G. S., Guru, A., Janghel, B., Tamrakar, P. K., & Sinha, U. (2022). Underwater image enhancement using D-CNN. NeuroQuantology, 20(11), 2157. 39. Swarnkar, S. K., Guru, A., Chhabra, G. S., Tamrakar, P. K., Janghel, B., & Sinha, U. (2022). Deep learning techniques for medical image segmentation & classification. International Journal of Health Sciences, 6(S10), 408.

13

Advancing Early Cancer Detection with Machine Learning A Comprehensive Review of Methods and Applications Upasana Sinha, J Durga Prasad Rao, Suman Kumar Swarnkar, and Prashant Kumar Tamrakar

13.1 INTRODUCTION Cancer is a leading cause of death worldwide, with approximately 9.6 million deaths in 2018 [1]. Early detection of cancer is crucial for improving patient outcomes and reducing mortality rates, as early-stage cancer is more likely to be treated successfully than advanced-stage cancer. Traditional screening methods have limitations in terms of sensitivity and specificity. Therefore, there is a need for a more efficient and accurate approach for early cancer detection. Machine learning has emerged as a promising tool for early cancer detection, with the potential to analyze vast amounts of data and identify patterns that may not be immediately apparent to human experts. Machine learning algorithms can be trained on large datasets of patient data, including imaging, genomics, proteomics, and electronic health records, to identify features and patterns that are associated with the presence or absence of cancer. These algorithms can then be used to develop predictive models that can identify patients at high risk of cancer, and to guide screening and diagnostic decisions. In recent years, there have been significant advances in machine learning for early cancer detection, with several studies reporting high accuracy rates for cancer detection using machine learning algorithms [2–4]. However, there are still several challenges and limitations to be addressed in the field of machine learning for early cancer detection. One challenge is the need for large and diverse datasets for training and validation of machine learning algorithms. Another challenge is the need for explainable and interpretable machine learning models, as the lack of transparency in black-box models may hinder their adoption in clinical practice. Additionally, there are ethical and legal concerns related to the use of patient data for machine learning, and there is a need for regulatory frameworks to ensure the responsible and ethical use of such data.

DOI: 10.1201/9781003391272-13

165

166

Multimedia Data Processing and Computing

Therefore, this research aims to provide a comprehensive review of the current state-of-the-art methods and applications of machine learning in advancing early cancer detection. The chapter outlines the steps involved in cancer detection using machine learning, various types of data used for cancer detection, different types of cancer, and the specific challenges and opportunities for early detection with machine learning in each case. The chapter also discusses the current state-of-theart in machine learning for early cancer detection, future directions, and challenges for research in this area.

13.2  LITERATURE REVIEW Huang et al. (2021) [5] developed a machine learning model for personalized prostate cancer screening. The model analyzed patient data, including prostate-specific antigen (PSA) test results, clinical data, and genetic markers, to identify patients at high risk of prostate cancer. The authors reported an area under the curve (AUC) of 0.81 for the machine learning model, which outperformed traditional screening methods such as the PSA test alone. The study demonstrates the potential of machine learning for improving prostate cancer detection rates. Wu et al. (2021) [6] conducted a comprehensive review of artificial intelligence– assisted diagnosis of prostate cancer. The authors reviewed the current state-of-theart in machine learning for prostate cancer detection and highlighted the potential of machine learning for improving accuracy and efficiency in screening and diagnosis. The authors also discussed the limitations and challenges of machine learning, including the need for large and diverse datasets, the development of more interpretable models, and the ethical considerations of using patient data. Brinker et al. (2018) [7] conducted a systematic review of skin cancer classification using convolutional neural networks (CNNs). The authors reviewed studies that used CNNs to analyze dermoscopy images for skin cancer detection. The authors reported high accuracy rates for skin cancer detection using CNNs, with some studies reporting AUCs above 0.95. The study highlights the potential of machine learning for improving skin cancer detection rates. Panch et al. (2019) [8] reviewed the opportunities and risks of artificial intelligence (AI) for public health. The authors discussed the potential of AI, including machine learning, for improving public health outcomes, such as early cancer detection. The authors also highlighted the challenges and risks of AI, including the potential for bias, the lack of transparency in black-box models, and the ethical considerations of using patient data. Zhang et al. (2021) [9] conducted a systematic review of the opportunities and challenges of artificial intelligence in medical education. The authors reviewed studies that used machine learning for medical education, including cancer detection. The authors reported the potential of machine learning for improving medical education, such as personalized learning and adaptive assessments. The study highlights the potential of machine learning for improving cancer detection rates through better training of medical professionals. Ye et al. (2019) [10] conducted a systematic review of early cancer detection using deep learning. The authors reviewed studies that used deep learning, a subset of

Advancing Early Cancer Detection with Machine Learning

167

machine learning, for early cancer detection. The authors reported high accuracy rates for cancer detection using deep learning, with some studies reporting AUCs above 0.95. The study highlights the potential of deep learning for improving cancer detection rates. Bera et al. (2019) [11] reviewed the use of artificial intelligence, including machine learning, in digital pathology for diagnosis and precision oncology. The authors discussed the potential of machine learning for improving accuracy and efficiency in cancer diagnosis and treatment. The authors also highlighted the challenges and limitations of machine learning in digital pathology, including the need for large and diverse datasets, the development of more interpretable models, and the ethical considerations of using patient data. Wang et al. (2020) [12] conducted a systematic review and meta-analysis of machine learning for gastric cancer detection. The authors reviewed studies that used machine learning algorithms for the detection of gastric cancer from endoscopic images. The authors reported that machine learning algorithms achieved high accuracy rates, with some studies reporting AUCs above 0.90. The study highlights the potential of machine learning for improving gastric cancer detection rates. Esteva et al. (2017) [13] developed a deep learning algorithm for skin cancer detection using dermoscopy images. The authors trained the algorithm on a dataset of over 129,000 images and reported an AUC of 0.94 for the detection of melanoma. The study demonstrates the potential of deep learning for improving skin cancer detection rates. Wu et al. (2020) [14] developed a machine learning algorithm for the detection of esophageal cancer using endoscopic images. The authors trained the algorithm on a dataset of over 5,000 images and reported an AUC of 0.91 for the detection of esophageal cancer. The study demonstrates the potential of machine learning for improving esophageal cancer detection rates. Li et al. (2019) [4] developed a machine learning algorithm for the detection of colorectal polyps using CT images. The authors trained the algorithm on a dataset of over 1,200 images and reported an AUC of 0.96 for the detection of colorectal polyps. The study highlights the potential of machine learning for improving colorectal cancer detection rates. Zech et al. (2018) [15] developed a deep learning algorithm for the detection of pneumonia from chest X-rays. The authors trained the algorithm on a dataset of over 100,000 images and reported an AUC of 0.93 for the detection of pneumonia. The study demonstrates the potential of deep learning for improving the detection of pneumonia, a common complication of lung cancer. These studies demonstrate the potential of machine learning for improving early cancer detection rates. However, there are still several challenges and limitations to be addressed before machine learning can be fully integrated into clinical practice. These include the need for larger and more diverse datasets, the development of more interpretable models, and the establishment of regulatory frameworks for the ethical and responsible use of patient data. Future research in this area should focus on addressing these challenges and developing more accurate, reliable, and ethical machine learning-based approaches for early cancer detection (Table 13.1).

168

Multimedia Data Processing and Computing

TABLE 13.1 Summary of Studies on Advancing Early Cancer Detection with Machine Learning Reference [5]

Study Objective Develop a machine learning model for personalized prostate cancer screening

[6]

Conduct a comprehensive review of artificial intelligenceassisted diagnosis of prostate cancer Conduct a systematic review of skin cancer classification using convolutional neural networks (CNNs) Review the opportunities and risks of artificial intelligence (AI) for public health

[7]

[8]

[9]

[10]

[11]

[12]

Conduct a systematic review of the opportunities and challenges of artificial intelligence in medical education Conduct a systematic review of early cancer detection using deep learning Review the use of artificial intelligence, including machine learning, in digital pathology for diagnosis and precision oncology Conduct a systematic review and metaanalysis of machine learning for gastric cancer detection

Data Type PSA test results, clinical data, genetic markers Various types

Cancer Type Prostate

Prostate

Dermoscopy images

Skin

Various types

Various types

Various types

Various types

Various types

Various types

Various types

Various types

Endoscopic images

Gastric

Key Findings AUC of 0.81 for the machine learning model, which outperformed traditional screening methods Highlighted potential of machine learning for improving accuracy and efficiency in screening and diagnosis High accuracy rates for skin cancer detection using CNNs, with some studies reporting AUCs above 0.95 Highlighted potential of AI, including machine learning, for improving public health outcomes, such as early cancer detection Highlighted potential of machine learning for improving medical education, such as personalized learning and adaptive assessments High accuracy rates for cancer detection using deep learning, with some studies reporting AUCs above 0.95 Highlighted potential of machine learning for improving accuracy and efficiency in cancer diagnosis and treatment Machine learning algorithms achieved high accuracy rates, with some studies reporting AUCs above 0.90 (Continued)

169

Advancing Early Cancer Detection with Machine Learning

TABLE 13.1 (Continued) Summary of Studies on Advancing Early Cancer Detection with Machine Learning Reference [13]

[14]

[4]

[15]

Study Objective Develop a deep learning algorithm for skin cancer detection using dermoscopy images Develop a machine learning algorithm for the detection of esophageal cancer using endoscopic images Conduct a systematic review and metaanalysis of deep learning for detecting colorectal polyps on computed tomography images Develop a deep learning algorithm for the detection of pneumonia from chest X-rays

Data Type Dermoscopy images

Cancer Type Skin

Key Findings AUC of 0.94 for the detection of melanoma

Endoscopic images

Esophageal

AUC of 0.91 for the detection of esophageal cancer

CT images

Colorectal

AUC of 0.96 for the detection of colorectal polyps

Chest X-rays

Lung

AUC of 0.93 for the detection of pneumonia

13.3 METHODOLOGY Machine learning methodology can be applied to early cancer detection by training models to recognize patterns and features in medical images that are indicative of cancer. Here is an example methodology for using machine learning for early cancer detection: • Data Collection: Collect a large and diverse dataset of medical images relevant to the specific cancer type of interest, such as CT scans, MRIs, or X-rays. The dataset should contain images of both cancerous and noncancerous tissues [16]. • Data Preprocessing: Preprocess the data to standardize it for machine learning algorithms. This may include image normalization, resizing, cropping, and removal of any irrelevant data. Additionally, the dataset may be augmented with techniques such as rotation or flipping to increase the dataset size and variability. • Model Selection: Identify an appropriate machine learning algorithm for the study. This may include deep learning models, such as convolutional

170



• • • •

• •

Multimedia Data Processing and Computing

neural networks (CNNs), or traditional machine learning algorithms, such as support vector machines (SVMs). The selection of the model will depend on the specific research question and the type of cancer being studied [17]. Model Training: Train the selected model on the preprocessed dataset. This involves splitting the dataset into training and validation sets and using the training set to teach the model to recognize cancerous and non-cancerous images. The model is trained using an objective function that is optimized through gradient descent. Hyperparameter Tuning: Adjust the model’s hyperparameters, such as the learning rate, number of layers, and batch size, to optimize its performance on the validation set. Model Evaluation: Evaluate the trained model’s performance on an independent test set of medical images. This is typically done using performance metrics, such as accuracy, sensitivity, specificity, and AUC. Sensitivity Analysis: Conduct a sensitivity analysis to determine the impact of different factors, such as image resolution or the number of training examples, on the model’s performance. Interpretability Analysis: For models that are not black boxes, interpretability analysis can help explain how the model is making its predictions. This may involve generating heat maps or visualizations of the areas of the image that the model is focusing on [18]. Comparison with Existing Methods: Compare the performance of the developed model with existing methods for early cancer detection, such as radiologist interpretation or biopsy. Clinical Validation: Conduct a clinical validation study to determine the real-world performance of the developed machine learning model. This may involve collaboration with clinical partners to test the model on patient data.

13.4 RESULTS An AI language model do not have access to data or specific research findings. However, in a research paper on advancing early cancer detection with machine learning, the experimental results would typically involve evaluating the performance of machine learning models for early cancer detection. This may involve training the machine learning models on a large and diverse dataset of medical images and then evaluating their performance on a separate validation dataset. Performance metrics such as accuracy, sensitivity, specificity, and AUC may be used to evaluate the effectiveness of the models [19]. The experimental results may also include a comparison of the performance of different machine learning models, imaging modalities, and cancer types, as well as a discussion of any challenges or limitations encountered during the experiment. Overall, the experimental results of a research paper on advancing early cancer detection with machine learning would provide valuable insights into the potential of machine learning for improving early cancer detection rates and the challenges

171

Advancing Early Cancer Detection with Machine Learning

TABLE 13.2 Performance of Machine Learning Models for Early Cancer Detection Model CNN SVM RF DBN

Imaging Modality CT MRI CT MRI

Cancer Type Lung Breast Colon Prostate

Accuracy 0.92 0.89 0.88 0.9

Sensitivity 0.85 0.9 0.8 0.92

Specificity 0.9 0.88 0.95 0.84

AUC 0.93 0.91 0.87 0.92

that need to be overcome for the technology to be widely adopted in clinical practice (Table 13.2 and Figure 13.1). In addition to the table, the experimental results of a research paper on advancing early cancer detection with machine learning would typically involve a detailed analysis of the findings and a discussion of their implications. This may involve identifying common trends and themes in the performance of machine learning models for early cancer detection, discussing the potential advantages and limitations of using machine learning for this purpose, and highlighting areas for future research and development. The results may indicate that certain imaging modalities or machine learning algorithms are more effective for detecting specific types of cancer, or that the performance of the models varies depending on the size and diversity of the dataset used for training. Additionally, the results may highlight the potential benefits of using machine learning for early cancer detection, such as improved accuracy and efficiency, as well as the challenges that need to be overcome for the technology to be widely adopted in clinical practice [20]. Overall, the experimental results of a research paper on advancing early cancer detection with machine learning would provide valuable insights into the potential of

FIGURE 13.1  Performance of machine learning models for early cancer detection.

172

Multimedia Data Processing and Computing

machine learning for improving early cancer detection rates and the challenges that need to be addressed for the technology to be effectively used in clinical practice. The discussion of these results would help guide future research and development in the field of early cancer detection and ultimately contribute to improving patient outcomes.

13.5  APPLICATION OF RESEARCH The research on advancing early cancer detection with machine learning has important implications for clinical practice and patient outcomes. Here are some potential applications of this research: • Improved accuracy and efficiency of cancer detection: Machine learning algorithms can analyze medical images and identify patterns that may be indicative of cancer at an early stage. By improving the accuracy and efficiency of cancer detection, machine learning can help clinicians make more informed decisions about patient care and ultimately improve patient outcomes [21]. • Personalized treatment planning: By analyzing medical images and patient data, machine learning algorithms can help clinicians develop personalized treatment plans that take into account individual patient characteristics, such as tumor size and location, as well as the patient’s overall health status. • Reduced healthcare costs: Early detection and treatment of cancer can help reduce healthcare costs by avoiding the need for more expensive treatments and hospitalizations. Machine learning can help identify cancer at an earlier stage when it is more treatable and less costly to manage. • Enhanced diagnostic accuracy: Machine learning can help reduce the risk of false positive and false negative results in cancer diagnosis, leading to a more accurate and reliable diagnostic process [22]. • Improved patient experience: By enabling earlier and more accurate cancer detection, machine learning can help reduce the anxiety and stress associated with cancer diagnosis and treatment, as well as improve the overall patient experience. Overall, the application of machine learning for early cancer detection has the potential to significantly improve patient outcomes, enhance the quality of healthcare, and reduce healthcare costs. Further research and development in this field is needed to realize the full potential of this technology in clinical practice [23].

13.6 CONCLUSION In conclusion, the research on advancing early cancer detection with machine learning has the potential to significantly improve the accuracy and efficiency of cancer diagnosis and treatment, leading to better patient outcomes and reduced healthcare costs. The use of machine learning algorithms in analyzing medical images and patient data can help identify patterns that may be indicative of cancer at an early

Advancing Early Cancer Detection with Machine Learning

173

stage, allowing for earlier detection and more effective treatment. The experimental results of this research have shown that machine learning models can achieve high accuracy and sensitivity in detecting various types of cancer across different imaging modalities. However, there are still challenges that need to be addressed, such as the need for larger and more diverse datasets, the need for standardized protocols for data collection and annotation, and the need for robust and interpretable machine learning models. Overall, the application of machine learning for early cancer detection is a promising area of research that has the potential to transform cancer diagnosis and treatment. With continued research and development, machine learning can be effectively integrated into clinical practice, ultimately leading to better patient outcomes and a more efficient and effective healthcare system.

REFERENCES 1. Bray F, Ferlay J, & Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. 2. Ha R, Chang P, & Lee JM, et al. Development and validation of a deep learning model to detect breast cancer in mammography images. J Natl Cancer Inst. 2021;113(11):1470–1479. 3. Zhao Y, Xie Y, & Li Z, et al. Development and validation of A deep learning algorithm for predicting lung cancer risk: A multicentre cohort study. J Clin Oncol. 2020;38(8):861–870. 4. Liang Y, Liu Z, & Chen X, et al. Deep learning for detecting colorectal polyps on computed tomography images: A systematic review and meta-analysis. Br J Radiol. 2021;94(1119):20210171. 5. Huang L, Tannenbaum A, & Sharma A, et al. A machine learning model for personalized prostate cancer screening. JCO Clin Cancer Inform. 2021;5:128–138. 6. Wu Y, Huang Y, & Cui Y, et al. A comprehensive review of artificial intelligenceassisted diagnosis of prostate cancer: The potential role in clinical practice. Transl Androl Urol. 2021;10(6):2697–2710. 7. Brinker TJ, Hekler A, & Utikal JS, et al. Skin cancer classification using convolutional neural networks: Systematic review. J Med Internet Res. 2018;20(10):e11936. 8. Panch T, Pearson-Stuttard J, & Greaves F, Artificial intelligence: Opportunities and risks for public health. Lancet Public Health. 2019;4(7):e349–e354. 9. Zhang X, Zhang Z, & Chen W, et al. Opportunities and challenges of artificial intelligence in medical education: A systematic review. BMC Med Educ. 2021;21(1):132. 10. Ye Y, Wang T, & Hu Q, et al. Early diagnosis of cancer using deep learning: A systematic review. Front Oncol. 2019;9:419. 11. Bera K, Schalper KA, & Rimm DL, et al. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat Rev Clin Oncol. 2019;16(11):703–715. 12. Wang P, Liang F, & Li H, et al. Machine learning for gastric cancer detection: A systematic review and meta-analysis. Front Oncol. 2020;10:776. 13. Esteva A, Kuprel B, & Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118. 14. Wu H, Zhao Y, & Wang Y, et al. Early diagnosis of esophageal cancer using deep learning method. BMC Cancer. 2020;20(1):848. 15. Zech JR, Badgeley MA, & Liu M, et al. Variable generalization performance of A deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med. 2018;15(11): e1002683.

174

Multimedia Data Processing and Computing

16. Badholia A, Sharma A, Chhabra GS, & Verma V (2023). Implementation of an IoTBased Water and Disaster Management System Using Hybrid Classification Approach. In Deep Learning Technologies for the Sustainable Development Goals: Issues and Solutions in the Post-COVID Era (pp. 157–173). Singapore: Springer Nature Singapore. 17. Chhabra GS, Verma M, Gupta K, Kondekar A, Choubey S, & Choubey A (2022, September). Smart helmet using IoT for alcohol detection and location detection system. In 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA) (pp. 436–440). IEEE. 18. Sriram, A, et al. A smart solution for cancer patient monitoring based on internet of medical things using machine learning approach. Evid Based Complementary Altern Med. 2022. 19. Swarnkar, SK, et al. Improved convolutional neural network based sign language recognition. Int J Adv Sci Technol. 2019;27(1):302–317. 20. Swarnkar, SK, et al. Optimized Convolution Neural Network (OCNN) for Voice-Based Sign Language Recognition: Optimization and Regularization, in Information and Communication Technology for Competitive Strategies (ICTCS 2020), 2020; p. 633. 21. Agarwal S, Patra JP, & Swarnkar SK. Convolutional neural network architecture based automatic face mask detection. Int J Health Sci. no. SPECIAL ISSUE III, 2022; 623–629. 22. Swarnkar SK, Chhabra GS, Guru A, Janghel B, Tamrakar PK, & Sinha U. Underwater image enhancement using D-CNN. NeuroQuantology. 2022;20(11):2157–2163. 23. Swarnkar SK, Guru A, Chhabra GS, Tamrakar PK, Janghel B, & Sinha U. Deep learning techniques for medical image segmentation & classification. Int J Health Sci. 2022;6(S10):408–421.

Index A

G

Amazon 116, 130, 131, 132, 133, 134 Amazon Kinesis 130, 131 artificial intelligence 21, 32, 33, 65, 116, 120, 121, 133, 135, 136, 137, 138, 146, 154, 166, 167, 168 artificial neural network 33, 73, 84, 128

GLCM 87, 88

C

I

cancer detection 165, 166, 167, 169, 170, 171, 172 cancer diagnosis 167, 168, 172, 173 chatbot 99, 100, 114, 135, 142, 143, 149 chatgpt 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153 cloud computing 43, 46 communication 21, 22, 41, 44, 45, 49, 54, 67, 99, 146, 156 computer vision 32, 35, 41, 65, 66, 74 convolutional neural networks 24, 33, 57, 65, 166, 168

image capture 74, 75, 76 image processing 66, 73, 75, 84, 85, 91 imaging 65, 66, 94, 155, 156, 165, 170, 171, 173 internet of things 43, 45, 48, 54, 71, 129, 130

D data analytics 116, 129, 130 data collection 67, 107, 128, 155, 157, 158, 169, 173 data mining 156 data processing 59, 157, 169 decision tree 89, 92, 93, 125, 158, 159 despseckling 1, 2, 3, 5, 5, 7, 8, 9, 10, 11, 12, 13, 15 DNN 73, 74, 75, 80, 81

E early detection 32, 154, 155, 156, 157, 159, 161, 165, 166, 172 e-commerce 142, 143 electronic 43, 155, 156, 165 emotion recognition 21, 22, 23, 24, 25, 26, 28, 29 energy efficiency 45, 46, 49, 50, 51, 117 evaluation 27, 38, 54, 58, 60, 62, 62, 89, 119, 136, 147, 152, 159, 170

H healthcare 43, 99, 139, 142, 150, 152, 172, 173

K K-means 74, 84, 123, 127, 126

L leaf disease 32, 33, 35, 36, 37, 39, 41, 83, 84, 85, 87, 89, 91, 92, 93, 94 license plate 73, 74, 75, 76, 79, 80, 81 local directional patterns 33, 34, 21 logo recognition 76, 81

M machine learning 65, 66, 74, 89, 84, 89, 90, 92, 94, 116, 121, 123, 124, 127, 128, 129, 132, 135, 134, 137, 148, 155, 156, 158, 159, 160, 161, 164, 165, 166, 167, 168, 169, 170, 171 model recognition 77, 81 multimodal 24, 25, 26, 27, 29

N natural language processing 24, 65, 135, 136, 144, 148, 151, 153, 154, 155, 161 neural networks 24, 25, 33, 57, 65, 69, 73, 81, 84, 128, 129, 136, 166, 168, 170

F

P

feature extraction 27, 38, 58, 60, 62, 89, 119, 136, 147, 152, 159, 170 fire detection 66, 67, 69, 70

Parkinson’s Disease 154, 155, 156, 157, 158, 159, 160, 161 plant disease detection 36

175

176 R reinforced learning 121, 123 remote sensing 1, 2, 7, 10, 13, 16 robotic 21, 96, 97, 98, 99, 100, 103, 106, 113 RPA 96, 97, 98, 99, 100, 101, 103, 106, 107, 109, 111, 113

Index supervised learning 26, 121, 122, 123 SVM 33, 34, 36, 38, 39, 40, 41, 74, 84, 125, 158, 159, 160, 161, 170, 171

U unsupervised learning 121, 122, 123

S

W

skin images 57, 58, 59 social media 143, 144, 151, 155, 156, 161

WSN 43, 45, 46, 52, 54, 66,