Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
Surekha Lanka Antonio Sarasa-Cabezuelo Alexandru Tugui Editors
Trends in Sustainable Computing and Machine Intelligence Proceedings of ICTSM 2023
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems and their applications to various real-world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, metaheuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence, and other algorithms for intelligent systems. The series includes recent advancements, modifications, and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning, and related intelligent systems areas. The material will be beneficial for graduate students, postgraduate students, and researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who are unfamiliar with the power of intelligent systems, e.g., researchers in bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians, and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks, and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
Editors Surekha Lanka Department of Information Technology Stamford International University Bangkok, Thailand
Antonio Sarasa-Cabezuelo Department of Information Systems and Computing Complutense University of Madrid Madrid, Spain
Alexandru Tugui Department of Business Informatics Alexandru Ioan Cuza University Iași, Romania
ISSN 2524-7565 ISSN 2524-7573 (electronic)
Algorithms for Intelligent Systems
ISBN 978-981-99-9435-9 ISBN 978-981-99-9436-6 (eBook)
https://doi.org/10.1007/978-981-99-9436-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Paper in this product is recyclable.
We are honored to dedicate the proceedings of ICTSM 2023 to all the participants, organizers and editors of ICTSM 2023.
Preface
We, the editors, are proud to present the proceedings of the International Conference on Trends in Sustainable Computing and Machine Intelligence (ICTSM 2023), held on October 5–6, 2023, in Bangkok, Thailand. ICTSM 2023 was organized with the objective of providing an international platform for budding engineers, academicians, industry experts, and research enthusiasts to present and discuss state-of-the-art research progress across all domains of intelligent and sustainable computing technologies and applications. The conference provided equal opportunities for all delegates to exchange their research experiences and ideas, build research connections and business relations, and explore global opportunities for future research collaboration. We strongly believe that the conference outcomes will help participants gain significant expertise in the emerging fields of machine intelligence. The two-day event, organized in association with the Research Centre of Stamford International University, included invited lectures, keynote talks, and oral presentations intended to enrich the knowledge of the participants and inspire more impactful research. ICTSM 2023 received 202 research manuscripts, of which 41 were accepted by our technical reviewers; among these, 33 papers were finalized for publication. All submitted manuscripts underwent peer review by research experts selected by the technical conference program chairs. We thank all the conference participants for their contributions to the ICTSM 2023 proceedings, the organizing committee for its assistance throughout the conference program, and all the reviewers for their continuous efforts in reviewing the submitted manuscripts.
We hope that all the conference participants and readers will gain a technically rewarding experience from the ICTSM 2023 proceedings.

Bangkok, Thailand
Madrid, Spain
Iași, Romania
Dr. Surekha Lanka Dr. Antonio Sarasa-Cabezuelo Dr. Alexandru Tugui
Contents
1 Design of a Boosting-Based Similarity Measure for Evaluating Gene Expression Using Learning Approaches . . . 1
   K. Sai Dhanush, S. V. Sudha, Rohan Puchakayala, Chandrika Morthala, and Maganti Hemanth Baji
2 Leveraging Linear Programming for Identification of Peripheral Blood Smear Malarial Parasitic Microscopic Images . . . 13
   Tamal Kumar Kundu, Dinesh Kumar Anguraj, and Nayana Shetty
3 The AI's Ethical Limitations from the Societal Perspective: An AI Algorithms' Limitation? . . . 27
   Alexandru Tugui
4 Enhancement of Physical Layer Security for Relay Aided Multihop Communication Using Deep Transfer Learning Technique . . . 33
   C. Vachana and K. Saraswathi
5 Marine Vessel Trajectory Forecasting Using Long Short-Term Memory Neural Networks Optimized via Modified Metaheuristic Algorithm . . . 51
   Ana Toskovic, Aleksandar Petrovic, Luka Jovanovic, Nebojsa Bacanin, Miodrag Zivkovic, and Milos Dobrojevic
6 Application of Artificial Intelligence in Virtual Reality . . . 67
   Derouech Oumaima, Lachgar Mohamed, Hrimech Hamid, and Hanine Mohamed
7 An Extension Application of 1D Wavelet Denoising Method for Image Denoising . . . 87
   Prasanta Kumar Sahoo, Debasis Gountia, Ranjan Kumar Dash, Siddhartha Behera, and Manas Kumar Nanda
8 Improving Navigation Safety by Utilizing Statistical Method of Target Detection on the Background of Atmospheric Precipitation . . . 107
   M. Stetsenko, O. Melnyk, O. Onishchnko, V. Shevchenko, V. Sapiha, O. Vishnevska, and D. Vishnevskyi
9 Effects of Exogenous Factors and Bayesian-Bandit Hyperparameter Optimization in Traffic Forecast Analysis . . . 123
   Lakshmi Priya Swaminatha Rao, Suresh Jaganathan, Sharan Giri, Snehapriya Murugan, and Sankaran Vaibhav
10 Development of Autoencoder and Variational Autoencoder for Image Recognition Using Convolutional Neural Network . . . 139
   Tetiana Filimonova, Oleg Pursky, Anna Selivanova, Tetiana Pidhorna, Tatiana Dubovyk, and Iryna Buchatska
11 A Semantic Web-Based Prototype Exercise—Video Game for Children with Anxiety and Juvenile Myoclonic Epilepsy and Its Usability Assessment . . . 155
   Sai Akhil Kakumanu, Patha Srija, Kambhampati Kodanda Sai Harshitha, Medipally Abinay, and Karnam Akhil
12 Design and Implementation of Tiny ML Model Using STM32F Platform . . . 169
   Sreedhar Namratha, R. Bhagya, and R. Bharthi
13 Feature Selection Techniques for Building Robust Air Quality Prediction Model . . . 185
   V. Santhana Lakshmi and M. S. Vijaya
14 Performance Evaluation of Wi-Fi 6 Using Different Channel Estimation Techniques . . . 199
   N. K. Dhakshith, K. Saraswathi, and Praharsha Sirsi
15 Prevention of Animal Poaching Using Convolutional Neural Network-Based Approach . . . 215
   Trisha Shishodiya, Omkar Rane, Param Kothari, and Sudhir Dhage
16 Applying Cognitive Science for Treating Children Suffering with Biological-Genetic Autoimmune Deficiency Through Vision of Computer Accessibility . . . 231
   Uppala Reshmitha, Shaik Afreen, and Karnam Akhil
17 Evaluation of User Interaction Technique with Holographic Projection in Extended Reality . . . 243
   Fazliaty Edora Fadzli, Ajune Wanis Ismail, Goh Eg Su, and Suriati Sadimon
18 Machine Learning Algorithms for Preventing and Detecting Diabetes Mellitus . . . 259
   S. Deepa and B. Booba
19 Texture Features-Based Breast Cancer Detection Using Artificial Neural Network . . . 275
   Khaled Almezhghwi, Morad Ali Hassan, Adel Ghadedo, Fairouz Belhaj, and Rabei Shwehdi
20 Use of Machine Learning Algorithms to Predict the Results of Soccer Matches . . . 287
   Antonio Sarasa-Cabezuelo
21 From Classical to Quantum: Evolution of Information Retrieval Systems . . . 299
   Manan Mehta, Jason D'souza, Mahek Karia, Vedant Kadam, Mihir Lad, and S. Shanthi Therese
22 SDN-Based DDOS Attack Identification Using Random Forest Classification . . . 313
   K. Radha and R. Parameswari
23 Sentiment Analysis Model Using Deep Learning . . . 329
   Supriya Sameer Nalawade and Akshay Gajanan Bhosale
24 Methods for Securing Big Data . . . 341
   Nozima Akhmedova and Komil Tashev
25 Depression Detection Based on NLP and ML Techniques Using Text and Speech Recognition . . . 357
   Rathnakar Achary, Chetan J. Shelke, Virendra Kumar Shrivastava, P. Mano Paul, Shanti Konda, and Muralidhar Billa
26 A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact . . . 373
   Álvaro Huertas-García, Carlos Martí-González, Rubén García Maezo, and Alejandro Echeverría Rey
27 Effect of Multimodal Metadata Augmentation on Classification Performance in Deep Learning . . . 391
   Yuri Gordienko, Maksym Shulha, and Sergii Stirenko
28 Face Recognition-Based Smart Attendance Monitoring System in Classroom . . . 407
   P. Pramod Kumar, R. Akshay, and K. Sagar
29 FreeMove: Mobile Dependence Initiative . . . 423
   Kiran Ingale, Swati Sonone, Ketan Yalmate, Yash Lahoti, Tanmay Yadav, and Rahul Wagh
30 Efficient Melanoma Disease Detection by Using Convolutional Neural Network . . . 437
   O. G. Manukumaar, Raghavendra Reddy, and Prabhuraj Metipatil
31 A Smart Solution for Safe and Efficient Traffic Flow for Bridge . . . 449
   Kiran Ingale, Riya Tambe, Shubham Thakur, Sanket Thite, Aditya Warke, and Kapil Wavale
32 Recognition of Aircraft Maneuvers Using Inertial Data . . . 465
   Margarita Belousova, Stepan Lemak, and Ilya Kudryashov
33 A Deep Learning Framework for Sleep Apnea Detection . . . 477
   A. Sathiya, A. Sridevi, and K. G. Dharani
Author Index . . . 493
About the Editors
Dr. Surekha Lanka holds a doctoral degree in Science and Engineering from Himalayan University, India; a master's degree in Computer Science from Andhra University, India; and a Master of Technology in Computer Science and Engineering from Nagarjuna University, India. She has also completed master's courses in wireless communication at the University of Athens, Greece. She has 13 years of teaching experience across Thailand, Saudi Arabia, India, and Bahrain, and has worked as a Research Assistant at the International Institute of Research and Management in India.

Antonio Sarasa-Cabezuelo holds a Ph.D. in Computer Science from the Complutense University of Madrid (UCM). He joined the UCM in 1999, where he has served as Assistant Professor, full-time Associate Professor, and Collaborating Professor; since July 2013 he has held a permanent contracted-doctor professorship. His research focuses mainly on e-learning, standardization, digital heritage, and software development using techniques inspired by the design and implementation of computer languages. He is a member of the UCM's official research group on Engineering of Software Languages and Applications (ILSA). He has participated as a researcher in nine national-plan projects and in ten educational innovation projects, directing five of them. The results of his research have been published in more than 70 journal articles, conference papers, and book chapters.

Alexandru Tugui is a Full Professor at the Department of Accounting, Business Informatics and Statistics at Alexandru Ioan Cuza University of Iași (Romania). His current research concerns the limits of artificial intelligence, societal transformations, the smart economy, calm technology, technological singularity, and research methods. His research has appeared in Big Data and Cognitive Computing, International Journal of Computers Communications Control, Sustainability, Remote Sensing, Transformations in Business and Economics, Carpathian Journal of Mathematics, Futurist, Encyclopedia of Information Science and Technology, and Encyclopedia of Multimedia Technology and Networking. He is a Ph.D. adviser in business informatics and a futurologist in society and technology.
Chapter 1
Design of a Boosting-Based Similarity Measure for Evaluating Gene Expression Using Learning Approaches K. Sai Dhanush, S. V. Sudha, Rohan Puchakayala, Chandrika Morthala, and Maganti Hemanth Baji
1 Introduction

Gene expression (GE) occurs throughout biological activities in a variety of intensities and patterns. Each gene's expression level is essential for a cell to operate properly, and following these expression levels can help us better understand the dynamics of biological processes as well as the structure and operation of cells [1]. Gene arrays simultaneously monitor the messenger RNA (mRNA) expression levels of hundreds of genes and are used to record the patterns of GE in cells. Temporal variations in expression levels, rather than continuous measurements, may aid our understanding of biological system dynamics [2]. Understanding the temporal patterns visible in gene expression time series (GETS) is crucial for understanding the dynamics of the cellular response to temperature changes, the immunological response [3], and various other cellular systems [4].

Only a tiny fraction of the genes in a cell's genome are expressed during any given process [5]. Therefore, gene function investigations frequently seek to discover these collections of co-expressed genes. Clustering methods, which are crucial machine learning techniques, make it possible to identify these gene subgroups [6]. Clustering aims to split or overlap data groups depending on the similarity of their characteristics, with each model using a different partitioning strategy. The number and variety of prior attempts at grouping GETS reflect its importance.

K. Sai Dhanush (B) Indiana University Bloomington, Bloomington, IN, USA e-mail: [email protected]
S. V. Sudha KPR Institute of Engineering and Technology, Coimbatore, India
R. Puchakayala · C. Morthala · M. H. Baji VIT-AP University, Amaravati, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_1
One method of grouping is based on similarities or distances between the observations. The most prominent methods in this class are hierarchical clustering [7] and K-means [8]. Information is lost, though, because these models do not consider the time series' temporal structure and assume that consecutive time points are unrelated. Model-based clustering techniques are another field of study; they group genes that adhere to the same generative model [9]. Mclust, a member of this class, assumes that each cluster's mean and covariance may be represented by a Gaussian mixture model (GMM). It also avoids hand-tuning the hyperparameter for the overall number of clusters by choosing the value that minimizes the Bayesian information criterion (BIC) [9]. Every method discussed thus far uses a single dataset. Multiple-database clustering strategies have recently attracted the interest of academics; members of this category, including Clust and distributed GE data clustering, are noteworthy, and a current overview of time series data clustering methods, including GE data, is available. Although standard clustering methods applied to GE data can yield significant insights, they are hampered by several issues. About 80% of GETS contain fewer than nine time points [9]. Because of the vast number of studied genes and the relatively small number of time points, much of what is observed is arbitrary, making it difficult to identify the real inter-gene interactions; the majority of clustering algorithms struggle as a result. The second problem concerns determining the clustering model's parameters, including the number of clusters. Biological verification of the clustering is difficult due to the complexity of the underlying biological mechanisms, so deciding whether to change the number-of-clusters parameter, or checking the model's correctness, takes a long time.

In this context, the model can be tested by creating synthetic, physiologically convincing data and using the resulting labeled set. Another option is to employ unsupervised accuracy measures that do not need labeled data. This study introduces B-SM, a novel ML technique based on a similarity matrix for GETS data grouping. B-SM is built on time series data that has been transformed into recurrence images. Visual representations of time series give a better view of the temporal dynamics of the generally unobservable phase regions. We argue that by depicting short and noisy GETS as recurrence images, richer knowledge may be developed that can then be applied during clustering. After acquiring these images, deep architectures created for image learning can be applied instead of conventional clustering techniques. We assert that such a learning mechanism can improve clustering performance. The work is organized as follows: Sect. 2 analyzes diverse approaches; Sect. 3 elaborates the methodology; Sect. 4 discusses the experimental outcomes; and Sect. 5 summarizes the work.
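The recurrence-image transformation can be sketched as follows; the absolute-difference distance rule, the threshold heuristic, and the toy series are illustrative assumptions, since the chapter does not fix them.

```python
import numpy as np

def recurrence_image(series, eps=None):
    """Turn a 1-D time series into a binary recurrence image.

    Pixel (i, j) is 1 when the values at time points i and j are
    closer than the threshold eps (a standard recurrence-plot rule).
    """
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # pairwise distances
    if eps is None:
        eps = 0.1 * dist.max()               # heuristic threshold
    return (dist <= eps).astype(np.uint8)

# Toy gene-expression time series with few time points
img = recurrence_image([0.0, 0.9, 0.1, 1.0, 0.05])
print(img.shape)  # (5, 5): an image usable by image-based deep models
```

Even a five-point series yields a two-dimensional image, which is the representation that the image-learning architectures mentioned above consume.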
2 Related Works

Researchers can monitor an organism's level of gene activity using DNA microarrays [10]. This is done by contrasting the amounts of GE in healthy and diseased tissues. Thanks to microarray-related tools, it is now possible to compare each gene's expression levels in malignant and non-cancerous samples using mathematical and statistical methods. This makes it possible to identify the top differentially expressed genes that could be connected to a particular disease. Perhaps more significantly, gene ontology enrichment analysis based on prior experimental work with these genes may be used to identify potential biological processes associated with the disease. Microarrays use fluorescence-tagged cDNA molecules, whose measured fluorescence intensities correspond to the levels of GE. The expression of hundreds of genes is simultaneously monitored using DNA-based slides. Hundreds of probes, which are sequence-complementary to the fluorescently labeled cDNA molecules added to the array, are pre-spotted on microarrays known as DNA/gene chips. Certain GE patterns may be connected to certain genes because of the predetermined placement of the probes on the chip [11].

A microarray study requires experimental samples, such as those from cancer tissue, as well as reference samples, such as those from healthy tissue. After the samples' mRNA has been extracted and converted into fluorescently labeled cDNA, the first sample is typically marked with a green fluorescent marker and the second with a red one. The microarray plate is then covered with the two combined samples. After washing the slide to eliminate any non-specifically bound cDNA molecules, the expression level of each hybridized gene is measured. When a particular gene is much more abundant in the specimen being studied than in the reference sample, the corresponding region on the microarray turns red. However, if a gene is expressed at a lower level in the sample under study than in the reference sample, the spot becomes green. The spot looks yellow if a gene's expression is identical in both samples, and no color appears if the gene is expressed in neither sample. GE profiles created from microarray data show simultaneous changes in the expression of several genes relative to a reference sample, for instance under a specific treatment or condition.

RNA-seq is a measurement technique based on next-generation sequencing technology (such as Illumina HiSeq), which has mostly supplanted microarray technology. It allows the expression and sequence of genes to be compared across multiple biological samples [12]. Microarray technologies, which attach complementary DNA (cDNA) to oligonucleotide probes previously placed on an array, can only analyze predefined mRNA transcripts. When determining the levels of GE using RNA-seq, read counts are evaluated rather than relative abundance assessments [13]. RNA-seq additionally provides single-base-pair resolution, differentiation of allelic gene expression, discovery of new genes, and detection of alternative splice forms. It also has a greater dynamic range, a better signal-to-noise ratio, and is much more precise in estimating the levels of GE. RNA-seq is not constrained by a fixed number of probes on an array; instead, it depends on the precision of reference assemblies, since it requires mapping a set of processed sequence reads to a reference genome (or transcriptome). As a result, novel gene-disease relationships can now be discovered that microarray approaches might have overlooked. In contrast to microarray, which is limited to exon-level GE detection, RNA-seq can identify expression at the exon, transcript, coding DNA sequence (CDS), and gene levels [14].
3 Methodology

Here, a publicly available dataset is used to analyze gene expression. Let $X \in \mathbb{R}^{d_1 \times d_2 \times n}$ represent the input tensor created by combining the similarity matrices from all $n$ samples in two subsequent MRI scans. The targets are $X_n \in \mathbb{R}^{d_1 \times d_2}$ and $Y = [y_1, \ldots, y_t] \in \mathbb{R}^{n \times t}$, respectively. The corresponding targets (clinical scores) at various time intervals are $y_t = [y_1, \ldots, y_n] \in \mathbb{R}^n$. The element-wise (Hadamard) product $Z = M \odot N$ is applied, with $Z_{ij} = m_{ij} n_{ij}$ for all $i$ and $j$. Due to the linkage of connections between different biomarkers, the input tensor's similarity of morphological variation trends in brain biomarkers is symmetric, which causes half of the data to be repeated. To address the issue of redundant data, the research also provides the duplicate-data rectification matrix shown below:

$$K = \begin{bmatrix} 0 & 1 & \cdots & 1 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & & 0 \end{bmatrix} \quad (1)$$
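Under the definition in Eq. (1), $K$ is a strictly upper-triangular matrix of ones, so applying it element-wise keeps each biomarker pair exactly once; a minimal NumPy sketch (the square dimension is an illustrative assumption):

```python
import numpy as np

def rectification_matrix(d):
    """Eq. (1): strictly upper-triangular ones; masks the diagonal
    and lower triangle of a symmetric similarity matrix."""
    return np.triu(np.ones((d, d)), k=1)

K = rectification_matrix(4)
print(K)
```

Multiplying a symmetric similarity matrix element-wise by `K` zeroes the redundant half, which is exactly the de-duplication the text describes.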
The objective function of the suggested method may be expressed as follows for the $t$-th forecast time point:

$$L_t(X, y_t) = \min_{W_t, A_t, B_t, C_t} \frac{1}{2}\left\|\hat{y}_t - y_t\right\|_F^2 + \frac{\lambda}{2}\left\|X - [[A_t, B_t, C_t]]\right\|_F^2 + \beta\left\|W_t, A_t, B_t, C_t\right\|_1 \quad (2)–(3)$$

$$\hat{y}_n = \sum_{i=1}^{d_1}\sum_{j=1}^{d_2} U_{ij} \quad (4)$$

$$U = A_t B_t^{T} \odot K \odot W_t \odot X_n, \quad U \in \mathbb{R}^{d_1 \times d_2}. \quad (5)$$
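Equations (4)–(5) can be sketched in NumPy, assuming (as the operator definition above suggests) that the products besides $A_t B_t^T$ are element-wise; the dimensions and random inputs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 5, 5, 2
A = rng.standard_normal((d1, r))     # latent factors, first dimension
B = rng.standard_normal((d2, r))     # latent factors, second dimension
W = rng.standard_normal((d1, d2))    # model parameter matrix W_t
X_n = rng.standard_normal((d1, d2))  # sample similarity matrix
K = np.triu(np.ones((d1, d2)), k=1)  # rectification matrix, Eq. (1)

# Eq. (5): element-wise products mask and weight the low-rank map A B^T
U = (A @ B.T) * K * W * X_n
# Eq. (4): the scalar prediction is the sum over all entries of U
y_hat = U.sum()
print(U.shape, float(y_hat))
```

Because `K` zeroes the diagonal and lower triangle, each biomarker pair contributes to the prediction only once.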
For the forecast time point, the model parameter matrix is $W_t \in \mathbb{R}^{d_1 \times d_2}$, and $\lambda$ and $\beta$ are the regularization parameters. The anticipated values are $\hat{y}_t = [\hat{y}_1, \ldots, \hat{y}_n] \in \mathbb{R}^n$, and the first term defines the empirical error on the training data. $A_t \in \mathbb{R}^{d_1 \times r}$ and $B_t \in \mathbb{R}^{d_2 \times r}$ are the latent factor matrices of the first and second biomarker dimensions, respectively, each containing $r$ latent components. The latent factors are identified by minimizing the objective term $\|X - [[A, B, C]]\|_F^2$, where $[[A, B, C]] = \sum_{i=1}^{r} a_i \circ b_i \circ c_i$ and $\circ$ signifies the outer product of vectors. $\|W_t, A_t, B_t, C_t\|_1$ is obtained by applying the $\ell_1$-norm to the respective matrices $W_t$, $A_t$, $B_t$, and $C_t$. The objective function may be written as follows for all predicted time points:

$$L(X, Y) = \min_{W_f} \sum_{t=1}^{t} L_t(X, y_t) + \theta\left\|W_f P(\alpha)\right\|_F^2. \quad (6)$$
The matrix $W_f \in \mathbb{R}^{(d_1 d_2) \times t}$ in the term $\|W_f P(\alpha)\|_F^2$ is the temporal-dimension unfolding of the model parameter tensor $W \in \mathbb{R}^{d_1 \times d_2 \times t}$, $\theta$ is the regularization parameter, and $\theta\|W_f P(\alpha)\|_F^2$ is the generalized temporal smoothness term. According to generalized temporal smoothing, a doctor must evaluate the patient's current and previous symptoms while diagnosing AD; thus, we assume that each AD patient's progression is connected to all the progressions that came before it. The generalized temporal smoothness prior is described as

$$\begin{cases} w_1 = \delta w_1 \\ w_2 = \alpha_1 w_1 + (1-\alpha_1)\,\delta w_2 \\ w_3 = \alpha_2 w_2 + (1-\alpha_2)\,\delta w_3 \\ \;\;\vdots \\ w_{t-1} = \alpha_{t-2} w_{t-2} + (1-\alpha_{t-2})\,\delta w_{t-1} \end{cases} \quad (7)$$

where the progression $w_i$ contains information from the earlier progressions. The degree to which the present progression is connected to all previous progressions is given by the values in $w_i$, which corresponds to the $i$-th column of $W$. The relevance criteria differ at each stage of an illness's development, since the effect of one stage on the next is not always consistent. For a single patient, the $i$-th progression difference $\delta w_i$ is

$$\delta w_i = w_i - w_{i+1}, \quad i = 1, 2, \ldots, t-1. \quad (8)$$

Therefore, the more realistic temporal smoothness assumption may be described by matrix multiplication:

$$W P(\alpha) = W H D_1(\alpha_1) D_2(\alpha_2) \cdots D_{t-2}(\alpha_{t-2}), \quad (9)$$

where $H \in \mathbb{R}^{t \times (t-1)}$ satisfies $H_{ij} = 1$ if $i = j$ or $i = j+1$, and $H_{ij} = 0$ otherwise. The hyperparameters are contained in $P(\alpha)$, which encodes the correlation between progressions and is chosen from the cross-validation results. Each $D_i(\alpha_i) \in \mathbb{R}^{(t-1) \times (t-1)}$ is an identity matrix modified so that entry $(m, n) = (i, i+1)$ becomes $\alpha_i$ and entry $(m, n) = (i+1, i+1)$ becomes $1-\alpha_i$.

The model parameters $W \in \mathbb{R}^{d_1 \times d_2 \times t}$ and the latent components $A \in \mathbb{R}^{d_1 \times r \times t}$, $B \in \mathbb{R}^{d_2 \times r \times t}$, and $C \in \mathbb{R}^{d_3 \times r \times t}$ can all be found by repeatedly optimizing the objective function over each subproblem involving two variables. We solve each subproblem using proximal gradient descent, since not every part of the objective function is differentiable: the parts that carry the sparsity-inducing $\ell_1$-norms are not differentiable, while the parts built on Frobenius norms are. The proximal technique is frequently used in multi-task learning (MTL) models to construct the proximal problem for a non-smooth objective function. By replacing the smooth part with a quadratic approximation, we can minimize the sum of the smooth and non-smooth function values. There are several ways to construct the quadratic approximation from the Taylor series, and the resulting proximal problems are generally easier to solve than the original ones. The technique can facilitate the development of distributed optimization algorithms or hasten the convergence of the optimization process.
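For the $\ell_1$ terms, the proximal map reduces to element-wise soft-thresholding. This generic sketch of one proximal gradient iteration uses a toy smooth term; the target matrix, step size, and penalty are illustrative assumptions, not the chapter's actual subproblem:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_step(W, grad_smooth, lr, beta):
    """One iteration: gradient step on the smooth (Frobenius) part,
    then the l1 proximal map that produces sparsity."""
    return soft_threshold(W - lr * grad_smooth(W), lr * beta)

# Illustrative smooth part: 0.5 * ||W - T||_F^2 for a fixed target T
T = np.array([[2.0, -0.5], [0.1, 0.0]])
W = np.zeros_like(T)
for _ in range(200):
    W = proximal_gradient_step(W, lambda W: W - T, lr=0.5, beta=0.2)
print(W)  # converges to T shrunk toward zero by beta = 0.2
```

The iterates converge to the soft-thresholded target, illustrating how the $\ell_1$ proximal step zeroes small entries and produces the sparsity the objective asks for.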
3.1 Boosting Model

Ensemble learning has been demonstrated to benefit many prediction tasks. It combines a group of weak learners to produce a single stronger learner. "Boosting," the major tactic employed in ensemble learning systems, creates a chain of weak learners, each using the mistakes made by its predecessor to build a more accurate model. Predictors are thus learned sequentially rather than independently. The gradient boosting (GB) approach, an extension of the boosting method, uses gradient descent optimization to find the global or local minima of the cost function. It fits a sequence of weak learners on the input feature space, each of which increases the prediction accuracy over the learner that came before it. GB creates strong learners by combining many weak learners across several rounds. The suggested technique uses the GB framework to incrementally fit a more accurate model to the residuals from the previous phase, increasing prediction accuracy; this process continues until a highly accurate model is created.
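The residual-fitting loop of gradient boosting for squared error can be sketched with depth-2 regression trees as weak learners; the synthetic dataset, learning rate, and number of rounds are illustrative, not the chapter's configuration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared error: each weak learner fits the
    residuals (negative gradient) left by the ensemble so far."""
    pred = np.full(len(y), y.mean())
    f0, learners = y.mean(), []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, y - pred)             # fit current residuals
        pred += lr * tree.predict(X)      # shrink and accumulate
        learners.append(tree)
    return f0, learners

def gradient_boost_predict(model, X, lr=0.1):
    f0, learners = model
    return f0 + lr * sum(t.predict(X) for t in learners)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
model = gradient_boost_fit(X, y)
mse = np.mean((gradient_boost_predict(model, X) - y) ** 2)
print(mse)  # training MSE shrinks as rounds are added
```

Each round fits only the residuals of the current ensemble, which is the sequential error-correction behavior described above; the learning rate shrinks each contribution so later learners refine rather than overwrite earlier ones.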
4 Results and Discussion

The effectiveness of the various methods is assessed on a real yeast dataset. Several existing approaches are evaluated against the proposed model using a similarity analysis with and without dropout. To avoid overfitting, the dataset in our test scenario was split into three parts: a training dataset, a validation dataset, and an independent test dataset that was not used in training. We also perform fivefold cross-validation on the training dataset to build our optimal model, using a portion of the training data as a validation dataset that has yet to
1 Design of a Boosting-Based Similarity Measure for Evaluating Gene …
be used for training. Last, we collect and compare the prediction outputs of the trained model, with its learned parameters, on the additional test dataset. Mean square error (MSE) was used to evaluate the models and compare their performance, as shown in Eq. (10):

MSE = (1/n) Σ_{i=1}^{n} (z_i − y_i)²    (10)
where i ∈ [1, n], n is the number of samples, z_i denotes the predicted output, and y_i denotes the original output. The results show that our model, with the optimum learning rate chosen by cross-validation, beats other well-known techniques, and the model is improved further with dropout. We argue that our study presents a novel approach for building prediction models from genomic data: because our ML model can incorporate data from various sources, including functional elements and epigenetic markers, it could potentially produce scalable results on larger datasets. Predicted values and actual data were found to be highly comparable. Our model uses only SNP genotypes as explanatory variables, and although the true and predicted values of certain genes may differ, the estimated GE quantifications show the same up-regulated and down-regulated GE patterns. Our model may be improved further, to resemble actual GE signals more closely, by integrating other feature types (pathways and regulatory components) and environmental factors (such as medium and temperature). To demonstrate that our model scales across different ML architectures, we measure training and prediction time utilization (Figs. 1, 2, 3, and 4; Tables 1, 2, and 3). The test is conducted on a computer with a 64-bit operating system, 7.8 GiB of RAM, an Intel Core i7-2600 CPU running at 3.40 GHz, Gallium 0.4 on AMD CEDAR graphics, and a 445.5 GB disk. The prediction time is significantly less than the training time. If our model is correctly trained, we expect it can manage huge datasets with various features and labels.
5 Conclusion

We present a novel ML model that uses a boosting-based similarity matrix (B-SM) with dropout to predict quantitative attributes from genotype data. We first train the learning model and then use back-propagation on the trained model to extract important features. We demonstrate that the B-SM model can correctly predict GE levels from SNP genotypes by applying it to a well-known yeast dataset. Compared with various existing approaches, the B-SM model gives better prediction outcomes, establishing better results for analyzing gene expression. In the future, novel deep learning approaches could be adopted to validate the performance further.
Fig. 1 MSE outcomes
Fig. 2 MSE-based learning rate
K. Sai Dhanush et al.
Fig. 3 Prediction outcomes
Fig. 4 Prediction outcomes
Table 1 MSE results

α        MSE
0.05     0.35
0.1      0.31
0.2      0.30
0.3      0.29
0.4      0.29
0.5      0.29
0.6      0.29
0.7      0.29
0.8      0.29

Table 2 MSE-based learning rate

Learning rate   MSE
0.1             0.289
0.01            0.290
0.001           0.289
0.0001          0.290
0.00001         0.291

Table 3 Training versus prediction

Configuration   Training (s)   Prediction (s)
1500            1685           4.5
2000            2355           6.1
3000            4040           7.8
4000            5880           10.9
Chapter 2
Leveraging Linear Programming for Identification of Peripheral Blood Smear Malarial Parasitic Microscopic Images Tamal Kumar Kundu, Dinesh Kumar Anguraj, and Nayana Shetty
1 Introduction

Malaria is an infection that affects well-being worldwide and remains a global problem. It is a parasitic infection transmitted by Anopheles mosquitoes, causing an acute, life-threatening illness that represents a substantial worldwide health hazard. The Plasmodium parasite has a multistage lifecycle, which produces the characteristic recurrent fevers. Without timely treatment, most patients rapidly develop severe manifestations such as cerebral malaria, severe malarial anaemia, coma, or death. Severe malarial anaemia (SMA) is the leading cause of infant mortality in malaria-endemic regions. Counteracting severe anaemia requires quick treatment of symptomatic high-density parasitemia and a reduction of asymptomatic parasite prevalence to provide a recovery period for erythrocyte production to be re-established [1]. Cerebral malaria, a complex neurological condition of severe falciparum malaria, is frequently lethal and poses a major public health problem [2]. It is a clinically perplexing syndrome of deep coma and potentially reversible encephalopathy, associated with a high death rate and increasingly recognized long-term sequelae in survivors [3]. This protozoan infection is a critical driver of morbidity and mortality in endemic regions, creating enormous social and economic burdens. Current efforts to control the infection target the reduction of preventable morbidity and mortality. In 2016, the World Health Organization declared it an entirely treatable and preventable

T. K. Kundu · D. K. Anguraj (B) Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur 522302, Andhra Pradesh, India e-mail: [email protected] N. Shetty Department of E and E Engineering, NMAM Institute of Technology Nitte, Karkala, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al.
(eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_2
T. K. Kundu et al.
disease [4]. Prompt treatment of malaria is essential to reducing the death rate drastically. By World Health Organization statistics (2020), from 2000 to 2019 around 1.5 billion infections and more than 7.6 million deaths were averted worldwide [5]. Representative thick and thin blood smears are shown in Figs. 1 and 2. Microscopic examination of Giemsa-stained blood films [6–9] remains the gold standard for detecting Plasmodium today. Tangpukdee et al. showed that traditional microscopy enhances malaria analysis by contrasting different techniques [6]. Microscopic assessment enables the identification of parasites in a thick smear against a background of lightly stained RBCs. The microscope is well adapted to low-resource, high-disease-burden regions owing to its cost-effectiveness, simplicity, and adaptability. There is a pressing need for alternative diagnostic methods that are straightforward, fast, and highly sensitive, ideally do not rely on blood tests, and could potentially be conducted by the patients themselves. Early diagnosis and effective treatment therefore help us avoid the high mortality rates associated with malaria. Plasmodium protozoan parasites are responsible for this life-threatening disease known as malaria. Computer-aided diagnosis has advanced rapidly and become increasingly popular in medical imaging; accordingly, both the diagnosis and the prognosis of the infection have improved. Recently, machine learning methods have played a significant part in strengthening diagnostic accuracy by utilizing clinical imaging informatics. Imaging through a microscope is the most effective strategy for assessing pathological conditions in various infections. When malaria-infected blood is observed under a microscope, four stages can be clearly distinguished: ring, trophozoite, schizont, and gametocyte.
Early diagnosis is significant for reducing the fatality rate of malarial infection. Fig. 1 Thin and thick blood smear
2 Leveraging Linear Programming for Identification of Peripheral Blood …
Fig. 2 Malaria-infected a thick blood smear image and b thin blood smear image
2 Malaria

2.1 Blood Smear Type

The most reliable standard for malaria testing is a blood smear, usually obtained via a finger prick. Two different smear types are used to detect malaria. Thick blood smears probe a larger volume of blood with roughly eleven times better sensitivity, making them useful for detecting the presence of parasites. A thin blood smear is a small quantity of blood spread over a sizable portion of the slide; thin smears let examiners identify the specific malaria species that is the source of the infection. To evaluate a blood slide for parasite and species recognition, a qualified microscopist needs at least 15–30 min, and in general a huge amount of money is expended to accurately examine the innumerable blood slides screened for malaria, amounting to an immense financial exertion for microscopic diagnosis. Figure 1 shows both thick and thin smear images.
2.2 Malarial Parasite Species

Humans can be infected by four different species of Plasmodium: Plasmodium vivax, Plasmodium falciparum, Plasmodium malariae, and Plasmodium ovale. The overall rate of P. falciparum infection is higher than the others. P. falciparum is the most dangerous member of the Plasmodium family, causing deaths, and is the most typical species in the planet's tropical regions. In India, 40–50%
Fig. 3 Malaria-infected blood a P. falciparum, b P. vivax, c P. malariae, d P. vivax
of cases on average are attributed to P. falciparum. On the other hand, in the more temperate parts of the torrid zone, P. vivax is the most widely known species. P. malariae is found across the planet and causes persistent infections, sometimes for life. Although an uncommon species elsewhere, P. ovale is rather prevalent throughout the continent of Africa. P. ovale and P. vivax are physically similar, which might lead to confusion between the two species among less seasoned microscopists. Using the WHO practical microscopy guide for malaria, laboratory personnel can comprehensively determine whether a tested specimen contains any malarial parasites. If parasites are found, three further tasks must be completed: identifying the species, determining the infection's life-cycle stage, and calculating the degree of infection; these jobs aren't typically carried out separately, though. One of two methodologies can be used to prepare a sample for manual microscopic assessment on a glass slide. The first, a thick blood film, makes it possible to examine more blood, making it more sensitive to the presence of parasites; however, because RBCs are destroyed during thick-film preparation, it is challenging to distinguish between different species (Fig. 3).
2.3 Diagnosis of Malaria with Different Instruments

Thus far, the following methodologies have been utilized for malaria detection. The contemporary gold-standard strategy for malaria analysis is light microscopy of blood films. Although various other tests exist and have become famous in recent years, digital microscopy tests are at the forefront, specifically in resource-poor settings. With microscopy, all parasite categories have a high chance of being distinguished. It permits quantifying parasite density and confirming clearance in an affected individual after adequate treatment and a prescribed medication regime. Moreover, it is considerably less costly than other systems and broadly accessible. Its most tremendous burdens are the broad preparation needed for a laboratory technician to become a capable malaria slide reader, the significant expense of training and operation, the upkeep of skills, and the large amount of manual work involved. Digital microscopes perform the same tasks as optical ones; notwithstanding, unlike the optical microscope, the digital microscope provides no arrangement to view the sample directly through an eyepiece.
A serial block-face electron microscope was used by Sakaguchi et al. to identify blood cells infected with the malaria parasite and their morphological alterations [10]. Bhowmick et al. published work on the electron microscope in 2013 [11]. Das et al. suggested a computerized machine learning method for diagnosing malarial parasites using light microscopic pictures [12]. Payne et al. investigated the use of a light microscope for providing basic medical treatment [13]. Both a fluorescence microscope and the Quantitative Buffy Coat (QBC) technique can detect malaria infection or other blood parasites. Although the QBC approach is rapid, trustworthy, and simple to grasp, it is more expensive than standard microscopy and is unable to identify species or count the number of parasites. Adeoye and Nga noted that this test is comparable in sensitivity to routine thick smears [14]. Light microscopes are substantially less expensive than fluorescence microscopes and are therefore widely used in the tropical areas where this illness is prevalent. Erythrocytic characteristics were employed by Devi et al. to detect malaria in thin blood smears [15]. Kawamoto [16] developed a rapid diagnostic test using fluorescence and light microscopy with an interference filter. According to Breslauer et al., Jan et al., and Wongsrichanalai and Kawamoto [17–19], cross-polarization microscopy and dark-field microscopy have also been used to diagnose malaria; both systems exploit image contrast enhancements produced by hemozoin to improve the demonstration of malaria in bloodstain tests. The polarized light microscope was a topic of study for Maude et al. [20], since hemozoin (a birefringent chemical) is visible under polarized light microscopy, unlike under conventional light microscopy.
Haditsch discussed several expensive techniques that need enhanced quality control, such as rapid diagnostic tests, polymerase chain reactions, and flow cytometric measurements, while working on unstained blood smears [21]. Polymerase chain reaction (PCR) has shown higher sensitivity and specificity than ordinary microscopic assessment of peripheral, stained blood smears; its accuracy is far better than other tests, and it can detect parasites at low blood parasite concentrations. Notwithstanding, PCR is a complex and expensive technology that requires many hours of measurement by trained staff. Updated versions of the method, such as real-time PCR, nested PCR, and reverse transcription PCR (RT-PCR), are additionally helpful diagnostic techniques [22]; however, hardware and quality control remain critical requirements for PCR. Flow cytometry, a cell-counting technique with proper identification procedures, checks for parasites invasively through computerized techniques but is less sensitive than definitive methods in this field when rapid recognition is essential for the therapeutic conclusion. Especially in nations with a high prevalence of malaria, like those in the tropical zone, conventional microscopy is a useful technique for calculating the parasite numbers required to guide drug-based treatment [23]. Bruckner and Labarca worked on Raman spectroscopy for detecting malarial parasites in thin Giemsa-stained blood [24]. In this regard, "rapid diagnostic tests" (RDTs) [25] appear to be the leading contender. Although RDTs are more expensive than a typical microscope in
high-burden areas [26], it is still unclear whether these tests will soon replace microscopy. RDTs require no magnifying equipment; nevertheless, according to WHO, several countries still employ microscopy more frequently than RDTs [27]. Approximately 45% of RDT diagnoses for malaria were accurate, and RDTs were used for, on average, 47% of malaria tests in areas with an endemic malaria problem. The need for a microscope in malaria diagnosis therefore persists despite the use of RDTs; the failure of RDTs to quantify results is a significant barrier. As a result, RDTs and microscopy are today progressively complementing one another. Herrera et al. worked on an automated RDT [28].
2.3.1 Conventional Microscopic Malaria Diagnosis
The current gold-standard approach for analysing and assessing malaria is light microscopy of blood films. Despite the many different types of analyses that are accessible and have grown in popularity recently, digital microscopy exams continue to be at the forefront, especially in environments with limited resources. The likelihood of distinguishing various parasite types via microscopy is quite high. By providing the patient with the necessary medical attention and a prescribed medication regimen, it makes it possible to assess the severity of parasitemia and to facilitate the patient's recovery. Additionally, compared to other systems, it is far more affordable and readily accessible. The significant amount of manual work necessary, the high expense of preparation and use, and the long training needed for laboratory staff to become professional malaria slide readers are some of its most significant challenges. Digital and optical microscopes perform the same tasks; however, the digital microscope lacks the optical microscope's eyepiece for viewing the sample directly. Malaria parasite-infected erythrocytes with distinct shapes are shown in Fig. 4, and non-infected malaria blood through microscope imaging is shown in Fig. 5.
3 Literature Survey

Giemsa-stained peripheral blood smears with many red blood cells (RBCs) infected by typical P. falciparum trophozoites with ring- and earphone-shaped structures were obtained, according to Parikh et al. [29]. To identify the presence of malarial parasites in human peripheral thin blood smears stained with the Leishman method, Bibin et al. suggested a method using a deep belief network (DBN) [30]. Das et al. [31] demonstrated the categorization of malaria-infected thin blood smear pictures using phase recognition and a microscope. Malaria parasites in thin Giemsa stains may be examined and found using a statistical approach, as demonstrated by Raviraja et al. [32]. Ahirwar et al. described an ideal technique using sophisticated image analysis to recognize and classify malarial
Fig. 4 Malaria parasite-infected erythrocytes with distinct shape
Fig. 5 Non-infected malaria blood through microscope imaging
parasites accordingly [33]. Kundu and Anguraj proposed an automated methodology for identifying and categorizing malaria parasites using blood smear pictures [34]. Erythrocyte separation from microscopic thin blood smears using the KNN classifier was suggested by Devi et al. [15]. Makkapati and Rao advised examining the stage and species in peripheral bloodstain pictures [35]. Computer vision techniques were suggested by Srivastava et al. for the rapid and precise detection of malarial parasites [36]. The importance of image processing for detecting malaria was explored by Frean [37]. Sunarko et al. suggested using a contrast-enhancement approach to enhance and segment pictures of malaria slides in thick blood smears stained with Giemsa [38]. Kundu and Anguraj evaluated how well a machine learning system works in identifying malaria parasites in microscopic pictures [39]. The Giemsa stain uses a thiazine-eosinate solution to stain cells; however, Leishman's stain is more sensitive, less expensive, and cleaner, and it consistently produces excellent results for identifying malarial parasites. Thin Leishman-stained peripheral blood pictures were used by Bibin and Punitha to demonstrate an automated framework for malarial parasites [40]. Devi et al. suggested histological characteristics for classifying the erythrocyte stage using a microscope and thin stain [41]. To crowdsource the identification of malarial parasites in images of thick blood films, Luengo-Oroz et al. developed a web-based gaming strategy [42]. The Wright stain was utilized by Dong et al. in their study to detect malaria-infected cells using a deep learning methodology [43], also referenced in their work on automatically identifying malarial cells [44]. Fluorochrome staining is frequently used to detect malaria parasites in thin blood smears, as by Wongsrichanalai and Kawamoto, Moon et al., and Parsel et al. [19, 45, 46].
4 Proposed Methodology

A system of linear constraints and a linear objective function can be optimized using linear programming. We leverage linear programming as a mathematical modelling technique to address malaria diagnosis through conventional microscopy; identification, detection, and classification of parasites in the input image using the developed model are thus framed as imaging problems. Given a mathematical model whose requirements are expressed as a list of linear relationships, the best solution, such as the best profit or the lowest cost, can be found mathematically by linear optimization:

Maximize c^T x subject to Ax ≤ b and x ≥ 0,

where x is the vector of variables (to be determined), c and b are known coefficient vectors, A is a known coefficient matrix, and (·)^T denotes the matrix transpose.
The objective function is the expression to be maximized or minimized (c^T x in this case). The constraints that describe a convex polytope over which the objective function is to be optimized are the inequalities Ax ≤ b [9]. We map our new model (Fig. 3) onto a formulation made up of an objective function, constraints, decision variables, and parameters [8]. For example,

u1p1 + v1q1 = 80    (1)

u1p2 + v2q2 < 80    (2)

where p = total P/block, q = total Q/block, and u, v are percentages of p, i.e., 100/(block size). The objective function is

U1P1 + U2P2 = X    (3)

subject to the following constraints:

u1p1 + v1q2 = A    (4)

u2p1 + v2q2 = B    (5)

u3p1 + v3q2 = C    (6)

u4p1 + v4q4 = D    (7)

with pi, qi > 0 for i = 1, 2, 3, …. In linear programming, the objective function for the problem X must be linear.
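A small worked instance of this kind of formulation can be solved with SciPy's `linprog`. The objective and constraint coefficients below are made-up illustrative numbers, not values from the chapter; they follow the same u·p + v·q ≤ bound pattern as Eqs. (1)–(7):

```python
from scipy.optimize import linprog

# Hypothetical block-level LP: maximize X = 3*P1 + 5*P2
# subject to two resource limits and non-negative decision variables.
c = [-3.0, -5.0]                 # linprog minimizes, so negate the objective
A_ub = [[1.0, 2.0],              # u1*P1 + v1*P2 <= 80
        [3.0, 1.0]]              # u2*P1 + v2*P2 <= 90
b_ub = [80.0, 90.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, -res.fun)           # optimal (P1, P2) and the maximized X
```

The optimum lands on a vertex of the constraint polytope (here the intersection of the two resource limits), which is the geometric fact linear programming exploits.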
5 Result Analysis

The system is meant to produce as output an image with the affected region marked on it; a clean blood smear yields no such marked zone. The system also provides output in graphical form, in which the X–Y axes of the graph depict the coordinates found to be contaminated, with a horizontal line standing in for clean blood (Fig. 6).
Fig. 6 a Original microscopic image, b image with detected objects, c original non-infected microscopic image, d graphical representation of non-infected image, and e graphical representation of infected image
6 Conclusion

In this research, we deemed the parasite images to be sufficiently large and clear. The blocks of the images where we locate the infected parasite are highlighted to make them obvious in contrast to areas not infected with the malaria parasite. The distinction between infected and non-infected outcomes for various parasite species is also highlighted. The system's size can be decreased using full automation approaches. To make the categorization procedure easier to comprehend and improve, we have treated each parasite species in this research as a separate entity.
7 Future Scope

Implementing linear programming for the detection of malaria parasites in microscopic peripheral blood smear images could take our approach in many fascinating and helpful directions. Additionally, hybrid machine learning and deep learning methods might be applied, which could be far more accurate than our method.

Acknowledgements I value the help that Dinesh Kumar Anguraj, my mentor, and the Computer Science and Engineering Department of the Koneru Lakshmaiah Education Foundation offered me in accomplishing this work.
References 1. Maude RJ, Beare NA, Sayeed AA, Chang CC, Charunwatthana P, Faiz MA et al (2009) The spectrum of retinopathy in adults with Plasmodium falciparum malaria. Trans R Soc Trop Med Hygiene 103(7):665–671 2. Wassmer SC, Grau GER (2017) Severe malaria: what’s new on the pathogenesis front? Int J Parasitol 47(2–3):145–152 3. Björkman A (2002) Malaria associated anemia, drug resistance and antimalarial combination therapy. Int J Parasitol 32(13):1637–1643 4. World Health Organization (2016) World malaria report 2016, pp 1–280 5. World Health Organization (2020) World malaria report 2020. Geneva, pp 1–299 6. Tangpukdee N, Duangdee C, Wilairatana P, Krudsood S (2009) Malaria diagnosis: a brief review. Korean J Parasitol 47(2):93–102 7. Yang D, Subramanian G, Duan J, Gao S, Bai L, Chandramohanadas R, Ai Y (2017) A portable image-based cytometer for rapid malaria detection and quantification. PLoS ONE 12(6):e0179161 8. Mavandadi S, Dimitrov S, Feng S, Yu F, Sikora U, Yaglidere O, Padmanabhan S, Nielsen K, Ozcan A (2012) Distributed medical image analysis and diagnosis through crowd-sourced games: a malaria case study. PLoS ONE 75:e37245
Chapter 3
The AI’s Ethical Limitations from the Societal Perspective: An AI Algorithms’ Limitation?

Alexandru Tugui
1 Introduction

Society and technology form a partnership in which it is difficult to determine who plays the decisive role: society or technology? With the transition to the industrial era through the technological assimilation of the steam engine in the second half of the eighteenth century, society has gone through several technological phases [1], labeled as follows: Industry 1.0 through mechanization (after 1784), Industry 2.0 through electrification (after 1870), Industry 3.0 through automation (after 1960), and Industry 4.0 through virtualization/digitalization (today). In the context of digital transformation, the subject of artificial intelligence (AI) has become progressively prevalent in contemporary society [2, 3]. The exponential growth of AI technologies across several domains is laying the foundation for the attainment of technological singularity, a concept outlined by Vinge [4] and Kurzweil [5], in which artificial intelligence will surpass natural intelligence by a factor of one billion. Regarding this critical juncture, it is undeniable [6] that AI already outperforms humans (in speed and volume) in specific domains where these proficiencies were conventionally associated exclusively with humans.

Regarding the concept of intelligence, it is evident that we are not yet able to say what natural intelligence is and what it is not, nor how the human brain is organized and functions. These inquiries have been present since the inception of artificial intelligence, initiated by its pioneers, who posited that the brain can be likened to a biochemical computer [7] intricately linked to the human body (Bo). Since 2006, scholarly literature [8] has broadened the scope of inquiry into natural intelligence, shifting the focus from solely the brain (Br) to the interconnectedness of the brain–body–environment (Br–Bo–En). This expanded perspective

A. Tugui (B) Faculty of Economy and Business Administration, “Al. I. Cuza” University, Iasi, Romania
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_3
presents a heightened challenge in terms of formalizing natural intelligence in a manner that enables successful emulation by AI. Technology has greatly influenced human society [9], but, on the other hand, new technologies are themselves developed by humans. From a societal standpoint, it is evident that there has consistently existed, and will continue to exist, a certain degree of hesitancy toward novel technologies until they undergo rigorous testing, establish a level of trust in their functionality, and reach a state of maturity. Thus, in relation to artificial intelligence, Schwartz [7] highlighted in “The Limits of Artificial Intelligence” a certain fear induced at the societal level “when such systems are out of control, as well as the methods and rules of human interaction with these systems” ([6], p. 258). Consequently, the limitation established by Schwartz brought attention to the ethical side of using AI technologies from the perspective of the methods and rules to be respected in human interaction with technology. This study aims to comprehensively analyze society’s ethical limitations on AI technologies and to present a societal perspective on these ethical limitations. In the end, through our discussions and conclusions, we formulate an answer to whether ethical limitations constitute an AI algorithm limitation.
2 Material and Methods

This research study explores the ethical limitations in the real-time implementation of AI technologies. In order to answer the above question, we conducted a systematic literature review (SLR), which, according to Fink [10], is a research literature review, i.e., “a systematic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars, and practitioners” (p. 3). In applying the SLR, we used the PRISMA guidelines, including the stages [11] of identification, screening, and eligibility. The parameters for applying the PRISMA methodology are:

Search domain: Scopus database, article title only, as of June 30, 2023.
Search terms: “limit*” for limits, limit, limitation; “artificial intelligence” for artificial intelligence; “AI” for artificial intelligence; “societ*” for society, societies, societal; “ethic*” for ethics, ethic, ethical, ethically.
Search string: TITLE (Limit* “Artificial Intelligence” Societ* Ethic*) OR TITLE (Limit* “AI” Societ* Ethic*)
Exclusion criteria (EC): scientific papers (EC1), in English (EC2), and only those that fit our research objective (EC3).
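The boolean search string above can also be assembled programmatically, which makes the query reproducible across runs. The following is a small sketch of ours (an illustration of how such a string is composed, not the authors' actual tooling; `build_title_query` is a hypothetical helper):

```python
# Sketch: assembling the Scopus TITLE-only search string used in the
# PRISMA identification stage. Wildcards such as limit* match limit,
# limits, limitation, etc.

def build_title_query(term_groups):
    """Build an OR of TITLE(...) clauses, one clause per group of terms."""
    clauses = ["TITLE({})".format(" ".join(group)) for group in term_groups]
    return " OR ".join(clauses)

query = build_title_query([
    ['limit*', '"Artificial Intelligence"', 'societ*', 'ethic*'],
    ['limit*', '"AI"', 'societ*', 'ethic*'],
])
print(query)
# TITLE(limit* "Artificial Intelligence" societ* ethic*) OR TITLE(limit* "AI" societ* ethic*)
```

Keeping the groups as data makes it easy to add further synonym groups later without touching the assembly logic.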
After identifying the scholarly articles on the proposed topic, we will proceed to a content analysis from the perspective of the assumed objectives, the applied methodology, and the results obtained. Finally, we will formulate an opinion concerning the results we reach.
3 Data and Results

The application of the research methodology on June 30, 2023, resulted in a list containing only one article; Table 1 shows this result. Humanities and Social Sciences Communications is a Clarivate Web of Science (WoS)-indexed journal with an impact factor of 3.5 in 2022, in quartile Q1, in two research areas, namely humanities, multidisciplinary, and social science, interdisciplinary. The journal is also indexed in Scopus. The article currently has one citation in WoS but none in Scopus. The fields to which the journal is assigned are in line with the societal side we are analyzing.

Table 1 Results of interrogation in the Scopus database

Author(s)  | Article title                                                   | Source                                        | Vol. (issue), DOI
Hauer [12] | Importance and limitations of AI ethics in contemporary society | Humanities and Social Sciences Communications | 9(1):272, 10.1057/s41599-022-01300-7

The proposed methodology filtered only one research article, published in 2022. From this, it is evident that this particular topic has not gained significant research attention. The paper exhibits an intriguing framework that lacks a well-defined methodology, which forces us to model the described situations using combined inductive-deductive and even abductive logic. The article posits that there will inevitably be a juncture at which society must confront the ethical implications associated with the advancement and integration of artificial intelligence (AI) technologies, as “we will have to start debating the rights of robots, the ethics of autonomous AI algorithms, and their potential citizenship, simply because AI will ask for them at some point.” In a similar vein, Hauer [12] asserts that throughout the development and use of AI technologies, we will need to refer to algorithms that incorporate the ethical considerations anticipated in the decision-making context in which AI technologies will operate. Inferred from the content analysis of some cases of AI technology implementation, it becomes evident that there is a requirement to model, and to incorporate into AI algorithms, the ethical decision-making process that is challenging even for human subjects. This ethical behavior can be difficult to model in real life, but developers are required to endow AI technologies with ethical algorithms that are activated in hypothetical
ethical situations. In other words, developers of AI technologies confront an ethical conundrum that restricts the interaction capabilities of AI with society. Hauer [12] concludes that there is a pressing requirement for the progressive development of human society to allocate more significant consideration toward the superintelligences that will be encountered in the future. Additionally, it is imperative to incorporate ethical dimensions into any artificial intelligence algorithms. As the author notes, this limitation stems from society’s orientation toward teams developing and employing AI technologies.
4 Discussion

Throughout this process, we observe a dual manifestation of the ethical boundary. Firstly, society will be unable to fully formalize the reasonable ethical requirements that AI technologies should adhere to. Secondly, there is an inherent difficulty in completely translating these requirements into AI algorithms. Therefore, it can be observed that both forms of ethical limitation revolve around the human element, making society susceptible to significant vulnerabilities. Our scientific inquiry into the ethical limitations imposed by society on the development process of AI technologies entitles us to formulate an opinion on how we see the evolution in the field of social ethics imposed by society on AI technologies. One initial consideration involves the necessity of categorizing domains based on varying degrees of ethical magnitude, which will result in a specific prioritization of the development of ethical models assigned to each domain. In this regard, at the societal level, the question of developing a methodology to follow in this ethical taxonomy approach will be raised. The second aspect we want to emphasize is, in our opinion, the nature of the discussions to be held on such a delicate topic, one with a qualitative tinge that is difficult to formalize. The persistence of this matter will continue until a comprehensive AI ethical meta-model is established at the societal level, accompanied by AI ethical models tailored to the unique requirements of each domain where these technologies are implemented. Consequently, an emphasis will be placed on ethical “do-case” scenarios with specific implementations. The third matter we aim to emphasize pertains to a subject of great sensitivity: it acknowledges the fact that humans, as integral members of society, present a dual limit when it comes to addressing ethical obligations in the context of implementing AI technologies.
In an algorithm-based society [13], the emergent solution to this challenge is to substitute human involvement gradually and carefully in the development of AI technologies with specialized AI technologies that possess expertise in ethical considerations. This solution is supported by the technological spiral of evolution [9], which is a viable working hypothesis in the sense that technologies will evolve and thus ensure the creation of new AI technologies that are increasingly performative in terms of volume, speed, and reasoning. This technological evolution may lead to a situation in which AI technologies attain the level of performance at
which they “grow themselves big” [14] ethically, in the sense of becoming mature and responsible in this field. This ethical maturity of AI technologies implies that they will go through a process of knowledge accumulation on ethical knowledge steps. It is even possible that these AI technologies will train each other during this process of qualitative accumulation on the ethical component, which would indicate that “they grow on each other.” Further, in support of this solution, we expect that bio-techno systems [15] will make a significant contribution in terms of facilitating communication between biological and technical systems. We refer to communication from the cell (bio component) to the technical system, and vice versa, using a natural language of the living cell (N2LC) [6] that enables us to understand what the living cell is transmitting (either chemically [16] or computationally [17]), as well as by developing a programming language for living cells [18] that instructs the living cell what to do. Our opinion has been formulated inductively-abductively by correlating all the aspects identified from the application of SLR, and other sources [6, 14] relevant to the investigated topic. Regarding whether the limits highlighted by Hauer represent an AI algorithm’s limitation, we consider that this will be a limitation in integrating ethical aspects into their content. However, based on existing literature, we cannot anticipate an answer in the case of the automated generation of AI algorithms.
5 Conclusion

In accordance with the objective of this study, we conducted a literature search to acquire a comprehensive understanding of the ethical limits imposed by society on AI technologies and to formulate an opinion on these ethical limits from a societal standpoint. By applying SLR as a research methodology and using the PRISMA guidelines, we identified only one paper in the Scopus database that met the query criteria after applying the exclusion criteria EC1, EC2, and EC3. From the research analysis, it is evident that AI technologies are influencing human lives in every possible way. The major challenge for AI is ethical issues, which necessitates the development of advanced AI algorithms and models that satisfy ethical requirements. A potential limitation of this research study is its restricted scope, as we focused solely on a single database. Although we conducted a supplementary search on the Web of Science (WoS) out of curiosity, we retrieved only the same article. It would be advantageous to expand our technique to encompass databases beyond Scopus and WoS. Another extension would be to include the abstract in addition to the title in the search, but when we simulated this method of identifying scholarly papers, we discovered that the topic was not addressed explicitly from either a societal or an ethical perspective; hence our decision to focus solely on titles as the presumed area of research.
References

1. Duggal AS, Malik PK, Gehlot A, Singh R, Gaba GS, Masud M, Al-Amri JF (2022) A sequential roadmap to industry 6.0: exploring future manufacturing trends. IET Commun 16:521–531. https://doi.org/10.1049/cmu2.12284
2. Holmström J (2022) From AI to digital transformation: the AI readiness framework. Bus Horiz 65(3):329–339. https://doi.org/10.1016/j.bushor.2021.03.006. ISSN 0007-6813
3. Lemieux F (2023) Digital transformation and artificial intelligence: opportunities and challenges. In: Reza Djavanshir G (ed) Digital strategy and organizational transformation. Chap. V. World Scientific Publishing, pp 1–17
4. Vinge V (1993) The coming technological singularity: how to survive in the post-human era. In: Vision 21: interdisciplinary science and engineering in the era of cyberspace. NASA, Lewis Research Center, pp 11–22
5. Kurzweil R (1999) The age of spiritual machines: when computers exceed human intelligence. Viking Press, London
6. Tugui A, Danciulescu D, Subtirelu M (2019) The biological as a double limit for artificial intelligence: review and futuristic debate. Int J Comput Commun Control 14(2):253–271
7. Schwartz JT (1986) The limits of artificial intelligence. Technical Report #212, New York University. In: Shapiro S, Eckroth D (eds) Encyclopedia of artificial intelligence. Wiley, Hoboken
8. Lungarella M, Iida F, Bongard JC, Pfeifer R (2007) AI in the 21st century: with historical reflections. In: Lungarella M, Iida F, Bongard JC, Pfeifer R (eds) 50 years of artificial intelligence. Essays dedicated to the 50th anniversary of artificial intelligence. Springer, Heidelberg, pp 1–8
9. Tugui A (2014) Cloud computing: a calm technology for humans-business-environment triad. J Res Pract Inform Technol 46(1):31–45
10. Fink A (2014) Conducting research literature reviews: from the internet to paper, 4th edn. Sage Publications, New York
11. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA et al (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med 6(7):e1000100. https://doi.org/10.1371/journal.pmed.1000100
12. Hauer T (2022) Importance and limitations of AI ethics in contemporary society. Hum Soc Sci Commun 9(1):272. https://doi.org/10.1057/s41599-022-01300-7
13. Olhede SC, Wolfe PJ (2018) The growing ubiquity of algorithms in society: implications, impacts and innovations. Phil Trans R Soc A 376:20170364. https://doi.org/10.1098/rsta.2017.0364
14. Tugui A (2023) Limits of humanoid robots based on a self-literature review of AI’s limits. In: International conference on electrical, computer, communications and mechatronics engineering (ICECCME 2023), Tenerife, Spain, in progress
15. Tugui A, Genete DL (2007) Could we speak of a fifth wave of IT&C? In: Proceedings of the 6th WSEAS international conference on E-activities. Puerto de la Cruz, pp 221–224
16. Mukwaya V, Mann S, Dou H (2021) Chemical communication at the synthetic cell/living cell interface. Commun Chem 4:161. https://doi.org/10.1038/s42004-021-00597-w
17. Universität Bayreuth (2022) Proteins and natural language: artificial intelligence enables the design of novel proteins. ScienceDaily. www.sciencedaily.com/releases/2022/08/220804102540.htm. Accessed 2 Aug 2023
18. Trafton A (2016) A programming language for living cells. MIT News. https://news.mit.edu/2016/programming-language-living-cells-bacteria-0331. Accessed 2 Aug 2023
Chapter 4
Enhancement of Physical Layer Security for Relay Aided Multihop Communication Using Deep Transfer Learning Technique

C. Vachana and K. Saraswathi
1 Introduction

The spectacular proliferation of smart gadgets has resulted in a considerable increase in data traffic in mobile wireless communication [1, 2]. By 2023, there will be 3.6 mobile devices per person on average, as predicted by Cisco [3]. The scarcity of spectrum is thus a significant issue for wireless networks. Future cellular communications will likely use device-to-device (D2D) communication to improve frequency efficiency, since it allows close pairs of devices to interact without routing via a base station (BS) [4]. High spectrum efficiency, minimal latency, and low power consumption are advantages of D2D communications [5]. Based on the spectrum bands exploited, the two D2D communication approaches are inband and outband D2D communication. In the inband approach, D2D users are allowed to coexist in the same spectrum band as cellular users, whereas in the outband approach, D2D users use the unlicensed spectrum band [6]. Inband D2D communication can be further categorized into two modes, overlay and underlay. In particular, overlay D2D communication separates the cellular network’s spectrum band into non-overlapping frequency sets, with cellular users using one set and D2D users using another. Therefore, interference control between cellular and D2D users is not necessary in overlay D2D transmission. In underlay D2D communication, however, since cellular and D2D users share the same frequencies, interference control is essential [7].
C. Vachana (B) · K. Saraswathi Department of Electronics and Telecommunication Engineering, RV College of Engineering, Bengaluru, Karnataka, India e-mail: [email protected] K. Saraswathi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_4
Wireless networks are now often used to deliver vital and secure information in everyday applications. Owing to the open nature of the wireless medium, security is seen as a crucial concern for future wireless technology, including fifth-generation (5G) networks [8]. Using conventional cryptography techniques, however, is insufficient and even inappropriate in light of the rise of the mobile Internet, as private key exchanges require a route that is even more secure [9]. Information must be protected from unauthorized devices using new security techniques built on the foundations of information theory, which concentrate on the propagation channel’s secrecy capability [10]. Wyner [11] originally investigated physical layer security (PLS), which is regarded as an innovative approach to improving wireless security by merely utilizing the characteristics of wireless channels, including fading, noise, and interference. In PLS, the benefits of cooperative communications have been considerably investigated [12]. More particularly, cooperative jamming (CJ) techniques have been extensively employed to enhance secrecy and decrease the amount of information that eavesdroppers can wiretap in wireless transmission [13]. Even though jamming signals can be considered undesirable interference for legitimate users, they can be utilized to secure wireless communication transmissions [14]. To offer secure and private communication data in wireless networks beyond 5G, PLS can be coupled with other security technologies [15, 16]. Cooperative beamforming [17], generated noise [18], and multi-antenna beamforming [19] are only a few of the techniques researched to degrade the quality of wiretapped signals at the eavesdropper [20]. Machine learning has been utilized by wireless communication systems and wireless networks for resource distribution, power allocation, dense prediction, and channel estimation [21].
Owing to the size of the data, unlabeled data, system complexity, lengthy learning, susceptibility to cellphone jammers and eavesdroppers, and operating conditions in which the wireless circumstances, user mobility, and network organization are all continually changing, the problems of machine learning for mobile devices are numerous [22]. Deep transfer learning, which uses knowledge from previous tasks that are comparable to the one at hand to teach learners how to solve new issues, is emerging as one of the practical answers to these difficulties. Furthermore, it is crucial to remember that deep transfer learning has been studied under a number of names, including learning to learn, multitask learning, continuous learning, incremental learning, and cumulative learning [23]. According to this paper’s deep transfer learning proposal, the learning model is first trained on data from an earlier setting using traditional deep learning techniques, and is then deployed and fine-tuned utilizing limited data from a new context [24, 25]. The remainder of the work is structured as follows: the D2D communication design is shown in Sect. 2. Section 3 describes the deep transfer learning technique. Section 4 investigates the likelihood of cellular network outages, evaluates D2D users’ secrecy behavior in terms of the SOP, and evaluates the probability of nonzero secrecy capacity. The main results are discussed in Sect. 5. The study is concluded in Sect. 6, which lists the key contributions and outcomes of the work.
2 Model of D2D Communication System

Consider an instance where a spectral band is shared by both the cellular network and the device network within a particular environment, as reflected in Fig. 1, in a D2D relay connection during downlink communication. The D2D connection consists of an eavesdropper (E) with many antennas, a decode-and-forward (DF) D2D relay, a device transmitter T, and a device receiver D. Imagine a cellular network where the BS and NB each have several antennas but C has only one. It is crucial to keep in mind that the extensive shadowing in the considered scenario prevents direct communication between the devices. Therefore, the sole means of transmission between transmitter and receiver is the relays. D2D communication thus needs two stages. In phase one, the D2D relays completely decipher the signals they have received from T. In phase two, using the utmost secrecy capability, the desired device relay is selected from the set of relays to send the decoded signal to D. E may wiretap the relay’s communication at this time. In both phases, the BS broadcasts. In addition, all communication channels are assumed to exhibit Rayleigh flat fading. The channel coefficients of the transmitter, relay, cellular, base station, eavesdropper, and receiver links are designated, respectively, as h_tr_k, h_tc, h_bc, h_be, h_r_ke, h_br_k, h_bd, and h_r_kd. Additionally, |h_ab|^2, random variables with an exponential distribution and mean Ω_ab, represent the channel power gains, and d_ab represents the Euclidean distance.
Fig. 1 Model of D2D communication system [6]
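The Rayleigh flat-fading assumption can be sanity-checked numerically. Below is a standard-library sketch of ours (an illustration, with an arbitrarily chosen mean gain `omega`, not part of the chapter's simulation setup): a complex Gaussian coefficient whose squared magnitude is exponentially distributed, as assumed for each link.

```python
# Simulating a Rayleigh flat-fading coefficient h = x + jy with
# x, y ~ N(0, omega_ab/2), so the channel power gain |h|^2 is
# exponentially distributed with mean omega_ab.
import math
import random

def rayleigh_power_gain(omega_ab, rng):
    """Draw one channel power gain |h_ab|^2 with mean omega_ab."""
    sigma = math.sqrt(omega_ab / 2.0)
    x = rng.gauss(0.0, sigma)   # real part of h_ab
    y = rng.gauss(0.0, sigma)   # imaginary part of h_ab
    return x * x + y * y        # exponentially distributed

rng = random.Random(42)
omega = 2.0
gains = [rayleigh_power_gain(omega, rng) for _ in range(200_000)]
mean_gain = sum(gains) / len(gains)
print(abs(mean_gain - omega) < 0.05)  # sample mean close to omega
```

The same generator can be reused for every link by plugging in the corresponding mean Ω_ab.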
3 Deep Transfer Learning Technique

Deep learning, a subfield of machine learning, employs artificial neural networks to model and resolve complex problems. In these networks, neurons are composed into multiple layers, allowing them to learn and represent intricate patterns and features from data. Deep learning has proven to perform exceptionally well in a number of applications, including speech recognition, image identification, and natural language processing. Deep learning models, as in Fig. 2, typically have a data input layer, several hidden layers, and an output layer. The depth of the architecture allows for hierarchical feature extraction, enabling the model to learn complex representations. Deep learning is excellent at directly learning pertinent features and representations from raw data; as a result, manual feature engineering is frequently not necessary. It has enabled machines to perform tasks that were previously considered challenging for traditional algorithms. Deep learning models require a significant amount of labeled data for training, especially for complex tasks. Obtaining and annotating large datasets can be time-consuming, expensive, and sometimes impractical. Training deep learning models, especially large ones with many layers and parameters, requires substantial computational resources. Deep transfer learning is especially useful when you have limited data for the target task, as it allows you to take advantage of the knowledge learned from a related task with a larger dataset. Deep transfer learning, as in Fig. 3, begins with a system that has already been developed on a large dataset for a specific task. These models are trained on massive datasets, enabling them to learn useful features, representations, and patterns. Reduced training time and enhanced neural network performance are two of deep transfer learning’s primary benefits; it also does not require large amounts of data.
Deep transfer learning can be helpful when building a neural network from scratch would require a lot of data, yet access to such data might not be practical. Traditional ML models require well-structured and labeled data for training; feature engineering is a critical step in extracting relevant information effectively. Traditional ML typically relies on shallow architectures with a limited number of layers. These
Fig. 2 Deep learning from scratch [7]
Fig. 3 Deep transfer learning with pre-trained network [7]
models are designed to learn from structured data. Deep transfer learning can work with both structured and unstructured data. It benefits from large, diverse pretraining datasets and can generalize to new tasks with smaller labeled datasets. These networks are capable of automatically learning intricate hierarchical features from raw data, reducing the need for manual feature engineering. Traditional machine learning (ML) versus deep transfer learning is shown in Fig. 4.
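The pre-trained-network idea in Fig. 3 can be made concrete with a deliberately tiny toy sketch of ours (pure Python, under simplifying assumptions; the frozen "feature extractor" and threshold task below are hypothetical stand-ins, not the chapter's actual networks): the extractor stays fixed, and only a small new head is fine-tuned on limited target-task data.

```python
# Toy transfer learning: freeze a "pretrained" feature extractor and
# fine-tune only a logistic-regression head on a small labeled dataset.
import math
import random

def pretrained_features(x):
    # Frozen extractor: pretend these nonlinear features were learned
    # earlier on a large source-task dataset.
    return [math.tanh(x), math.tanh(2.0 * x - 1.0), 1.0]

def train_head(data, lr=0.5, epochs=200):
    # Fine-tune only the head weights; the extractor stays fixed.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, label in data:
            feats = pretrained_features(x)
            z = sum(wi * fi for wi, fi in zip(w, feats))
            p = 1.0 / (1.0 + math.exp(-z))       # sigmoid output
            for i in range(3):                   # SGD step on log loss
                w[i] -= lr * (p - label) * feats[i]
    return w

def predict(w, x):
    z = sum(wi * fi for wi, fi in zip(w, pretrained_features(x)))
    return 1 if z > 0 else 0

# Small labeled target-task dataset: label 1 when x > 0.5.
rng = random.Random(0)
data = [(x, 1 if x > 0.5 else 0)
        for x in (rng.uniform(-1, 1) for _ in range(40))]
w = train_head(data)
accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
print(accuracy >= 0.9)  # the small head fits the limited data well
```

The design point mirrors the figure: only three head weights are trained, which is why a dataset of 40 samples suffices, whereas training the extractor itself would demand far more data.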
4 Performance Analysis Parameters

The proposed work investigates deep transfer learning for the physical layer security of underlay multihop device-to-device (D2D) communication. Physical layer security is investigated for D2D communication to evaluate the secrecy performance in terms of the following parameters. The secrecy outage probability (SOP) investigates the confidentiality of data transmission. The probability of nonzero secrecy capacity (PNSC), which determines positive secrecy, investigates the channel capacity. The cellular network performance is evaluated in terms of the cellular outage probability (COP), which investigates the reliable connectivity of the network.
4.1 The Secrecy Outage Probability (SOP)

The D2D secrecy capacity, C_s, and the SOP are examined. The SOP is defined as the metric that quantifies the probability of an eavesdropper successfully decoding the confidential information transmitted over a communication channel. When the secrecy capacity, C_s, is lower than the predefined secrecy rate, R_s, a secrecy outage event
38
C. Vachana and K. Saraswathi
Fig. 4 Traditional machine learning (ML) versus deep transfer learning [7]
happens. The SOP is the likelihood of a secrecy outage event occurring:

SOP = \Pr(C_s \le R_s) \quad (1)

The secrecy outage probability for a network of N relays is expressed as

P_{s,out} = \Pr\!\left(\frac{1}{N}\log_2\frac{1+\gamma_D}{1+\gamma_E} < R_s\right) \quad (2)
where P_{s,out} is the secrecy outage probability, N is the number of relays, \gamma_D is the SNR at the receiver, \gamma_E is the SNR at the eavesdropper, and R_s is the pre-defined secrecy rate. Increasing the number of relays in a communication system can affect the SOP, which measures the failure of the communication link to achieve the desired level of secrecy or confidentiality. In relay-assisted communication systems, multiple intermediate nodes are used to transmit signals from the device transmitter to
device receiver. These relays can potentially introduce diversity in the communication process, which can be leveraged to improve secrecy.
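Equation (2) lends itself to a quick Monte Carlo check. The sketch below assumes Rayleigh fading, so the instantaneous SNRs \gamma_D and \gamma_E are drawn as exponentials around hypothetical average values; the relay count, rate, and SNR figures are illustrative and not the chapter's exact simulation settings.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 200_000

# Hypothetical averages: Rayleigh fading makes the instantaneous SNRs
# exponentially distributed around their means (values are illustrative).
avg_snr_d = 10.0 ** (10 / 10)   # 10 dB average SNR at the legitimate receiver
avg_snr_e = 10.0 ** (0 / 10)    # 0 dB average SNR at the eavesdropper
Rs, N = 1.0, 3                  # secrecy rate 1 b/s/Hz, three relays/hops

g_d = rng.exponential(avg_snr_d, trials)
g_e = rng.exponential(avg_snr_e, trials)

# Eq. (2): outage whenever the secrecy capacity drops below the rate Rs
Cs = np.log2((1 + g_d) / (1 + g_e)) / N
sop = float(np.mean(Cs < Rs))
print(f"estimated SOP: {sop:.3f}")
```

Raising `avg_snr_d` or lowering `Rs` in the sketch drives the estimate down, matching the qualitative behavior discussed in the text.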
4.2 The Probability of Nonzero Secrecy Capacity (PNSC)

The PNSC is a metric used in wireless communication systems to quantify the likelihood that a secure communication link achieves a positive secrecy capacity. A positive secrecy capacity indicates that it is possible to transmit information securely even in the presence of eavesdroppers. Nonzero secrecy is achieved when the SNR at the destination (receiver), \gamma_D, is greater than the SNR at the eavesdropper, \gamma_E:

PNSC = \Pr(\gamma_D > \gamma_E) \quad (3)

The probability of positive secrecy can be formulated as

\Pr(C_s > 0) = \Pr\!\left(\frac{1}{N}\log_2\frac{1+\gamma_D}{1+\gamma_E} > 0\right) \quad (4)

where C_s is the secrecy capacity, N is the number of relays, \gamma_D is the SNR at the receiver, and \gamma_E is the SNR at the eavesdropper. The formulation implies that positive secrecy is the complement of the outage probability at the zero-rate threshold; in other words, it quantifies the probability that the channel conditions allow the legitimate receiver to successfully decode the transmitted information at a strictly positive secrecy rate.
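Equation (3) can also be checked numerically. The sketch below again assumes Rayleigh fading with hypothetical average SNRs; for independent exponential SNRs the PNSC has a simple closed form, avg_d / (avg_d + avg_e), which the simulation uses as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 200_000

# Hypothetical Rayleigh-fading averages (instantaneous SNRs ~ exponential)
avg_d, avg_e = 10.0, 2.0
g_d = rng.exponential(avg_d, trials)
g_e = rng.exponential(avg_e, trials)

# Eq. (3): positive secrecy whenever the destination SNR beats the eavesdropper
pnsc_mc = float(np.mean(g_d > g_e))

# For independent exponential SNRs this probability has the closed form
# avg_d / (avg_d + avg_e), which serves as a sanity check on the simulation.
pnsc_cf = avg_d / (avg_d + avg_e)
print(f"Monte Carlo {pnsc_mc:.4f} vs closed form {pnsc_cf:.4f}")
```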
4.3 The Cellular Outage Probability (COP)

The cellular outage probability quantifies the probability that a communication link does not meet a certain signal quality threshold. It refers to the likelihood that a user in a mobile network will experience a loss of service due to poor signal quality; that is, the probability that the signal-to-interference-plus-noise ratio (SINR) falls below a threshold, rendering the communication link unreliable and causing an outage. The cellular outage probability, P_{out}, is represented as

P_{out} = \Pr(\gamma_{BC} \le \vartheta_c) = F_{\gamma_{BC}}(\vartheta_c) \quad (5)

where \gamma_{BC} is the desired SNR at C, \vartheta_c = 2^{R_c} - 1, and R_c is the data transmission rate. The BS uses an antenna selection strategy to preserve the diversity and robustness benefits of numerous antennas while avoiding significant hardware complexity. Multiple antennas can take advantage of the spatial separation between antennas to
improve the SNR of received signals. This is particularly useful in environments with high levels of noise or interference.
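Equation (5) together with the antenna selection strategy can be sketched numerically. The model below is an assumption made for illustration, not the chapter's exact system: each of M base-station antennas sees an i.i.d. Rayleigh-faded branch, selection keeps the strongest one, and the outage CDF of the selected branch is therefore the single-branch CDF raised to the power M. The rate and SNR values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
trials = 200_000

avg_snr = 10.0           # hypothetical average per-branch SNR (linear scale)
Rc = 1.0                 # data transmission rate in b/s/Hz
M = 2                    # base-station antennas used for selection
threshold = 2 ** Rc - 1  # Eq. (5) threshold: theta_c = 2^Rc - 1

# Antenna selection: keep the strongest of M i.i.d. exponential branches
g = rng.exponential(avg_snr, size=(trials, M)).max(axis=1)
cop_mc = float(np.mean(g <= threshold))

# Closed form: the CDF of the best of M branches is the per-branch CDF ^ M
cop_cf = (1 - np.exp(-threshold / avg_snr)) ** M
print(f"Monte Carlo {cop_mc:.4f} vs closed form {cop_cf:.4f}")
```

Increasing `M` in the sketch shrinks the outage probability sharply, which mirrors the diversity argument made in the surrounding text.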
5 Results and Discussions

The system model of multihop D2D communication has been developed by considering ten D2D users. In this model, data transmission between the D2D transmitter and the D2D receiver takes place in three hops with a communication range of 20 m.
5.1 The Secrecy Outage Probability Analysis

Secrecy outage probability quantifies the probability of an eavesdropper successfully decoding the confidential information transmitted over a communication channel. When the secrecy capacity falls below the predetermined secrecy rate, R_s, a secrecy outage event happens; the SOP is the likelihood of such an event occurring (Table 1).

Table 1  Simulation assumptions for inband underlay D2D users [8]

Parameters                          D2D users
Number of UEs                       7
Time slot for each transmission     1000
Max transmit power (UEs)            24 dBm
Min transmit power (UEs)            −40 dBm
Noise figure                        7 dB

The SOP of the D2D communication for different values of SNR in dB is investigated in Fig. 5 and examined via Monte Carlo simulations. The target SNR is assumed to be 10 dB, the noise variance for all nodes is normalized to unity, and the pre-defined secrecy rate R_s is 1 b/s/Hz according to the IEEE 802.15.8 standard for D2D communication.

Fig. 5  Secrecy outage probability for different SNRs in dB

In relay-assisted D2D communication, relays act as intermediaries between the device transmitter and the legitimate receiver. The relays help to extend the communication range, improve signal quality, and potentially mitigate the effect of eavesdroppers. As the relay count increases, the potential for improved secrecy performance grows. The SOP of the D2D communication for different numbers of relays with increasing SNR in dB is investigated in Fig. 6 and validated through Monte Carlo simulations. The target SNR is again assumed to be 10 dB and the pre-defined secrecy rate R_s is 1 b/s/Hz. It is observed that the SOP reduces as the relay count rises, demonstrating the impact of cooperative communication on improving the devices' secrecy performance.
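The trend reported here (SOP falling as the relay count rises) can be reproduced with a toy Monte Carlo model. The selection rule below is an assumption for illustration only, not the chapter's exact system: the legitimate link uses the best of N relay paths, so its SNR is the maximum of N i.i.d. exponential draws, while the eavesdropper observes a single Rayleigh-faded link.

```python
import numpy as np

rng = np.random.default_rng(4)
trials, avg_d, avg_e, Rs = 200_000, 10.0, 1.0, 1.0

# Illustrative relay-selection model: the legitimate SNR is the maximum of
# N i.i.d. exponential draws (best relay), the eavesdropper sees one link.
sops = []
for N in (1, 2, 4):
    g_d = rng.exponential(avg_d, size=(trials, N)).max(axis=1)
    g_e = rng.exponential(avg_e, trials)
    sops.append(float(np.mean(np.log2((1 + g_d) / (1 + g_e)) < Rs)))
print({n: round(s, 3) for n, s in zip((1, 2, 4), sops)})
```

The estimated SOP decreases monotonically with N, matching the diversity gain attributed to cooperative relaying in the text.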
5.2 The Probability of Nonzero Secrecy Capacity Analysis

When a wireless communication system has more relays, the PNSC represents the possibility that the secure communication link maintains a positive secrecy capacity despite the existence of eavesdroppers. With more relays involved in D2D communication, there is a greater chance of increased secrecy performance. The PNSC of the D2D communication for different numbers of relays with increasing SNR in dB is investigated in Fig. 7. The target SNR is assumed to be 10 dB, the pre-defined secrecy rate R_s is 1 b/s/Hz, and the number of eavesdropper antennas N_E is 3. The signal quality of the transmitted channel improves as the SNR increases, improving the possibility of reaching a positive secrecy capacity.
Fig. 6 Secrecy outage probability for different number of relays
Fig. 7 Probability of nonzero secrecy capacity for different relay count
5.3 Cellular Outage Probability Analysis

Cellular outage probability refers to the likelihood that a user in a mobile network will experience degraded service due to poor signal quality. The cellular outage probability for different values of SINR in dB, used to evaluate the performance of the cellular network, is investigated in Fig. 8. For the performance analysis, the target SINR is assumed to be 2 dB, the number of base station antennas is 2, and the maximum transmit power of the base station and the user is assumed to be 24 dBm according to the IEEE 802.15.8 standard. The plot shows that the cellular outage probability decreases with increasing SINR, implying that increased signal quality improves reliability by ensuring adequate network capacity (Table 2).
Fig. 8 Cellular outage probability for different SINRs in dB
Table 2  Simulation assumptions for cellular network (IEEE 802.15.8 standard) [8]

Parameters                          Cellular network
Number of UEs                       7
Time slot for each transmission     1000
Max transmit power (UEs)            24 dBm
Min transmit power (UEs)            −40 dBm
Noise figure                        5 dB
In cellular networks, increasing the number of base station antennas enhances spatial diversity. Spatial diversity is the concept that different antennas receive signals that have taken different paths through the wireless channel due to reflection, diffraction, and scattering. With more antennas, base stations can transmit signals more precisely toward specific users, reducing interference, improving overall signal quality, and reducing the outage probability. The cellular outage probability for different numbers of base station antennas, used to evaluate the performance of the cellular network in terms of signal quality and coverage area, is investigated in Fig. 9. The target SINR is assumed to be 2 dB. The plot indicates that the cellular outage probability decreases as the number of base station antennas increases, implying that the coverage area grows and that more antennas help to reduce signal degradation, resulting in reliable communication. The cellular outage probability for different values of SNR in dB with varying numbers of base station antennas is investigated in Fig. 10. From the plot, it is observed that under extremely low SNR conditions, having more antennas might not yield significant improvements, as the received signal is still dominated by noise.
Fig. 9 Cellular outage probability for different base station antennas
Fig. 10 Cellular outage probability for increasing base station antennas
5.4 Deep Transfer Learning Technique Analysis

Deep transfer learning involves extracting data from the D2D transmitted signal at the receiver's end, which includes communication data in the presence of eavesdroppers, as shown in Fig. 11. This dataset contains both legitimate communication pairs and eavesdropped pairs. The dataset is preprocessed, including noise addition to simulate eavesdropping, data augmentation, and the necessary normalization. For the relay-assisted communication system model of ten D2D users, to detect the eavesdropper signal at the legitimate receiver, the received signal along with noise and the eavesdropper presence for 1000 Monte Carlo simulations is exported to Excel as eavesdropper data and eavesdropper labels to test the received signal. The pre-trained model is initialized with the weights learned in the previous step and fine-tuned using the communication dataset that includes eavesdropper presence. The goal is to make the model learn to differentiate between legitimate and eavesdropped communication. The fine-tuned model is trained over multiple epochs; for the simulated dataset, ten epochs are considered, as shown in Fig. 12. Each epoch involves passing the entire dataset through the model with a learning rate of 0.01 using the Adam optimizer, updating the model's weights and iteratively refining its performance.
Fig. 11 Detection of eavesdropper presence in the received signal
Fig. 12 Fine-tuned model in ten epochs
Designing a suitable loss function encourages the model to enhance security. This could involve minimizing the reconstruction error for legitimate communication pairs while maximizing the difference between legitimate and eavesdropped pairs. Applying regularization techniques such as dropout, L2 regularization, and batch normalization can help improve the model's generalization and prevent overfitting. The model's performance is monitored on validation and test sets, tracking metrics such as accuracy and loss. A crucial tool in assessing the effectiveness of a machine learning or deep learning classifier is the confusion matrix, which gives a thorough evaluation of how well the model sorts instances into the various classes. A confusion matrix is plotted in Fig. 13 for the actual and predicted received signals of the testing set, using the accuracy and precision obtained over ten epochs at the chosen learning rate. This confusion matrix is used to assess security and overall performance. As the model trains, it adapts its knowledge of the presence of eavesdroppers. The transfer learning aspect ensures that the model leverages its understanding of legitimate communication while adjusting to eavesdropper-induced noise. The confusion matrix provides the accuracy of the true received signal, which helps to distinguish the eavesdropper signal and improves secrecy.
Fig. 13 Confusion matrix
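The fine-tuning loop and confusion matrix described above can be sketched end-to-end in NumPy. The synthetic "received-signal" features and the logistic head standing in for the fine-tuned network are hypothetical stand-ins, while the ten epochs, Adam optimizer, and 0.01 learning rate mirror the settings stated in the text.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical stand-in dataset: eavesdropped samples (label 1) carry an
# offset in the received-signal features; shapes and statistics are
# illustrative, not the exported Monte Carlo data the chapter describes.
n = 1000
labels = rng.integers(0, 2, n)
feats = rng.normal(size=(n, 4)) + labels[:, None] * 1.5

w, b = np.zeros(4), 0.0
m_w, v_w, m_b, v_b = np.zeros(4), np.zeros(4), 0.0, 0.0
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8

step = 0
for epoch in range(10):                      # ten epochs, as in the paper
    for i in range(0, n, 32):                # mini-batches
        xb, yb = feats[i:i + 32], labels[i:i + 32]
        p = 1 / (1 + np.exp(-(xb @ w + b)))
        gw, gb = xb.T @ (p - yb) / len(xb), (p - yb).mean()
        step += 1                            # Adam update with lr = 0.01
        m_w = b1 * m_w + (1 - b1) * gw; v_w = b2 * v_w + (1 - b2) * gw ** 2
        m_b = b1 * m_b + (1 - b1) * gb; v_b = b2 * v_b + (1 - b2) * gb ** 2
        w -= lr * (m_w / (1 - b1 ** step)) / (np.sqrt(v_w / (1 - b2 ** step)) + eps)
        b -= lr * (m_b / (1 - b1 ** step)) / (np.sqrt(v_b / (1 - b2 ** step)) + eps)

pred = (1 / (1 + np.exp(-(feats @ w + b))) > 0.5).astype(int)
# 2x2 confusion matrix: rows = actual class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
for a, p_ in zip(labels, pred):
    cm[a, p_] += 1
acc = float(np.trace(cm) / cm.sum())
print(cm); print(f"accuracy: {acc:.3f}")
```

The diagonal of `cm` counts correctly classified legitimate and eavesdropped samples; off-diagonal entries are the misclassifications that the security assessment cares about.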
6 Conclusion

In this paper, a downlink transmission scenario is considered for underlay D2D communication, where the cellular network and the D2D network share the same spectral band; hence, interference management is crucial. The performance of the cellular network is evaluated in terms of the cellular outage probability for an increasing number of base station antennas and signal-to-noise ratio (SNR). It is observed that increasing the SNR reduces the cellular outage probability by improving the signal quality of the network. The D2D secrecy performance is evaluated in terms of the SOP for the relay-assisted network. The simulation results show that relays mitigate the impact of the eavesdropper and improve the secrecy rate. The deep transfer learning technique is implemented for underlay communication; the performance metrics assess security while utilizing a pre-trained network with reduced energy consumption. Deep transfer learning has been found to improve physical layer security (user privacy rates) in wireless environments with reduced latency.
References

1. Das A, Das N (2020) Multihop D2D communication to minimize and balance SAR in 5G. In: 2020 international conference on COMmunication Systems & NETworkS (COMSNETS), Bengaluru, India, pp 590–593. https://doi.org/10.1109/COMSNETS48256.2020.9027453
2. Gismalla MSM et al (2022) Survey on device to device (D2D) communication for 5GB/6G networks: concept, applications, challenges, and future directions. IEEE Access 10:30792–30821. https://doi.org/10.1109/ACCESS.2022.3160215
3. Shaikh FS, Wismüller R (2018) Routing in multi-hop cellular device-to-device (D2D) networks: a survey. IEEE Commun Surv Tutor 20(4):2622–2657. https://doi.org/10.1109/COMST.2018.2848108
4. Penda DD, Wichman R, Charalambous T, Fodor G, Johansson M (2019) A distributed mode selection scheme for full-duplex device-to-device communication. IEEE Trans Veh Technol 68(10):10267–10271. https://doi.org/10.1109/TVT.2019.2932046
5. Huang J, Xing C-C, Qian Y, Haas ZJ (2018) Resource allocation for multicell device-to-device communications underlaying 5G networks: a game-theoretic mechanism with incomplete information. IEEE Trans Veh Technol 67(3):2557–2570. https://doi.org/10.1109/TVT.2017.2765208
6. Moualeu JM, Ngatched TMN (2019) Relay selection strategies for physical-layer security in D2D-assisted cellular networks. In: 2019 IEEE 90th vehicular technology conference (VTC2019-fall), Honolulu, HI, pp 1–7. https://doi.org/10.1109/VTCFall.2019.8891400
7. Rawat DB (2021) Deep transfer learning for physical layer security in wireless communication systems. In: 2021 third IEEE international conference on trust, privacy and security in intelligent systems and applications (TPS-ISA), Atlanta, GA, pp 289–296. https://doi.org/10.1109/TPSISA52974.2021.00033
8. Khoshafa MH, Ngatched TMN, Ahmed MH (2021) Reconfigurable intelligent surfaces-aided physical layer security enhancement in D2D underlay communications. IEEE Commun Lett 25(5):1443–1447. https://doi.org/10.1109/LCOMM.2020.3046946
9. Khuntia P, Hazra R (2018) Device-to-device communication aided by two-way relay underlaying cellular network. In: 2018 international conference on wireless communications, signal processing and networking (WiSPNET), Chennai, India, pp 1–6. https://doi.org/10.1109/WiSPNET.2018.8538501
10. Wang W, Teh KC, Li KH (2017) Enhanced physical layer security in D2D spectrum sharing networks. IEEE Wireless Commun Lett 6(1):106–109. https://doi.org/10.1109/LWC.2016.2634559
11. Khoshafa MH, Ngatched TMN, Ahmed MH, Ibrahim A (2020) Enhancing physical layer security using underlay full-duplex relay-aided D2D communications. In: 2020 IEEE wireless communications and networking conference (WCNC), Seoul, Korea (South), pp 1–7. https://doi.org/10.1109/WCNC45663.2020.9120626
12. Liu J, Nishiyama H, Kato N, Guo J (2016) On the outage probability of device-to-device-communication-enabled multichannel cellular networks: an RSS-threshold-based perspective. IEEE J Sel Areas Commun 34(1):163–175. https://doi.org/10.1109/JSAC.2015.2452492
13. Zhang A, Lin X (2017) Security-aware and privacy-preserving D2D communications in 5G. IEEE Netw 31(4):70–77. https://doi.org/10.1109/MNET.2017.1600290
14. Dun H, Ye F, Jiao S, Li Y, Jiang T (2019) The distributed resource allocation for D2D communication with game theory. In: 2019 IEEE-APS topical conference on antennas and propagation in wireless communications (APWC), Granada, Spain, pp 104–108. https://doi.org/10.1109/APWC.2019.8870437
15. Zhang P, Kang X, Li X, Liu Y, Wu D, Wang R (2019) Overlapping community deep exploring-based relay selection method toward multi-hop D2D communication. IEEE Wireless Commun Lett 8(5):1357–1360. https://doi.org/10.1109/LWC.2019.2917907
16. Lin Z, Du L, Gao Z, Huang L, Du X, Guizani M (2016) Analysis of discovery and access procedure for D2D communication in 5G cellular network. In: 2016 IEEE wireless communications and networking conference, Doha, Qatar, pp 1–6. https://doi.org/10.1109/WCNC.2016.7564761
17. Jose J, Agarwal A, Gangopadhyay R, Debnath S (2019) Outage analysis based channel allocation for underlay D2D communication in fading scenarios. In: 2019 international conference on wireless communications signal processing and networking (WiSPNET), Chennai, India, pp 485–490. https://doi.org/10.1109/WiSPNET45539.2019.9032748
18. Muthanna A, Ateya AA, Balushi MA, Kirichek R (2018) D2D enabled communication system structure based on software defined networking for 5G network. In: 2018 international symposium on consumer technologies (ISCT), St. Petersburg, Russia, pp 41–44. https://doi.org/10.1109/ISCE.2018.8408913
19. Cai Y, Ke C, Ni Y, Zhang J, Zhu H (2021) Power allocation for NOMA in D2D relay communications. China Commun 18(1):61–69. https://doi.org/10.23919/JCC.2021.01.006
20. Lee J, Lee JH (2019) Performance analysis and resource allocation for cooperative D2D communication in cellular networks with multiple D2D pairs. IEEE Commun Lett 23(5):909–912. https://doi.org/10.1109/LCOMM.2019.2907252
21. Wang J, Ma J, Li Y, Liu X (2021) D2D communication relay selection strategy based on two-hop social relationship. In: 2021 IEEE 4th international conference on electronic information and communication technology (ICEICT), Xi'an, China, pp 592–595. https://doi.org/10.1109/ICEICT53123.2021.9531330
22. Huang X, Feng D, Xiao S, He C (2019) Power-spectrum trading for full-duplex D2D communications. In: 2019 11th international conference on wireless communications and signal processing (WCSP), Xi'an, China, pp 1–5. https://doi.org/10.1109/WCSP.2019.8927897
23. Wang L, Shi Y, Chen M, Cui J, Zheng B (2017) Physical layer security in D2D communication system underlying cellular networks. In: 2017 9th international conference on wireless communications and signal processing (WCSP), Nanjing, China, pp 1–5. https://doi.org/10.1109/WCSP.2017.8171115
24. Shamganth K, Sibley MJN (2017) A survey on relay selection in cooperative device-to-device (D2D) communication for 5G cellular networks. In: 2017 international conference on energy, communication, data analytics and soft computing (ICECDS), Chennai, India, pp 42–46. https://doi.org/10.1109/ICECDS.2017.8390216
25. Ahmed RE (2021) A novel multi-hop routing protocol for D2D communications in 5G. In: 2021 IEEE 11th annual computing and communication workshop and conference (CCWC), NV, USA, pp 0627–0630. https://doi.org/10.1109/CCWC51732.2021.9375946
Chapter 5
Marine Vessel Trajectory Forecasting Using Long Short-Term Memory Neural Networks Optimized via Modified Metaheuristic Algorithm Ana Toskovic , Aleksandar Petrovic , Luka Jovanovic , Nebojsa Bacanin , Miodrag Zivkovic , and Milos Dobrojevic
1 Introduction

Global transportation and trade are heavily dependent on maritime vessels. During transit, massive amounts of data are generated to describe a vessel's movement and behavior. Most of the data is generated to maintain the course of the vessel, but it can be used for various other functions. The importance of maritime route data can be traced to early history and geographical discoveries. The foundation for modern maritime systems relies on geographical location, topographical analyses, and the history of exploring routes over water [1].

A. Toskovic · A. Petrovic · L. Jovanovic · N. Bacanin (B) · M. Zivkovic · M. Dobrojevic
Teacher Education Faculty, University of Pristina in Kosovska Mitrovica, 38220 Kosovska Mitrovica, Serbia
e-mail: [email protected]
A. Toskovic e-mail: [email protected]
A. Petrovic e-mail: [email protected]
L. Jovanovic e-mail: [email protected]
M. Zivkovic e-mail: [email protected]
M. Dobrojevic e-mail: [email protected]
A. Petrovic
Faculty of Informatics and Computing, Singidunum University, 11010 Belgrade, Serbia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_5
51
52
A. Toskovic et al.
Recent advancements in data analysis have allowed for the extraction of higher-quality features. For example, trajectory point-based vessel behavior can be modeled using density-based spatial clustering of applications with noise (DBSCAN). Applying this technique can provide information about unplanned stops, route deviations, and speed inconsistencies. This information can be used to increase situational awareness, prevent collisions, enable safer navigation, and plan routes. 'Big' and 'semantic' trajectory concepts are recognized through the use of rich trajectory datasets combining semantic and spatiotemporal aspects [2], resulting in an improved comprehension of data revealing human, animal, and object movement patterns. Big data and edge computing are closely related and, considering the above, the problem of vessel trajectory prediction can be solved by edge computing in real time [3]. In that case, the predictions are based on a combination of historical data and a real-time incident identification mechanism. The automatic identification system (AIS) is a better approach for naval travel owing to the limitations of the widely used global positioning system (GPS) and the irregularities that occur due to its nature [4, 5]. Additionally, observed irregularities can indicate illegal activity: fishing crews turn off their AIS transponders and perform such activities using illegal gear or fishing techniques, with catastrophic consequences for nature. In this study, an LSTM neural network-based approach, optimized by an augmented variation of the firefly algorithm (FA), is suggested for vessel trajectory forecasting. The LSTM is the main predictor, and its hyperparameters are optimized by the FA. The swarm intelligence sub-group of metaheuristic solutions, known for its capability of solving problems of nondeterministic polynomial time hardness (NP-hard), has been extensively applied to this issue.
Many real-world problems are of NP-hard complexity, which is why the importance of swarm-based solutions is high. The FA belongs to this group of algorithms and has remarkable performance, making it suitable for the optimization of NP-hard problems and for hybridization with other metaheuristic solutions. The proposed solution has been rigorously tested against high-performing metaheuristic solutions. The proposed framework is used for multivariate trajectory forecasting of fishing vessels. This work's primary contributions are listed as follows:
• The application of a novel and reliable framework for the solution of critical issues in maritime vessel tracking.
• The proposal of an LSTM network for time series prediction of trajectories in marine transport.
• The introduction of an altered version of the FA for tuning the model hyperparameters.
The paper is organized as follows: Sect. 2 introduces the techniques used in this work, Sect. 3 provides details on the original FA and the performed modifications, Sect. 4 explains the experimental approach, Sect. 5 provides the results of those experiments, and Sect. 6 concludes the paper.
5 Marine Vessel Trajectory Forecasting Using Long Short-Term Memory …
53
2 Background

This section provides the fundamentals of the presented research. The basics of the LSTM are given first, followed by a description of its parameters. The necessary information on metaheuristic methods is then provided, including their application to real-world problems.
2.1 Long Short-Term Memory Neural Networks

The LSTM is a prominent variation of recurrent neural networks (RNNs), known for its distinctive specialized cells, which it employs to preserve data across time steps. Memory cells encompass three gates: the forget gate f_t, the input gate i_t, and the output gate o_t, with t representing the current time step. These gates regulate the state of the memory cells s_t. Collaboratively, they steer the behavior and progression of the LSTM as it advances through time steps t. During the initial stage, the LSTM determines which information should be retained and which removed from the previous cell state. To achieve this, the input sequence is processed with a sigmoid activation function, mapping values to a range between 0 and 1; this transformation contributes to deciding which information from the prior cell state to keep, and is applied in the forget and input gates. The forget gate values f_t are computed as

f_t = \mathrm{sigmoid}(W_{f,x} x_t + W_{f,h} h_{t-1} + b_f), \quad (1)

where W_{f,x} and W_{f,h} are weight matrices, x_t is the input vector at time step t, h_{t-1} is the output at time step t-1 (both multiplied by the weight matrices), and b_f is the bias vector of the forget gate. The output is effectively binary, with 0 indicating that the data should be forgotten and 1 indicating otherwise. In the subsequent stage, the LSTM architecture distinguishes the information that holds sufficient importance to be integrated into the cell state s_t. This determination is influenced by the sigmoid function. Additionally, the inputs are mapped to the range [−1, 1] through the hyperbolic tangent (tanh) transformation. Subsequently, the candidate value \tilde{s}_t and the input gate value i_t are computed as

\tilde{s}_t = \tanh(W_{\tilde{s},x} x_t + W_{\tilde{s},h} h_{t-1} + b_{\tilde{s}}), \quad (2)

i_t = \mathrm{sigmoid}(W_{i,x} x_t + W_{i,h} h_{t-1} + b_i), \quad (3)
where W_{\tilde{s},x}, W_{\tilde{s},h}, W_{i,x}, and W_{i,h} denote weight matrices, and b_{\tilde{s}} and b_i are bias vectors. Next, the cell state s_t is calculated:

s_t = f_t \circ s_{t-1} + i_t \circ \tilde{s}_t, \quad (4)

where \circ marks the Hadamard product. Lastly, the output h_t is determined using the sigmoid and tanh activation functions and the subsequent formulas:

o_t = \mathrm{sigmoid}(W_{o,x} x_t + W_{o,h} h_{t-1} + b_o), \quad (5)

h_t = o_t \circ \tanh(s_t), \quad (6)

where W_{o,x} and W_{o,h} are weight matrices and b_o signifies the output gate bias vector.
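Equations (1)-(6) map directly onto a few lines of NumPy. The sketch below runs a single LSTM cell over a short random sequence; the weight shapes, random initialization, and input data are illustrative only, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, W):
    """One LSTM time step following Eqs. (1)-(6); W holds gate weights/biases."""
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + W["bf"])      # Eq. (1)
    s_cand = np.tanh(W["sx"] @ x_t + W["sh"] @ h_prev + W["bs"])   # Eq. (2)
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + W["bi"])      # Eq. (3)
    s_t = f_t * s_prev + i_t * s_cand                              # Eq. (4)
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + W["bo"])      # Eq. (5)
    h_t = o_t * np.tanh(s_t)                                       # Eq. (6)
    return h_t, s_t

rng = np.random.default_rng(5)
n_in, n_hid = 3, 4
W = {k: rng.normal(scale=0.5, size=(n_hid, n_in if k.endswith("x") else n_hid))
     for k in ("fx", "fh", "sx", "sh", "ix", "ih", "ox", "oh")}
W.update({k: np.zeros(n_hid) for k in ("bf", "bs", "bi", "bo")})

h, s = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(6, n_in)):   # run a short six-step input sequence
    h, s = lstm_step(x, h, s, W)
print("final hidden state:", h)
```

Since h_t is an output gate times a tanh of the cell state, every component of the hidden state stays strictly inside (−1, 1) regardless of the input.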
2.2 Metaheuristic Methods and Related Works

The popularity of machine learning optimization has recently increased significantly. There are many reasons for this, including the rising complexity of models as they become more advanced and the growing number of hyperparameters. Hyperparameter tuning has traditionally been performed by trial and error, but due to the mentioned model complexity this is no longer achievable in a rational amount of time, classifying it as an NP-hard problem. The aim is to maintain realistic computational requirements while solving NP-hard problems. When the parameter selection problem is treated as an optimization task, metaheuristic algorithms are applicable and can yield drastic performance improvements. A sub-group of these algorithms identified by its cooperative behavior, the swarm metaheuristics, has been further distinguished as a class of excellent optimizers. Frequently applied optimizers from this group include the FA [6], particle swarm optimization (PSO) [7], the Harris hawks optimizer (HHO) [8], the genetic algorithm (GA) [9], the reptile search algorithm (RSA) [10], the bat algorithm (BA) [11], and the artificial bee colony (ABC) [12] algorithm. Furthermore, the recently introduced chimp optimization algorithm (ChOA) [13] can be applied in the same way. Many use cases on challenging issues have been recorded, supporting the argument that swarm solutions are widely applied: Internet of things and wireless sensor optimization [14–17], organization of fog, cloud, and cloud-edge computing systems [18–20], prediction of COVID-19 cases [21, 22], processing and classification of images in healthcare [23], intrusion detection in network and computer systems [24–26], credit card fraud identification [27, 28], feature selection [29, 30], energy production and consumption prediction [31–33], tuning of different ML structures [34–38], and air pollution detection and prediction and environmental surveillance [39–41].
Swarm intelligence approaches to trajectory prediction challenges include flight trajectory prediction [42], urban trajectories of land vehicles [43], vessel trajectories [44], and thrust trajectory forecasting [45].
3 Introduced Modified Metaheuristics

A brief outline of the base FA is provided first, followed by a description of the modifications and the inner workings of the modified approach.
3.1 Firefly Algorithm

The original FA, proposed in [6], was inspired by the communication of fireflies. Bioluminescence, the ability to create natural light, is the key component of this communication. Units that emit weaker light are drawn to those with higher intensity; as the fireflies are unisex, gender does not affect this. Brightness is formulated as attractiveness, and in the case of equal attraction the units fly at random. The guiding function, which is the target of improvement, influences the amount of light emitted by a single unit. The illumination intensity is determined by the guiding function:

I(x) = \begin{cases} \dfrac{1}{f(x)}, & \text{if } f(x) > 0, \\ 1 + |f(x)|, & \text{otherwise}, \end{cases} \quad (7)

where the agent's appeal is given as I(x) and f(x) is the value of the guiding function at location x. The intensity of light drops with distance, as per physics, and this behavior is translated as

I(r) = \frac{I_0}{1 + \gamma r^2}, \quad (8)

where I(r) is the illumination at distance r and I_0 is the intensity at the light source. The absorption coefficient \gamma accounts for the case where the surrounding medium absorbs light. The Gaussian form describes the combined inverse-square and absorption influence for a given \gamma according to

I(r) = I_0 e^{-\gamma r^2}. \quad (9)
The consequences of distance on attractiveness are given by

\beta(r) = \beta_0 \cdot e^{-\gamma r^2}, \quad (10)

where \beta_0 is the attractiveness at distance r = 0. In most cases, it is encouraged to replace Eq. (10) with Eq. (11):

\beta(r) = \frac{\beta_0}{1 + \gamma r^2}. \quad (11)
At iteration t + 1, a randomly searched unit i is attracted by a stronger emitting individual j, as described in the next equation:

$$ x_i^{t+1} = x_i^t + \beta_0 \cdot e^{-\gamma r_{i,j}^2} \left( x_j^t - x_i^t \right) + \alpha^t (\kappa - 0.5) \qquad (12) $$

in which α is the randomization parameter, κ marks a Gaussian random number, and r_{i,j} is the distance between the two units i and j. The algorithm achieves its best results with α in [0, 1] and β_0 = 1. The Cartesian distance r_{i,j} is measured by the following equation:

$$ r_{i,j} = \left\lVert x_i - x_j \right\rVert = \sqrt{\sum_{k=1}^{D} \left( x_{i,k} - x_{j,k} \right)^2} \qquad (13) $$
where D denotes the number of challenge-specific parameters, i.e., the dimensionality of the search space.
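To make the movement rule concrete, the following is a minimal sketch of one FA move per Eqs. (10)-(13), written with NumPy. The function and parameter names (`fa_move`, `beta0`, `gamma`, `alpha`) are our own illustrative choices, not taken from the authors' implementation.

```python
import numpy as np

def fa_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.5, rng=None):
    """One FA move of agent x_i toward a brighter agent x_j, per Eqs. (10)-(13).

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.linalg.norm(x_i - x_j)               # Cartesian distance, Eq. (13)
    beta = beta0 * np.exp(-gamma * r ** 2)      # attractiveness, Eq. (10)
    kappa = rng.normal(size=x_i.shape)          # Gaussian random draw
    return x_i + beta * (x_j - x_i) + alpha * (kappa - 0.5)  # movement, Eq. (12)

# A dimmer firefly at the origin moves part of the way toward a brighter one.
xi_new = fa_move(np.zeros(2), np.ones(2), rng=np.random.default_rng(42))
```

Note how the exponential attractiveness term shrinks quickly with distance, so far-away agents barely move toward each other while the random term keeps them exploring.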
3.2 Modified FA Approach

The basic form of FA serves as a sturdy fine-tuning algorithm. However, it does display certain shortcomings, which became evident during extensive simulations on the widely recognized Congress on Evolutionary Computation (CEC) benchmark functions [46]. First, there is room for enhancement in terms of exploration capability and convergence speed [11]. Second, in certain runs the algorithm may become trapped in regions housing local optima; should this occur, it converges prematurely within a suboptimal region, bypassing the zone harboring the best outcomes. To rectify these imperfections of the fundamental FA variant, this study introduces two adjustments, elaborated upon in the subsequent sections.
5 Marine Vessel Trajectory Forecasting Using Long Short-Term Memory …
57
The initial modification incorporates a technique known as chaotic elite learning. This strategy assists the algorithm in evading untimely convergence toward erroneous sections of the region of interest, rendering the algorithm more effective at uncovering promising regions. The best agent's performance is enhanced by employing a chaotic sequence to generate novel agents in its close range; this aids in maintaining the population's diversity while facilitating a smoother exit from suboptimal regions. The proposed alteration centers on the logistic map:

$$ c_{i+1} = 4 \times c_i \times (1 - c_i) \qquad (14) $$

where i denotes the iteration count, c_i marks the chaotic value in iteration i, and the starting value c_0 is chosen at random from [0, 1]. The best solution P_i is then updated using the described procedure as follows:

$$ P'_{i,j} = P_{i,j} + \text{rand} \times (2 \times c_i - 1) \qquad (15) $$
where P'_{i,j} is the jth element of the updated best agent. To further enhance the update process, this paper employs the Levy flight principle instead of relying on a plain random walk, which may not be the most effective mechanism [47, 48]. The Levy flight principle has demonstrated remarkable efficiency [49, 50] and is harnessed here to refine the global scouting process. Each agent is occasionally granted a longer flight distance, enabling escapes from suboptimal regions and contributing to overall solution quality. As the algorithm converges and refines its focus to the promising domain, the flight distance gradually decreases over time, rendering larger jumps unnecessary at that stage. This approach is elucidated by Eqs. (16) and (17):

$$ X'_{i,j} = X^{best}_{i,j} \times L \times e \qquad (16) $$

$$ e = \frac{\vec{a}}{2} \qquad (17) $$

Here, X'_{i,j} is the jth element of the ith agent being altered, and X^{best}_{i,j} represents the optimal agent guiding the ith agent. The value e plays a scaling role, lowering in subsequent executions, with \vec{a} calculated as

$$ \vec{a} = 2 \times \left( 1 - \frac{t}{T} \right) \qquad (18) $$
where t represents the ongoing iteration and T denotes the highest allowed count of rounds. Finally, L is the Levy flight distribution, obtained from

$$ L = s \times \frac{u \times \phi}{|\nu|^{1/\tau}} \qquad (19) $$

Thus, τ represents the Levy index, s marks a constant set to 0.01 whose role is to suppress exceedingly lengthy hops, and the parameters u and ν are arbitrary values in the range [0, 1]. Lastly, the φ parameter may be calculated using the next equation:

$$ \phi = \left( \frac{\Gamma(1+\tau) \times \sin\left( \frac{\pi \times \tau}{2} \right)}{\Gamma\left( \frac{1+\tau}{2} \right) \times \tau \times 2^{\frac{\tau-1}{2}}} \right)^{1/\tau} \qquad (20) $$

where Γ represents the gamma function, obtainable by

$$ \Gamma(1+\tau) = \int_0^{\infty} x^{\tau} e^{-x} \, dx \qquad (21) $$
The pseudo-code of the described algorithm can be observed in Algorithm 1.
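The two modifications above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: `chaotic_elite_update` follows Eqs. (14)-(15), and `levy_step` draws a step length per Eqs. (19)-(21); all names are our own.

```python
import math
import numpy as np

def chaotic_elite_update(best, c, rng=None):
    """Perturb the best agent via the logistic map, Eqs. (14)-(15).

    Returns the perturbed candidate and the next chaotic value.
    Names are illustrative, not from the original implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    c_next = 4.0 * c * (1.0 - c)                                      # logistic map, Eq. (14)
    candidate = best + rng.random(best.shape) * (2.0 * c_next - 1.0)  # Eq. (15)
    return candidate, c_next

def levy_step(tau=1.5, s=0.01, rng=None):
    """Draw one Levy-flight step length following Eqs. (19)-(21)."""
    rng = np.random.default_rng() if rng is None else rng
    phi = ((math.gamma(1.0 + tau) * math.sin(math.pi * tau / 2.0)) /
           (math.gamma((1.0 + tau) / 2.0) * tau * 2.0 ** ((tau - 1.0) / 2.0))) ** (1.0 / tau)
    u, v = rng.random(), rng.random()   # random values in [0, 1], as in the text
    return s * (u * phi) / (abs(v) ** (1.0 / tau))
```

The `s = 0.01` scaling keeps the heavy-tailed Levy draws from launching agents completely out of the feasible region, which matches the role the text assigns to it.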
4 Experimental Configuration

To validate the performance of the suggested method and demonstrate the potential of the optimizer, AIS positional data is used for prediction of the vessel's trajectory. This section describes the experimental procedures, dataset, and metrics.

Algorithm 1 Pseudo-code for the proposed MFA
  Set initial control values and create an agent population
  while T > t do
    Evaluate each agent's fitness based on the predetermined procedure
    Determine the best solution in the population
    if φ < 0.5 then
      Update agents in accordance with the FA procedures
    else
      Update agents in accordance with the BA procedures
    end if
    Apply a chaotic update to the best solution
    Update the dynamic control parameters' values
  end while
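Algorithm 1 can be expressed as the following skeleton. This is a deliberately simplified sketch: the FA attraction and the BA update are placeholders (the full FA rule is Eq. (12), and the BA rule is defined in [11], not reproduced here), and the fitness function, bounds, and population size are illustrative assumptions.

```python
import numpy as np

def mfa_optimize(fitness, dim=5, n_agents=10, t_max=50, seed=0):
    """Skeleton of Algorithm 1 (MFA) on a generic minimization problem.

    The FA and BA update rules are simplified placeholders; the full
    rules are given by Eqs. (12)-(21) and, for BA, by [11].
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (n_agents, dim))
    c = rng.random()                              # chaotic seed c0 in [0, 1]
    for _ in range(t_max):
        fit = np.array([fitness(x) for x in pop])
        best = pop[np.argmin(fit)].copy()
        if rng.random() < 0.5:                    # FA-style attraction toward the best
            d2 = np.sum((pop - best) ** 2, axis=1, keepdims=True)
            pop = pop + np.exp(-d2) * (best - pop)
        else:                                     # BA-style local random walk (placeholder)
            pop = pop + 0.1 * rng.normal(size=pop.shape)
        c = 4.0 * c * (1.0 - c)                   # chaotic update of the best, Eq. (14)
        cand = best + rng.random(dim) * (2.0 * c - 1.0)
        fit = np.array([fitness(x) for x in pop])
        if fitness(cand) < fit.max():             # keep the candidate if it beats the worst
            pop[np.argmax(fit)] = cand
    fit = np.array([fitness(x) for x in pop])
    return pop[np.argmin(fit)]

# Toy usage on the sphere function.
best = mfa_optimize(lambda x: float(np.sum(x ** 2)))
```

The alternation between the two update mechanisms, gated by a random draw against 0.5, mirrors the if/else branch in the pseudo-code.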
4.1 Datasets and Preprocessing

The public US marine cadastre is the source of the dataset for vessel trajectory forecasting.¹ The US Coast Guard collected the data from onboard navigation safety devices in US as well as international waters. The data used is from 31.03.2022. Because the dataset contains a redundant number of vessels, a single vessel's data was selected for experimentation, on account of the rich data gathered for it. Preprocessing was required before training and testing. First, features irrelevant to the trajectory were removed; the remaining data covers latitude, longitude, speed over ground, course over ground, and heading. Furthermore, since transmission flaws introduce inconsistencies, the data was re-sampled into intervals of 60 s and missing values were polynomially interpolated. Lastly, the directional speed of the vessel in the X and Y directions was engineered from the existing features; introducing these two features helps capture the complex relationship between heading and trajectory. The experiments contrast the efficiency of several acclaimed optimization algorithms with the newly proposed metaheuristic; every algorithm selects a set of parameters with the goal of creating the model with the best performance. In addition to the newly introduced metaheuristic, the original FA [6] and the GA [9] were assessed. Further, a comparison with other well-known optimizers was conducted, including the PSO [7] and the relatively new RSA [10] and ChOA [13]. Models were given the vessel's latitude, longitude, speed over ground, course over ground, and heading for trajectory forecasting, and their task was to predict the vessel's latitude and longitude. To facilitate model training, the LSTM models were given data with 16 lags (samples used as inputs to the network) and a batch size of 20 using the Keras TimeseriesGenerator module.
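The preprocessing steps described above (60 s resampling, polynomial interpolation of gaps, and directional speed features) might be sketched as follows with pandas. The column names (`BaseDateTime`, `LAT`, `LON`, `SOG`, `COG`, `Heading`) and the polynomial order are assumptions for illustration, not details confirmed by the chapter.

```python
import numpy as np
import pandas as pd

def preprocess_ais(df):
    """Resample one vessel's AIS track to 60 s, fill gaps, and add
    directional speed features. Column names and the interpolation
    order are assumed, not taken verbatim from the source dataset.
    """
    df = df.set_index("BaseDateTime").sort_index()
    cols = ["LAT", "LON", "SOG", "COG", "Heading"]
    out = df[cols].resample("60s").mean()
    out = out.interpolate(method="polynomial", order=2)  # polynomial gap filling
    rad = np.deg2rad(out["COG"])
    out["VX"] = out["SOG"] * np.sin(rad)                 # engineered directional speed, X
    out["VY"] = out["SOG"] * np.cos(rad)                 # engineered directional speed, Y
    return out.dropna()
```

Decomposing speed over ground along the course angle gives the model two features whose signs encode direction directly, rather than leaving it to learn the circular wrap-around of the course angle.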
To balance outcome quality and processing demands, the number of lags was determined empirically. Fifty percent of the available data was used for testing and the remaining 50% for training; no separate validation set was employed. The best LSTM network architecture and parameters were chosen using metaheuristics, and the factors selected for optimization were again chosen for their substantial impact on model performance: the learning rate within a range of [0.0001, 0.01], the dropout within [0.05, 0.2], and the number of training epochs within [17, 44]. Finally, the network architecture was also optimized, and a layer count in the range [10, 18] was chosen. Additionally, each layer's number of neurons is optimized within a [9, 36] range; these limits, however, depend on how many lags are actually used, so a range of [lags, lags × 2] was taken into consideration for the neuron constraints.
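One common way to connect a metaheuristic to this search space is to let each agent be a vector in [0, 1]^d and decode it into concrete hyperparameters. The decoding scheme below is entirely our assumption; the numeric ranges follow the text, except that, for illustration, the layer bound is taken as the 1-2 layers reported for the best models in Table 3 (the printed layer range appears garbled by extraction).

```python
import numpy as np

LAGS = 16  # input window size used by the LSTM models

# Ranges follow Sect. 4; the encoding itself is an assumption, and the
# layer bound follows the 1-2 layer models of Table 3.
SPACE = [
    ("learning_rate", 0.0001, 0.01),
    ("dropout",       0.05,   0.2),
    ("epochs",        17,     44),
    ("layers",        1,      2),
]

def decode_agent(position):
    """Map a metaheuristic agent in [0, 1]^d to LSTM hyperparameters."""
    params = {}
    for (name, lo, hi), p in zip(SPACE, position):
        val = lo + p * (hi - lo)
        params[name] = int(round(val)) if isinstance(lo, int) else val
    # One neuron count per layer, constrained to [lags, lags * 2].
    params["neurons"] = [int(round(LAGS + p * LAGS))
                         for p in position[len(SPACE):len(SPACE) + params["layers"]]]
    return params

params = decode_agent(np.zeros(6))
```

With this encoding, the objective function simply decodes an agent, trains an LSTM with the resulting hyperparameters, and returns the test MSE for the optimizer to minimize.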
¹ https://marinecadastre.gov/ais/.
4.2 Comparative and Validation Metrics

Regression metrics were utilized for the vessel trajectory time series forecasting experiment. The recorded metrics are the mean absolute error (MAE) provided in Eq. (22), the mean squared error (MSE) in Eq. (23), the root mean squared error (RMSE) in Eq. (24), and the coefficient of determination (R²) in Eq. (25). The index of alignment (IoA), Eq. (26), was used as well, together with the Euclidean distance error (EDE), Eq. (27), for determining the distance between the real and assumed locations of the vessel.

$$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (22) $$

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (23) $$

$$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (24) $$

$$ R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \qquad (25) $$

$$ \text{IoA} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( \left| \hat{y}_i - \bar{y} \right| + \left| y_i - \bar{y} \right| \right)^2} \qquad (26) $$

for which the observed values are denoted as y_i, the forecast values as ŷ_i, the mean of y_i as ȳ, and the number of observed samples as n. The distance between the actual and predicted positions is described by the EDE in Eq. (27):

$$ \text{EDE} = \sqrt{\left( x_i - \hat{x}_i \right)^2 + \left( y_i - \hat{y}_i \right)^2} \qquad (27) $$

for which the actual coordinates of the vessel are x_i and y_i, and the corresponding predicted values are x̂_i and ŷ_i. The MSE metric, with a minimization goal, served as the objective function for optimization, and the mean EDE is used as the indicator function.
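For reference, Eqs. (22)-(27) can be implemented directly in NumPy. This is a straightforward transcription of the formulas above; the function names are our own.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, R2 and IoA as defined in Eqs. (22)-(26)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    ybar = float(np.mean(y_true))
    r2 = 1.0 - float(np.sum(err ** 2) / np.sum((y_true - ybar) ** 2))
    ioa = 1.0 - float(np.sum(err ** 2) /
                      np.sum((np.abs(y_pred - ybar) + np.abs(y_true - ybar)) ** 2))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "IoA": ioa}

def ede(x_true, y_true, x_pred, y_pred):
    """Euclidean distance error between actual and predicted positions, Eq. (27)."""
    return np.sqrt((np.asarray(x_true) - np.asarray(x_pred)) ** 2 +
                   (np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```

In the optimization loop, `regression_metrics(...)["MSE"]` plays the role of the objective function, while the mean of `ede(...)` over the test set serves as the indicator function.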
5 Results from Experiments, a Comparative Examination, and Subsequent Discussion

This section presents the outcomes of the previously described experiments and the testing setups. Trajectory prediction was performed, and the results were validated.
5.1 Experimental Observations and Comparative Analysis

Multivariate forecasting is performed with the input features [‘LAT’, ‘LON’, ‘SOG’, ‘COG’, ‘Heading’]; [‘LAT’, ‘LON’] are the target features that represent the vessel's location. Overall forecasting outcomes for the objective function, in terms of best, worst, mean, and median, are shown in Table 1. The best obtained metrics in all tables are marked in bold. As demonstrated, the introduced MFA outperformed the competing algorithms in the best, worst, and mean cases, with the relatively recently introduced ChOA showing the best outcome in the median case. Furthermore, the GA demonstrates impressive stability despite not attaining optimal outcomes. A more intuitive comparison of the algorithms' stability, alongside their convergence graphs, can be seen in Fig. 1. The distributions given in Table 1 indicate the objective and indicator function outcome distributions of each evaluated algorithm. As can be observed, the modified algorithm achieves improved average outcomes compared to the original FA; however, the spread of outcomes is slightly increased, signifying an improvement in algorithm diversity. Detailed metrics for the regression predictions made by each model are further demonstrated in Table 2.

Table 1 Overall objective function results

Method     | Best     | Worst    | Mean     | Median   | Std      | Var
LSTM-MFA   | 0.006190 | 0.008838 | 0.007602 | 0.008838 | 0.001321 | 1.75E−06
LSTM-FA    | 0.008021 | 0.009289 | 0.008740 | 0.009289 | 0.000628 | 3.95E−07
LSTM-GA    | 0.012017 | 0.012047 | 0.012033 | 0.012047 | 0.000015 | 2.27E−10
LSTM-PSO   | 0.006882 | 0.011415 | 0.009602 | 0.011415 | 0.002221 | 4.93E−06
LSTM-RSA   | 0.008522 | 0.010156 | 0.009393 | 0.010156 | 0.000815 | 6.64E−07
LSTM-ChOA  | 0.007564 | 0.009219 | 0.008282 | 0.007564 | 0.000820 | 6.72E−07
Fig. 1 Objective and indicator function distribution and convergence plots (AIS vessel trajectory prediction: objective box plot diagram and R² violin plot diagram, per algorithm)
Table 2 Comprehensive analysis between the top-performing models

Method     | R²       | MAE      | MSE      | RMSE     | IoA
LSTM-MFA   | 0.998283 | 0.005894 | 0.000076 | 0.008728 | 0.999833
LSTM-FA    | 0.997512 | 0.007570 | 0.000111 | 0.010552 | 0.999756
LSTM-GA    | 0.993916 | 0.011564 | 0.000269 | 0.016405 | 0.999407
LSTM-PSO   | 0.997939 | 0.006975 | 0.000090 | 0.009463 | 0.999805
LSTM-RSA   | 0.996859 | 0.008284 | 0.000139 | 0.011771 | 0.999695
LSTM-ChOA  | 0.997595 | 0.007729 | 0.000105 | 0.010229 | 0.999768
Table 3 Best obtained model hyperparameter selections

Method     | Learning rate | Dropout  | Epochs | Layers | Neurons layer 1 | Neurons layer 2
LSTM-MFA   | 0.007667      | 0.122235 | 40     | 1      | 54              | –
LSTM-FA    | 0.004687      | 0.200000 | 50     | 2      | 64              | 32
LSTM-GA    | 0.010000      | 0.200000 | 30     | 2      | 64              | 61
LSTM-PSO   | 0.009530      | 0.174241 | 37     | 1      | 52              | –
LSTM-RSA   | 0.010000      | 0.050000 | 30     | 1      | 64              | –
LSTM-ChOA  | 0.003363      | 0.200000 | 46     | 1      | 47              | –
Fig. 2 Best model predictions compared to actual values (denormalized latitude vs. longitude, actual and predicted trajectories)
A clear observation can be made that the introduced algorithm yields models with the highest degree of excellence across all evaluation metrics. Finally, to support experimental reproducibility, the hyperparameter selections for each of the best-performing constructed models are provided in Table 3, and the trajectory forecasts created by the model alongside the actual vessel trajectories are provided in Fig. 2.
6 Conclusion

This research proposes a new vessel trajectory forecasting method using the LSTM model for time series forecasting. The model was applied to a real-world dataset consisting of AIS data. The benefits of the proposed solution are multi-fold; most importantly, negative effects on wildlife can be reduced. Additionally, illegal actions become harder to perform with the employment of the proposed mechanism, so safety is also enhanced. An improved variant of the FA, named MFA, was proposed as well to mitigate the downsides of the original algorithm and was used for tuning the LSTM. Extensive trip data was used to make predictions, including parameters such as the time gaps from the last recorded position, and route optimization benefits from this mechanism as well. The proposed solution has been rigorously tested against other high-performing solutions for NP-hard optimization and was dominant in the overall comparison. The well-known limitations of such experiments constrained the number of compared algorithms, the population sizes, and the number of runs. Furthermore, noise in the AIS data is an obstacle, as is the inconsistency of the recorded data. The authors plan to conduct future work in the direction of addressing the mentioned obstacles, as well as exploring possible uses for other time series forecasting problems.
References

1. Han X, Armenakis C, Jadidi M (2021) Modeling vessel behaviours by clustering AIS data using optimized DBSCAN. https://doi.org/10.3390/SU13158162
2. Renso C, Bogorny V, Tserpes K, Matwin S, Macedo J (2021) Multiple-aspect analysis of semantic trajectories (MASTER). https://doi.org/10.1080/13658816.2020.1870982
3. Huang J, Zhu F, Huang Z, Wan J, Ren Y (2021) Research on real-time anomaly detection of fishing vessels in a marine edge computing environment. https://doi.org/10.1155/2021/5598988
4. Wang X, Xiao Y (2023) A deep learning model for ship trajectory prediction using automatic identification system (AIS) data. Information 14(4):212
5. Zhou Y, Daamen W, Vellinga T, Hoogendoorn SP (2019) Ship classification based on ship behavior clustering from AIS data. Ocean Eng 175:176–187
6. Yang XS, Slowik A (2020) Firefly algorithm. Swarm intelligence algorithms. CRC Press, New York, pp 163–174
7. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN'95-international conference on neural networks, vol 4. IEEE, pp 1942–1948
8. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Fut Gen Comput Syst 97:849–872
9. Mirjalili S (2019) Genetic algorithm. Evolutionary algorithms and neural networks. Springer, New York, pp 43–55
10. Abualigah L, Abd Elaziz M, Sumari P, Geem ZW, Gandomi AH (2022) Reptile search algorithm (RSA): a nature-inspired meta-heuristic optimizer. Exp Syst Appl 191:116158
11. Yang XS, Hossein Gandomi A (2012) Bat algorithm: a novel approach for global engineering optimization. Eng Comput 29(5):464–483
12. Karaboga D (2010) Artificial bee colony algorithm. Scholarpedia 5(3):6915
13. Khishe M, Mosavi MR (2020) Chimp optimization algorithm. Exp Syst Appl 149:113338
14. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. International conference on hybrid intelligent systems. Springer, New York, pp 328–338
15. Zivkovic M, Bacanin N, Tuba E, Strumberger I, Bezdan T, Tuba M (2020) Wireless sensor networks life time optimization based on the improved firefly algorithm. In: Proceedings of the 2020 international wireless communications and mobile computing (IWCMC). IEEE, pp 1176–1181
16. Zivkovic M, Bacanin N, Zivkovic T, Strumberger I, Tuba E, Tuba M (2020) Enhanced grey wolf algorithm for energy efficient wireless sensor networks. In: Proceedings of the 2020 zooming innovation in consumer technologies conference (ZINC). IEEE, pp 87–92
17. Zivkovic M, Zivkovic T, Venkatachalam K, Bacanin N (2021) Enhanced dragonfly algorithm adapted for wireless sensor network lifetime optimization. Data intelligence and cognitive informatics. Springer, New York, pp 803–817
18. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In: Proceedings of the 2019 27th telecommunications forum (TELFOR). IEEE, pp 1–4
19. Bezdan T, Zivkovic M, Antonijevic M, Zivkovic T, Bacanin N (2020) Enhanced flower pollination algorithm for task scheduling in cloud computing environment. Machine learning for predictive analysis. Springer, New York, pp 163–171
20.
Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. International conference on intelligent and fuzzy systems. Springer, New York, pp 718–725
21. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) Covid-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cit Soc 66:102669
22. Zivkovic M, Venkatachalam K, Bacanin N, Djordjevic A, Antonijevic M, Strumberger I, Rashid TA (2021) Hybrid genetic algorithm and machine learning method for covid-19 cases prediction. In: Proceedings of international conference on sustainable expert systems: ICSES 2020, vol 176. Springer Nature, New York, p 169
23. Zivkovic M, Bacanin N, Antonijevic M, Nikolic B, Kvascev G, Marjanovic M, Savanovic N (2022) Hybrid CNN and xgboost model tuned by modified arithmetic optimization algorithm for covid-19 early diagnostics from x-ray images. Electronics 11(22):3798
24. Alzaqebah A, Aljarah I, Al-Kadi O, Damaševičius R (2022) A modified grey wolf optimization algorithm for an intrusion detection system. Mathematics 10(6):48
25. Bacanin N, Zivkovic M, Stoean C, Antonijevic M, Janicijevic S, Sarac M, Strumberger I (2022) Application of natural language processing and machine learning boosted with swarm intelligence for spam email filtering. Mathematics 10(22):4173
26. Stankovic M, Antonijevic M, Bacanin N, Zivkovic M, Tanaskovic M, Jovanovic D (2022) Feature selection by hybrid artificial bee colony algorithm for intrusion detection. In: Proceedings of the 2022 international conference on edge computing and applications (ICECAA). IEEE, pp 500–505
27. Jovanovic D, Antonijevic M, Stankovic M, Zivkovic M, Tanaskovic M, Bacanin N (2022) Tuning machine learning models using a group search firefly algorithm for credit card fraud detection. Mathematics 10(13):2272
28.
Petrovic A, Bacanin N, Zivkovic M, Marjanovic M, Antonijevic M, Strumberger I (2022) The adaboost approach tuned by firefly metaheuristics for fraud detection. In: Proceedings of the 2022 IEEE world conference on applied intelligence and computing (AIC). IEEE, pp 834–839
29. Bacanin N, Budimirovic N, Venkatachalam K, Jassim HS, Zivkovic M, Askar S, Abouhawwash M (2023) Quasi-reflection learning arithmetic optimization algorithm firefly search for feature selection. Heliyon 9(4):e15378
30. Bezdan T, Cvetnic D, Gajic L, Zivkovic M, Strumberger I, Bacanin N (2021) Feature selection by firefly algorithm with improved initialization strategy. In: Proceedings of the 7th conference on the engineering of computer based systems, pp 1–8
31. Bacanin N, Jovanovic L, Zivkovic M, Kandasamy V, Antonijevic M, Deveci M, Strumberger I (2023) Multivariate energy forecasting via metaheuristic tuned long-short term memory and gated recurrent unit neural networks. Inform Sci 642:119122
32. Bacanin N, Stoean C, Zivkovic M, Rakic M, Strulak-Wojcikiewicz R, Stoean R (2023) On the benefits of using metaheuristics in the hyperparameter tuning of deep learning models for energy load forecasting. Energies 16(3):1434
33. Stoean C, Zivkovic M, Bozovic A, Bacanin N, Strulak-Wojcikiewicz R, Antonijevic M, Stoean R (2023) Metaheuristic-based hyperparameter tuning for recurrent deep learning: application to the prediction of solar energy generation. Axioms 12(3):266
34. Bacanin N, Stoean C, Zivkovic M, Jovanovic D, Antonijevic M, Mladenovic D (2022) Multiswarm algorithm for extreme learning machine optimization. Sensors 22(11):4204
35. Bacanin N, Zivkovic M, Al-Turjman F, Venkatachalam K, Trojovsky P, Strumberger I, Bezdan T (2022) Hybridized sine cosine algorithm with convolutional neural networks dropout regularization application. Sci Rep 12(1):1–20
36. Gajic L, Cvetnic D, Zivkovic M, Bezdan T, Bacanin N, Milosevic S (2021) Multi-layer perceptron training using hybridized bat algorithm. Computational vision and bio-inspired computing. Springer, New York, pp 689–705
37.
Jovanovic L, Jovanovic D, Bacanin N, Jovancai Stakic A, Antonijevic M, Magd H, Thirumalaisamy R, Zivkovic M (2022) Multi-step crude oil price prediction based on LSTM approach tuned by salp swarm algorithm with disputation operator. Sustainability 14(21):14616
38. Milosevic S, Bezdan T, Zivkovic M, Bacanin N, Strumberger I, Tuba M (2021) Feed-forward neural network training by hybrid bat algorithm. Modelling and development of intelligent systems: 7th international conference, MDIS 2020, Sibiu, Romania, October 22–24, 2020, revised selected papers 7. Springer International Publishing, New York, pp 52–66
39. Bacanin N, Sarac M, Budimirovic N, Zivkovic M, AlZubi AA, Bashir AK (2022) Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain Comput Inform Syst 35:100711
40. Jovanovic G, Perisic M, Bacanin N, Zivkovic M, Stanisic S, Strumberger I, Alimpic F, Stojic A (2023) Potential of coupling metaheuristics-optimized-xgboost and shap in revealing pahs environmental fate. Toxics 11(4):394
41. Jovanovic L, Jovanovic G, Perisic M, Alimpic F, Stanisic S, Bacanin N, Zivkovic M, Stojic A (2023) The explainable potential of coupling metaheuristics: optimized-xgboost and shap in revealing vocs' environmental fate. Atmosphere 14(1):109
42. Zhang Z, Yang R, Fang Y (2018) LSTM network based on antlion optimization and its application in flight trajectory prediction. In: Proceedings of the 2018 2nd IEEE advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE, pp 1658–1662
43. Xiao Z, Li P, Havyarimana V, Hassana GM, Wang D, Li K (2018) GOI: a novel design for vehicle positioning and trajectory prediction under urban environments. IEEE Sens J 18(13):5586–5594
44. Liu J, Shi G, Zhu K (2019) Vessel trajectory prediction model based on AIS sensor data and adaptive chaos differential evolution support vector regression (ACDE-SVR). Appl Sci 9(15):2983
45.
Hofmann C, Topputo F (2021) Rapid low-thrust trajectory optimization in deep space based on convex programming. J Guid Control Dyn 44(7):1379–1388
46. Mohamed AW, Hadi AA, Mohamed AK, Awad NH (2020) Evaluating the performance of adaptive gaining sharing knowledge based algorithm on CEC 2020 benchmark problems. In: Proceedings of the 2020 IEEE congress on evolutionary computation (CEC). IEEE, pp 1–8
47. Chechkin AV, Metzler R, Klafter J, Gonchar VY (2008) Introduction to the theory of Levy flights. In: Anomalous transport: foundations and applications, pp 129–162
48. Yang XS, Deb S (2009) Cuckoo search via Levy flights. In: Proceedings of the 2009 world congress on nature and biologically inspired computing (NaBIC). IEEE, pp 210–214
49. Heidari AA, Pahlavani P (2017) An efficient modified grey wolf optimizer with Levy flight for optimization tasks. Appl Soft Comput 60:115–134
50. Kaidi W, Khishe M, Mohammadi M (2022) Dynamic Levy flight chimp optimization. Knowl Based Syst 235:107625
Chapter 6
Application of Artificial Intelligence in Virtual Reality Derouech Oumaima, Lachgar Mohamed, Hrimech Hamid, and Hanine Mohamed
1 Introduction

Virtual reality is a technology that uses a combination of imaging and computer processing to create an immersive world representing the real one. The role of these virtual worlds is to provide realistic, dynamic simulations that mimic the real world with a high degree of accuracy and allow users to interact with elements as if they were actually there. 3D modeling techniques are a major contributor to the creation of these immersive environments, producing a strong sense of reality that lets the user stay engaged and interact in a positive way [1]. VR refers to the computer-generated reconstruction of a three-dimensional image or scene that allows interaction similar to that which occurs in the real world. This becomes achievable through particular electronics, such as sensor-equipped gloves or headgear with a built-in display [2]. The notion of producing a virtual or 3D scene that corresponds to a real environment while allowing real-time interaction is what we find in virtual reality these days. Thanks to the integration of hardware components and specialist elements in the world of VR, the creation of these virtual environments has become increasingly achievable. Gloves containing sensors, helmets with integrated screens, and motion-detection gadgets are examples of this advanced technology, enabling users to create a link between the real and virtual worlds [3]. Users can interact with virtual objects or change aspects of the VR environment simply by using their hands when wearing sensor-equipped gloves, making activities tangible and natural. Helmets with integrated screens, often referred to as

D. Oumaima (B) · L. Mohamed · H. Mohamed
ENSA School-University Chouaib Doukkali, El Jadida, Morocco
e-mail: [email protected]
H. Hamid
ENSA School of Berrechid-University Hassan First, Settat, Morocco

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S.
Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_6
67
68
D. Oumaima et al.
VR headsets, immerse users in the digital world by offering a 360° view of the virtual environment, removing the limits of the real world from their field of vision. These headsets are also equipped with motion sensors that detect head movements, reinforcing the sensation of being in a separate environment. VR applications immerse the user in a virtual environment that approximates real life and can be used in various domains, such as:
• Gaming: VR games allow players to immerse themselves in virtual, interactive 3D worlds, enhancing the gaming experience by allowing them to physically interact with the game's environment and characters [4].
• Training and Simulation: VR is used for training in industries including aviation, the military, and emergency services. VR simulations enable learners to practice difficult activities and scenarios in a risk-free environment [5].
• Education: VR offers students immersive learning experiences. It can take children on virtual excursions, allowing them to discover historical sites or scientific phenomena in an interactive 3D way; VR has the potential to deeply engage students in these environments [6].
• Health Care: VR is an important tool for medical education, surgical planning, and rehabilitation. Surgeons can rehearse operations in a virtual environment, while patients can use VR for pain reduction and anxiety relief [7].
• Architecture and Design: VR is used by architects and designers to view and manipulate 3D models of structures and interiors. Clients may take virtual walkthroughs of suggested designs before construction begins [8].
• Storytelling and Entertainment: VR leads the way in an immersive approach to storytelling and entertainment; examples include virtual concerts, art shows, and immersive theater.
• Tourism: Tourists can digitally explore places before planning their trip thanks to VR. They can get a glimpse of real environments by taking 360-degree tours of hotels, sites, and attractions [9].
VR applications use cutting-edge technology to present users with realistic, interactive experiences that have the potential to transform the way we learn about and explore the real world virtually. AI is the extraordinary attempt to simulate cognitive capabilities like human intellect inside computer systems. This diverse topic encompasses a wide range of applications, each of which uses AI technology to solve tasks that generally need human-like comprehension and reasoning abilities. The adaptability of AI has resulted in the creation of a wide range of applications, each with its own set of capabilities and contributions to the advancement of human-computer interaction and problem-solving [2]. Expert systems play an important role in the field of AI. They are designed to ingest and analyze knowledge repositories, enabling them to solve problems and provide expert-level information. Expert systems find applications in fields such as health care, where they provide diagnostic support, and finance, where they support investment decisions. By exploiting AI's ability to evaluate datasets and draw conclusions based on specialized knowledge, these systems significantly improve decision-making processes.
6 Application of Artificial Intelligence in Virtual Reality
69
Natural Language Processing (NLP): NLP refers to a further stage in the advancement of AI, where computers are able to interpret and converse in human language. This enables machines to understand the nuances of language such as context, mood, and tone. Chatbots that hold human-like dialogues, language translators, and sentiment analyzers for public opinion on social networks are all areas where NLP can be applied [10]. Voice Recognition: AI's capabilities extend to voice recognition, allowing machines to effectively comprehend spoken words. This technology powers voice assistants such as Siri and Alexa, allowing users to communicate with gadgets using natural spoken instructions. Beyond improving user convenience, speech recognition has practical uses in transcription services, accessibility solutions for people with impairments, and hands-free operation of machines in a variety of sectors [11]. Machine Vision: Another aspect of AI is machine vision, which allows computers to interpret and analyze visual information collected from their surroundings. This technology, which allows vehicles to detect their surroundings and make real-time driving judgments, is a cornerstone of self-driving automobiles. Machine vision systems in manufacturing examine and ensure the quality of items on assembly lines, identifying faults with exceptional precision. Furthermore, machine vision has uses in medical imaging, where it assists in illness diagnosis from radiological scans, and in augmented reality (AR), where it superimposes digital information on the actual environment [12]. AI plays a role in technology by striving to simulate and improve intelligence to make it more human-like. Applications include expert systems, natural language processing, speech recognition, and computer vision. These advances have had a positive impact on the way we interact with technology and have helped us to solve problems.
The future looks bright for AI contributions across all sectors, bringing improvements to many aspects of our lives. Together, AI and VR create a more enveloping and invigorating experience: their combination produces a synergy that transforms the virtual world into a realistic arena. Here are some important links between VR and AI: • Realistic environments: AI can simulate reality by building credible virtual environment models that mimic nature, physics, and human behavior [13]. • Immersive interactions: In VR, machine learning methods from computer vision and natural language processing are used to make virtual characters more engaging [14]. • Personalization: AI can tailor the content presented to a user, adjusting the learning level or other conditions to individual needs [15]. In this era of science and technology, AI is advancing rapidly. Although machines' supremacy over humans in chess is undeniable, a historical view of human advancement highlights multiple areas where AI still lags behind the human mind. Machines cannot replace, surpass, or replicate certain facets of human cognition and comprehension [16].
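The personalization link above can be sketched as a simple rule that raises or lowers a difficulty level from a learner's recent scores. The thresholds, score range, and function name here are invented for illustration, not drawn from any cited system.

```python
# Hypothetical sketch of AI-driven personalization in a VR learning module:
# nudge the difficulty level up or down from the learner's mean recent score.
# Thresholds (0.4 / 0.8) and the 1-10 level range are invented.

def adapt_difficulty(level, recent_scores, low=0.4, high=0.8):
    """Return a new difficulty level (1-10) based on recent performance."""
    if not recent_scores:
        return level
    mean = sum(recent_scores) / len(recent_scores)
    if mean > high:        # learner is cruising: raise the challenge
        level += 1
    elif mean < low:       # learner is struggling: ease off
        level -= 1
    return max(1, min(10, level))

print(adapt_difficulty(5, [0.9, 0.85, 0.95]))  # 6
print(adapt_difficulty(5, [0.2, 0.3]))         # 4
```

A deployed system would replace this fixed rule with a learned policy, but the feedback loop (observe the user, adjust the environment) is the same.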
70
D. Oumaima et al.
Today, AI is a significant technological advance that promotes an overall improvement in capabilities. It enables computers to perform cognitive tasks such as perception, reasoning, learning, and interaction. The convergence of three interconnected developments (improved algorithmic intelligence, huge data warehouses, and an abundance of affordable processing and storage resources) has led to the rapid adoption of AI in our daily lives [17]. In the following section, we critically review previous studies on the incorporation of artificial intelligence into virtual reality environments, covering the methodologies, tactics, and approaches used to enhance and elevate virtual worlds with AI.
2 Related Works

This section provides a comprehensive overview of previous work examining the link between AI and VR. Our aim is to review cutting-edge studies, approaches, and ideas to demonstrate how the integration of AI has enhanced virtual environments, giving insight into the many strategies used to improve interaction and capability within virtual landscapes used for research. The study presented in this paper builds on previous work to provide an overview of the effects of AI on virtual environments, which have significantly altered the possibilities in this field. Significant advances in computer vision technology have enabled the creation and manipulation of increasingly believable games and virtual worlds. Nonetheless, the incorporation of AI into these virtual worlds constitutes a significant step forward, improving both engagement and authenticity within these immersive settings. AI's broad goals include the creation of virtual environments in which varied virtual entities, such as virtual creatures or human avatars, may interact and perform activities autonomously. This integration goes beyond visual upgrades, adding a layer of intelligence that endows virtual beings with the ability to make decisions, adapt to changing situations, and interact in lifelike ways. The synergistic combination of AI with virtual worlds produces experiences that go beyond the limits of traditional simulations. Within these AI-powered virtual domains, users face scenarios in which virtual creatures guided by AI algorithms display real-world-like behaviors and reactions. This not only improves overall immersion but also opens possibilities for a wide range of applications, from gaming to training simulations and beyond. The convergence of AI with virtual environments offers a transformational progression that will revolutionize how consumers interact with digital places.
It improves interaction quality, creates a stronger feeling of realism, and broadens the frontiers of what is possible within the vast geography of virtual realities [13]. The combination of AI and VR has ushered in a period of disruption in many sectors, heralding new opportunities and enhanced experiences. The combination of AI and VR has fundamentally changed the way students approach and understand
difficult subjects in education. Thanks to this integration, immersive learning environments have been created, offering students unprecedented opportunities to explore complex ideas, historical events, and scientific phenomena. The significance of this paradigm shift is illustrated in the article [18], which details the numerous benefits of incorporating AI and VR technology into educational environments. The most significant of these advantages is the marked improvement in pupils' creativity and attention. The immersive aspect of AI-powered VR holds students' attention with great effectiveness, allowing a better and more lasting understanding of the subject matter. One striking result of this integration has been a significant reduction in test-related anxiety. Learners gain confidence and competency by immersing themselves in VR-driven instructional experiences: they are encouraged to experiment, explore, and learn at their own speed, in an environment that reduces the stress associated with traditional testing. Students also show great enthusiasm and active participation in sessions where AI-powered VR is used. This increased participation creates a stimulating environment that encourages students' natural curiosity and allows their creative potential to flourish. This is particularly evident in fields such as fine art, where students can fully engage in the imaginative creative process, use a variety of techniques, and perform as if the digital setting were truly authentic. Finally, teaching through AI and VR represents a new paradigm that changes the fundamental framework of education. It supports an integrated strategy that fosters student creativity, increases concentration, and reduces anxiety while imparting knowledge.
The combination of AI and VR has the potential to completely transform the way we learn, examine, and develop our cognitive and creative abilities in this ever-changing educational environment. A new era in health care has also arrived, thanks to AI-driven VR simulations that are transforming medical training, patient care, and therapeutic approaches. The integration of AI and VR technology offers healthcare professionals powerful tools that can improve many facets of the healthcare ecosystem. One of the most important benefits is the analytical capability of AI, which deftly navigates vast amounts of medical data and enables rapid and accurate diagnosis. This accelerates the decision-making process, resulting in faster medical interventions and, ultimately, improved patient outcomes. The integration of AI and VR into medical education is bringing about a significant change: aspiring practitioners can gain hands-on experience in a safe, regulated environment through immersive, hyper-realistic training programs. This fosters the development of the skills and knowledge needed for clinical practice, ultimately leading to better patient care. Beyond training, VR-based therapies usher in fresh approaches to pain management, rehabilitation, and exposure treatment. These treatments make use of VR's immersive characteristics to provide individualized interventions that reduce pain, improve rehabilitation, and assist patients in confronting and overcoming their fears and phobias.
Furthermore, the combination of AI and VR is revolutionizing surgical methods. It improves surgical planning and precision by giving surgeons comprehensive three-dimensional images and simulations of difficult operations. This enables them to improve their skills, lower surgical risks, and improve patient outcomes. Lifelike remote consultations enabled by AI and VR cross geographical barriers, allowing patients to get professional medical advice and care from the comfort of their own homes. This democratizes healthcare access and improves the efficiency of healthcare delivery. In conclusion, the fusion of AI and VR offers immense potential for the medical field. In addition to improving medical education and diagnostics, it also offers cutting-edge solutions for telemedicine, rehabilitation, exposure therapy, and pain management. As it develops, this integration has the potential to change healthcare procedures, improve treatment optimization and, ultimately, improve people's well-being and outcomes on a global scale [7, 19]. AI technology in VR, including games, offers stimulating frameworks for the rapid training and testing of new AI systems. The user experience can be enhanced by using AI agents as non-player characters (NPCs) in video games, although some sources note that the AI enemies used in video games are pre-programmed and non-adaptive. Five of the 36 studies reviewed focused on gaming applications, integrating AI-based creatures into virtual worlds using platforms such as REALISM software and Unity. This combination of AI and VR greatly enhances the gaming environment, increasing game complexity, interactivity, and user engagement [4]. In the corporate environment, VR enables immersive data visualization, facilitating more logical analysis and decision-making. By enabling remote teams to collaborate in realistic settings, it also transforms cooperation.
According to the article [20], companies are integrating AI to better understand customer preferences, providing tailored experiences and archiving user behavior data for future improvements. With detailed data reporting and an integrated cross-functional strategy, companies can stand out from the competition, improve product design, allocate resources efficiently, and plan for success in advance. In other words, AI in VR drives innovation, stimulates user engagement, and changes the way organizations operate. The study in the article [21] delves into the convergence of AI and virtual worlds, with a focus on the dynamic interplay of AI agents in these digital settings. Furthermore, it investigates the possible ramifications of such interactions in driving AI toward a level of sophistication equivalent to human intellect. What distinguishes this work is its comprehensive approach, which pulls from multiple fields and weaves together ideas to present a well-rounded view of the numerous issues that arise at this juncture. By linking these challenges, the study provides a broad viewpoint that serves a twofold purpose. On the one hand, it emphasizes the importance of the progress of AI agents within virtual environments in driving their continuing evolution and pushing the frontiers of their capabilities. On the other hand, it highlights how these virtual worlds provide an ideal environment
for examining and deconstructing the variety of difficulties that constitute the complicated landscape of AI. In summary, this study emphasizes not only the necessity of AI's cooperation with virtual environments but also their joint potential in unraveling the complexity of AI. In today's virtual environments, AI is extremely effective at enabling human–computer interaction (HCI). The article [22] demonstrates that gesture recognition is a technique in which the actions performed by users are recognized by the system. Gestures are physical movements involving the fingers, hands, arms, head, face, or body that serve two basic purposes: transmitting meaningful information and engaging with the surroundings. These motions are a condensed but intriguing subset of conceivable gestures. AI can be used in VR to identify hand gestures, allowing more natural interaction with virtual items. Findings from the integration of AI and VR are shown in Fig. 1. Figure 2 is an illustrated graphic summarizing the important conclusions of this study: it illustrates the integration of AI and VR across several domains, emphasizing the transformational influence in gaming, education, health care, and human–computer interaction. The results show that the use of AI and VR has significantly changed a number of industries. AI has greatly enhanced user interaction and the sense of authenticity in virtual environments, giving users access to more than just aesthetic changes. It expands the range of games and training simulations by enabling autonomous interactions and realistic behaviors. AI-powered VR heralds a paradigm shift in education by improving creativity and concentration and reducing test anxiety. Medical diagnosis, training, pain management, rehabilitation, surgical precision, and telemedicine have all benefited from AI and VR.
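As a hedged sketch of the gesture recognition described above, the snippet below classifies a hand-feature vector by its nearest gesture "centroid". Real systems would first extract landmark features from camera or controller data; the two-dimensional vectors and gesture names used here are invented.

```python
# Minimal nearest-centroid gesture classifier. CENTROIDS stands in for mean
# feature vectors that a real system would learn from labelled recordings.

import math

CENTROIDS = {
    "pinch": (0.1, 0.9),
    "wave":  (0.8, 0.2),
    "point": (0.5, 0.5),
}

def classify_gesture(features):
    """Return the gesture whose centroid is closest to the feature vector."""
    return min(CENTROIDS, key=lambda g: math.dist(features, CENTROIDS[g]))

print(classify_gesture((0.75, 0.25)))  # wave
```

Nearest-centroid matching is the simplest member of the family of recognizers the article surveys; trained neural classifiers follow the same contract (feature vector in, gesture label out) with far higher robustness.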
Fig. 1 Findings from AI and VR integration
Fig. 2 AI’s contribution to VR
In gaming, however, certain AI rivals remain non-adaptive, potentially restricting the degree of challenge. While these improvements are encouraging, they are accompanied by technological integration obstacles, accessibility concerns, and ethical issues about AI reaching human-level intelligence in virtual worlds. Nonetheless, the convergence of AI and VR offers revolutionary and immersive experiences in a variety of fields, from education to health care and beyond. The following section explores this fusion of AI and VR, a technical development with the potential to revolutionize the way we apprehend virtual worlds and their immersive content.
3 Unveiling the AI-Powered VR Future

The confluence of AI and VR is a remarkable technological development that has the potential to change the way we perceive and interact with virtual worlds and the content they hold. This innovative partnership is at the forefront of technological progress and is poised to usher in a new era of transformative experiences across a variety of applications, including entertainment, education, health care, and beyond.
Table 1 presents the features of AI in VR in different areas; these features improve the usability and efficacy of AI in VR across a wide variety of applications, from entertainment and education to health care and data analysis [13, 19–21, 23, 24].

Better immersion: AI-driven VR can make virtual experiences far more immersive. By using AI algorithms to track user behavior and modify the virtual world in real time, VR simulations become more perceptive and adapt to users' tastes and activities. This increased immersion blurs the boundary between reality and the virtual world, making virtual environments more realistic and attractive. By enabling the creation of more intelligent and realistic virtual environments and characters, as well as personalized and customizable experiences, AI makes a significant contribution to VR. The function of AI in VR also includes personalization: AI tailors VR experiences to individual interests and requirements by continually studying user behavior, preferences, and interactions. This flexibility means that users are active participants, shaping their virtual journeys in real time rather than passively observing. AI-driven personalization improves engagement and immersion by altering the difficulty level of a game, adapting instructional content, or modifying the ambiance of a virtual location [25]. The adaptability of AI in VR also includes reactive parameters: virtual environments can change and develop according to the user's actions and decisions, producing dynamic, personalized experiences. For example, if a user shows interest in a specific section of the virtual world, AI can modify the configuration to better engage the user, ensuring that each experience is distinct and tailored [26].

Table 1 Potential advancements in AI-powered VR

Better immersion
• Realistic, immersive virtual environments
• Real-time interaction

Customized experiences
• Adaptation to user needs and learning styles
• Efficient content consumption in games and learning

Realistic avatars
• Development of flexible, realistic virtual avatars
• Reproduction of human movements, facial expressions, and actions

Natural language processing
• Convincing voice instructions and speech
• Improved user engagement and communication

Data visualization
• Interactive 3D images created from complex data
• Help for analysts and researchers in drawing conclusions from data

Health care
• Diagnosis, planning, and modeling of surgical procedures
• Customized interventions, virtual therapy sessions, and progress monitoring

Education and training
• Immersive learning environments
• Content adapted to students' learning styles and objectives
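The "reactive parameters" behavior described above can be sketched as a dwell-time tracker that picks which region of a virtual scene to enrich next; the region names and the simple most-visited selection rule are invented for illustration.

```python
# Illustrative reactive-environment rule: log how long the user dwells in each
# region of a virtual scene, then enrich the most-visited region with content.

from collections import Counter

def plan_expansion(dwell_log):
    """Given (region, seconds) samples, pick the region to enrich next."""
    totals = Counter()
    for region, seconds in dwell_log:
        totals[region] += seconds
    return totals.most_common(1)[0][0]

log = [("gallery", 40), ("garden", 12), ("gallery", 25), ("lab", 8)]
print(plan_expansion(log))  # gallery
```

A production system would weigh many more signals (gaze, interaction events, stated goals), but the loop is the same: observe user interest, then reconfigure the environment around it.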
This can lead to more immersive and engaging user experiences across a range of sectors, including entertainment, education, and business [25]. In our rapidly evolving technological environment, the fusion of AI and virtual environments represents a huge paradigm shift. This integration has changed the whole framework within which we engage and immerse ourselves in digital worlds, unveiling previously unsuspected possibilities. It marks the beginning of a new era characterized by greater involvement, extraordinary adaptability, and revolutionary problem-solving skills, fundamentally changing the way we interact with and perceive virtual worlds. The illustration in Fig. 2 is a visual study of the various benefits of integrating an AI layer into virtual worlds; each area shows a creative way in which AI modifies virtual worlds, taking them to new levels [17, 24, 27, 28]. AI-driven VR applications are rapidly spreading across a wide range of industries, including gaming, object detection, Industry 4.0 and the Internet of Things (IoT), education and training, and health care. This convergence of technologies is producing breakthrough solutions that are radically altering how consumers interact with VR and how industries function. Below, we look at a few examples of AI-enhanced VR applications from these sectors. Gaming: AI algorithms have become essential in the gaming industry for creating highly realistic 3D representations of game characters, carefully crafted virtual environments, and dynamic game parameters that change in real time according to player input. This AI-driven immersion offers gamers captivating, constantly evolving experiences that push the boundaries of conventional gaming while enhancing visual fidelity [4]. Object Recognition: AR and VR experiences are evolving with AI systems capable of object recognition.
These AI systems excel at detecting and tracking objects in real time, enabling seamless interaction between virtual objects and the user's real environment. AI enhances the adaptability and responsiveness of AR and VR experiences by detecting hand actions, tracking real objects, and modifying virtual content to suit changing environmental conditions [24]. Health Care: AI-powered VR applications enable innovative approaches to pain management, physical therapy, and mental health therapy. AI's real-time object recognition and tracking capabilities allow dynamic, responsive VR experiences that adapt to patients' movements and needs. AI also plays a vital role in medical data analysis, aiding diagnosis and the creation of effective treatment plans that ultimately improve patient care and outcomes [11]. Education: By tracking students' actions, achievements, and preferences in virtual environments, AI-powered VR applications are revolutionizing the educational landscape. With this data-driven methodology, the difficulty and pace of material can be adjusted dynamically to meet the needs of each individual learner. Moreover, voice commands made possible by NLP technology in VR simplify user interactions and enable a more natural exploration of virtual environments [29].
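One way such real-time tracking keeps virtual overlays stable is to smooth noisy per-frame detections; the sketch below applies an exponential moving average to (x, y) positions. The smoothing factor and sample points are illustrative, not taken from any cited system.

```python
# Smoothing noisy per-frame object detections with an exponential moving
# average, so virtual content anchored to a real object does not jitter.

def smooth_track(positions, alpha=0.5):
    """Blend each new (x, y) detection with the running estimate."""
    est = None
    out = []
    for x, y in positions:
        if est is None:
            est = (x, y)  # first detection seeds the estimate
        else:
            est = (alpha * x + (1 - alpha) * est[0],
                   alpha * y + (1 - alpha) * est[1])
        out.append(est)
    return out

track = smooth_track([(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)])
print(track[-1])  # (0.25, 0.0)
```

Lower alpha values give steadier but laggier anchors; AR frameworks typically use richer filters (e.g., Kalman filtering) built on the same principle.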
In short, the convergence of AI and VR is having a significant influence on a variety of sectors, ushering in innovations that result in smarter, more tailored, and profoundly immersive experiences. These AI-powered VR applications are at the vanguard of technical innovation, pushing the limits of what is possible. They carry the promise of a future in which the merger of AI and VR opens hitherto unexplored frontiers in engagement, problem-solving capacity, and human–computer interaction. This convergence is set to transform how we perceive and engage with technology, presenting exciting opportunities for innovation and development across a wide range of areas. In the next section, we conduct a detailed comparative examination of AI applications in VR. We examine how AI is used to complement real-time modeling, automate content generation, improve motion recognition and response, and allow the fine-tuning and customization of virtual settings. Through this in-depth investigation, we aim to give a comprehensive understanding of the varied and inventive roles that AI will play in shaping the future of VR.
4 Exploring the AI-VR Synergy: Advances, Challenges, and Future Prospects

The combination of AI and VR is an exciting and promising frontier in an ever-changing technological landscape. The marriage of cognitive AI and immersive VR has unleashed a wave of scientific research and intense curiosity. The transformative potential of AI applications in VR is currently the subject of in-depth study by researchers, who have embarked on a quest to learn more about the potential for cooperation between the two technologies. In this section, we investigate the most recent achievements and trailblazing contributions at this dynamic nexus. Convolutional Neural Networks (CNNs) play a critical role in bridging the AI-VR nexus, and we highlight their relevance. In addition, we examine novel solutions to the difficulty of limited data, showcasing researchers' resourcefulness as they negotiate this challenging terrain. Our tour also includes human fall detection, where AI-powered VR demonstrates its ability to address crucial safety problems. Beyond that, we investigate the broad possibilities of this marriage of AI and VR, a synergy set to change numerous aspects of computer vision and associated industries. As our investigation unfolds, we explore the fusion of virtual worlds and AI-driven artificial life, highlighting the synergy that results from the meeting of two previously distinct fields. This interdisciplinary effort not only raises intriguing questions but also has profound implications for how the technology
will develop in the future. It transcends boundaries and envisions a scenario in which VR and AI work together to reshape our perception of reality [30]. The keen interest in AI has led to an ever-renewed desire among scientists to conduct active research. This review has examined in detail the most recent developments in AI applications in VR technology, providing a thorough overview, logical classification, and recognition of the significant contributions of research papers in this field. What stands out is the careful differentiation made between broad developments of the CNN algorithm and those specifically related to VR; this distinction allows more precise searches for relevant material on the topic. The chosen training dataset was constructed to strike a balance between preparation time and training performance. Furthermore, in situations where data availability is restricted, it has been proposed to investigate the use of pre-training procedures [23]. The merging of VR and AI offers a viable solution to the persistent problem of detecting human falls, a major public health issue. The article [30] describes a novel use of VR that tackles the issues associated with the paucity and quality of training data for AI systems. This strategy not only overcomes fall-data difficulties but also opens up a wide range of options: the VR-driven data-collection technique demonstrated there has applications well beyond fall detection, in sectors such as human motion detection and identification, as well as the developing field of autonomous cars.
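A minimal sketch of the idea behind VR-generated training data for fall detection: synthesize head-height trajectories for walking and falling motions, then calibrate a drop-rate threshold on them. All numbers (heights, noise, threshold) are invented; the cited work [30] uses far richer VR-captured data and learned models rather than a fixed threshold.

```python
# Synthetic "VR-captured" motion data for fall detection. A fall is modelled
# as a rapid drop in head height midway through the clip; walking is steady
# height plus small noise. Parameters are invented for illustration.

import random

random.seed(0)  # reproducible trajectories

def synth_trajectory(fall, frames=30):
    """Head height (m) per frame; a fall drops sharply after the midpoint."""
    h, traj = 1.7, []
    for t in range(frames):
        if fall and t > frames // 2:
            h = max(0.3, h - 0.25)           # rapid descent toward the floor
        traj.append(h + random.uniform(-0.02, 0.02))
    return traj

def max_drop_rate(traj):
    """Largest frame-to-frame decrease in height."""
    return max(traj[i] - traj[i + 1] for i in range(len(traj) - 1))

def is_fall(traj, threshold=0.15):
    return max_drop_rate(traj) > threshold

print(is_fall(synth_trajectory(fall=True)))   # True
print(is_fall(synth_trajectory(fall=False)))  # False
```

The point of the VR approach is that such labelled trajectories, hard and dangerous to collect with real subjects, can be generated in any quantity inside a simulation and then used to train or calibrate real detectors.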
This study clearly demonstrates VR technology's potential as a catalyst for improving AI capabilities. This potential extends beyond fall detection, providing distinctive and flexible solutions that reverberate across a broad range of computer vision disciplines and beyond. The study's demonstration of the symbiotic relationship between VR and AI is a pioneering step toward a future in which these technologies seamlessly merge to push the bounds of what is possible in the world of intelligent systems and their applications. Historically, research in virtual environments and AI/artificial life has been conducted by separate communities, each with its own purposes and problems. However, a visible convergence is now taking place across these formerly distinct fields, ushering in a new era of interdisciplinary study. We are seeing a significant increase in the number of programs that investigate user-independent behavior within virtual environments. These applications broaden their scope to include dynamic interactions between populations and other entities. This comprehensive strategy capitalizes on the strengths of both sectors, bringing the integration of approaches and concepts to the forefront. The importance of this integration is growing in areas such as computer-generated entities, virtual identities, and synthetic agents. Researchers are increasingly realizing how useful it is to combine AI with the knowledge and methods of virtual environments
to enhance the capabilities of these entities. This synergy has led to the creation of increasingly interesting, flexible, and intelligent agents capable of operating easily in virtual environments. Scholars in the virtual environments area are devoting their efforts to perfecting particular features and components within these immersive digital realities. At the same time, the AI community is intensively investigating the untapped potential of virtual worlds as fertile ground for nurturing and creating more compelling and intelligent agents [4]. Some artificial life researchers are starting audacious projects that stretch the imagination. Not satisfied with traditional tactics, they are developing virtual ecosystems with complex, emergent characteristics like those observed in biological systems. This endeavor goes beyond the usual, aiming to recreate the complexity of life itself within virtual surroundings. In short, the fusion of artificial life, AI, and virtual worlds represents an intriguing convergence of fields. It combines a variety of skills and objectives, successfully redrawing the boundaries of research and innovation. Because of their ongoing overlap and mutual inspiration, these fields are having a significant impact on technological progress and our understanding of complex systems. The future of human–machine interaction and the development of artificial intelligence will be strongly influenced by this confluence, which will offer exciting possibilities and achievements in the years to come [7]. Among its goals is the creation of virtual environments meant to house digital life forms. A cohesive set of physical rules is developed within these domains, which may imitate or diverge from the natural laws regulating the actual world.
According to some experts, distributed interactive virtual environments such as Active Worlds are essential platforms for creating these virtual domains. These environments serve as online laboratories in which researchers investigate the subtle dynamics of AI and the long-lasting, self-sustaining connections between artificial life forms populating virtual worlds. The major goal of creating these finely crafted digital landscapes is to build rule-based systems that may either closely match or drastically diverge from the underlying laws governing our reality [25]. Platforms like Active Worlds provide the fundamentals of these virtual worlds, serving as a focal point for researchers to systematically study the behaviors, interactions, and developmental trajectories of AI-driven creatures. These systems provide an ideal framework for studying the adaptation and effectiveness of AI entities, sometimes regarded as digital life forms, within their specialized rule-based ecosystems. In essence, these platforms allow researchers to investigate how AI creatures designed to mimic biological forms live and flourish within the structured limits of their virtual environments. Active Worlds and comparable systems are critical tools for academics seeking to understand the dynamics and effects of AI-driven organisms in these rule-bound environments. These virtual environments act as digital laboratories, offering a unique setting for research into AI and artificial life. They enable us to understand how these
digital entities react to different sets of rules, adapt to changing circumstances, and ultimately thrive in the complex ecosystems of virtual worlds. The implications of this work go far beyond scientific research: they could influence the development of artificial intelligence algorithms, the design of autonomous systems, and even our understanding of complex systems in virtual and real worlds. In essence, this project is a dynamic fusion of AI, virtual environments, and artificial life. It highlights the fascinating dynamics of these digital life forms and their environments, offering an incisive look at the complex interactions between digital entities and the rule-based environments in which they exist. This interdisciplinary partnership will help us to better understand the complex relationships between artificial intelligence, virtual worlds, and the wider technological environment. Table 2 summarizes the main points made in this section on the relationship between AI and VR research.

Table 2 Overview of the key points of AI in VR

Research focus on AI and VR
• Huge interest in AI has driven continuous research
• Research looks at the uses of VR for AI
• Research articles are categorized according to their applications in AI and VR

Differentiation of CNN advancements
• Clear distinction between general CNN advancements and VR-related advancements
• Enhances precision in the literature search

Training dataset and pre-training
• A carefully selected set of training data for better performance and preparation
• Suggestion to use pre-training techniques in cases where less data is available

VR and AI convergence for fall detection
• Integration of VR and AI to detect human falls
• VR as a prime data source for training AI systems
• Applicability beyond fall detection (e.g., human motion detection, self-driving cars)

VR technology's transformational impact
• Shows how VR can enhance AI capabilities in various areas of computer vision

Convergence of virtual environments and AI
• Historical independence of VR and AI research
• Current fusion of the fields of VR and AI
• Growth of applications studying user-independent behaviors
• Importance of combining approaches from both sectors (e.g., synthetic agents, virtual persons)
• Research into artificial intelligence and life using distributed interactive virtual environments
6 Application of Artificial Intelligence in Virtual Reality
81
5 Discussion

A powerful alliance is formed when AI and VR are combined, taking virtual experiences to new heights. Immersive, adaptive, and interactive simulations will push the boundaries of entertainment, education, and many other sectors thanks to the seamless integration of AI and VR. Fundamentally, AI-powered algorithms play an important role in enhancing virtual environments, bringing a remarkable sense of realism and presence to interactions with virtual beings.

When it comes to personalization, the revolutionary influence of this synergy is evident. VR material becomes increasingly adaptable when AI is included, adjusting smoothly to individual interests, preferences, and actions. This personalization of virtual experiences generates a greater feeling of engagement, as users find themselves immersed in virtual settings that connect deeply with their individual preferences.

In addition, AI impacts natural language processing in VR, changing the interaction landscape. AI-driven natural language processing lets users connect with one another in VR in a meaningful, context-sensitive way; through these new communication channels, users can easily interact with artificial beings, instructors, or other participants.

In short, the convergence of AI and VR is a driving force behind a dramatic shift in how we experience and interact with digital surroundings. It improves immersion by allowing for more realistic simulations, fosters customization by ensuring that virtual material connects profoundly with users, and ushers in a new era of conversational interactions in VR, promoting engagement and enhancing the entire user experience [4].
In practice, personalization means that VR experiences can change to reflect the user's preferences and habits. For example, content can be tailored to a student's learning style or pace in an educational VR environment, enhancing engagement and understanding. AI-enabled natural language processing raises the level of conversations and interactions in VR environments by enabling users to engage in more compelling and contextually relevant conversations with virtual characters or other users. In doing so, the VR experience is enhanced, and its application in virtual training scenarios, customer service simulations, and therapeutic procedures is broadened. While the pairing of AI and VR is exciting, there are several challenges that need to be properly addressed:
• Knowledge of both technical and conceptual aspects is necessary for the successful integration of AI and VR. Developers and researchers need to master AI algorithms, machine learning, computer vision, and VR technologies. They also need
to know the nuances of user experience design, psychology, and human–computer interaction to create meaningful and engaging VR experiences.
• Privacy issues in VR settings: user data, such as movements, activities, and interactions, are regularly collected and analyzed. Protecting user privacy requires robust data security measures and appropriate data processing techniques, and it is difficult to strike the right balance between personalization and privacy.
• Computing resources and power: since AI and VR applications rely heavily on computation, robust hardware is needed to deliver seamless experiences. It may be necessary to optimize programs and investigate entry-level hardware to ensure accessibility for a wider audience.
• Ethical issues: AI-powered VR raises questions such as informed consent, potential dependency, and the appropriate use of AI in VR situations. Setting ethical standards and ensuring responsible development processes are essential.

Overcoming these obstacles requires a multidisciplinary approach combining technological competence, ethical awareness, and a deep understanding of human behavior and experience. To realize the full potential of integrating AI and VR, it is imperative to strike the right balance between technological progress and moral considerations.

For a variety of reasons, AI is becoming increasingly relevant in the field of VR. It improves the whole VR experience by producing more realistic landscapes and interactions, resulting in greater immersion. AI-powered NLP allows more fluid and engaging dialogues within VR settings. Another significant advantage is personalization, with AI studying user behavior to adjust content and experiences to individual interests. AI-assisted adaptive gaming guarantees that players are continually challenged at a suitable level. AI also improves immersion by accurately tracking human motions and gestures [31].
Furthermore, AI-powered object detection and interaction methods make it simpler for users to interact with virtual items in a seamless manner, enhancing the entire VR experience. The combination of AI and VR opens up a plethora of research and development opportunities. It enables academics to investigate topics such as building more realistic and dynamic virtual worlds, enhancing behaviors inside these worlds, and improving NLP to enable lifelike dialogues in VR. Furthermore, individualized content delivery enabled by AI expands the potential uses of VR in entertainment, education, and training [32]. The healthcare sector offers considerable potential for AI-enhanced VR, with applications ranging from rehabilitation to training simulations for doctors. To ensure that all users, whatever their abilities or limitations, are included in VR experiences, research needs to address the ethical and safety issues associated with AI-driven experiences. Several important conclusions emerged from our investigation, shedding light on the significant impact of AI in enriching VR experiences, the research possibilities it opens, and critical ethical and accessibility concerns. These findings
are summarized in Fig. 3, underlining the vital need for a multidisciplinary approach to fully realize the promise of AI and VR integration.

Fig. 3 Synergy of AI and VR
6 Conclusion

In conclusion, our in-depth analysis has demonstrated the intriguing possibilities and inherent limitations of merging AI and VR. This is a turning point that has the power to transform the VR industry and bring substantial advances in immersion, personalization, and overall engagement in virtual worlds. In the future, the role of AI in VR will continue to change and grow.

The intersection of AI and VR opens a wide terrain for academic research, brimming with undiscovered domains ripe for investigation. One intriguing direction is the use of deep reinforcement learning to create dynamic and constantly evolving VR experiences; this fascinating field leaves much room for additional research and development. There is also a need for a thorough examination of the ethical issues associated with AI-powered virtual worlds: to ensure the ethical development of this technology, it is essential to carefully assess potential ethical issues and to safeguard the values and well-being of people in these immersive digital spaces. Furthermore, the integration of AI into therapeutic applications, including VR-based mental health therapies, holds tremendous promise for treating mental health problems and improving well-being. These projects demonstrate the substantial societal impact that AI-powered VR can have when used appropriately and ethically.
Given the quick rate of technological innovation, it is critical to establish a collaborative synergy among scholars, researchers, and industry practitioners. This joint effort assures that the full promise of AI in VR is realized while also addressing emerging ethical and societal challenges. Only via this collaborative effort will we be able to exploit this tremendous and inventive technological combination for the benefit of society. Human–computer interaction is about to enter a period of transformation that will alter the way we communicate, learn, care for ourselves, and perceive our environment. Our commitment to responsible innovation and a relentless quest for knowledge will guide us as we explore this uncharted territory. That’s how we’ll ensure that the fusion of AI and VR continues to be a powerful force driving breakthroughs and societal progress. This journey will lead us to a time when technology plays an important role in human development and well-being, paving the way for an era of development and prosperity for all.
References

1. Wu Y (2022) Application of artificial intelligence within virtual reality for production of digital media art. Comput Intell Neurosci 2022:1750. https://doi.org/10.1155/2022/3781750
2. Usmani A, Imran M, Javaid Q (2022) Usage of artificial intelligence and virtual reality in medical studies. Pak J Med Sci 38(4):777–779
3. Khalid S et al (2021) The effect of combined aids on users performance in collaborative virtual environments. Multimed Tools Appl 80(6):9371–9391. https://doi.org/10.1007/s11042-020-09953-9
4. Reiners D, Davahli MR, Karwowski W, Cruz-Neira C (2021) The combination of artificial intelligence and extended reality: a systematic review. Front Virt Real 2:1–13. https://doi.org/10.3389/frvir.2021.721933
5. Xie B et al (2021) A review on virtual reality skill training applications. Front Virt Real 2:1–19. https://doi.org/10.3389/frvir.2021.645153
6. Le Noury PJ, Polman RC, Maloney MA, Gorman AD (2023) XR programmers give their perspective on how XR technology can be effectively utilised in high-performance sport. Sport Med Open 9(1):593. https://doi.org/10.1186/s40798-023-00593-5
7. Bhugaonkar K, Bhugaonkar R, Masne N (2022) The trend of metaverse and augmented and virtual reality extending to the healthcare system. Cureus 14(9):29071. https://doi.org/10.7759/cureus.29071
8. García AS et al (2019) Collaborative virtual reality platform for visualizing space data and mission planning. Multimed Tools Appl 78(23):33191–33220. https://doi.org/10.1007/s11042-019-7736-8
9. Weissker T, Froehlich B (2021) Group navigation for guided tours in distributed virtual environments. IEEE Trans Vis Comput Graph 27(5):2524–2534. https://doi.org/10.1109/TVCG.2021.3067756
10. Lee JH, Lee M, Min K (2023) Natural language processing techniques for advancing materials discovery: a short review. Int J Precis Eng Manuf Green Technol 10(5):1337–1349. https://doi.org/10.1007/s40684-023-00523-6
11. Chen M, Decary M (2020) Artificial intelligence in healthcare: an essential guide for health leaders. Healthc Manag Forum 33(1):10–18. https://doi.org/10.1177/0840470419873123
12. Imam M et al (2023) The future of mine safety: a comprehensive review of anti-collision systems based on computer vision in underground mines. Sensors 23(9):4294. https://doi.org/10.3390/s23094294
13. Turan E, Çetin G (2019) Using artificial intelligence for modeling of the realistic animal behaviors in a virtual island. Comput Stand Interf 66:103361. https://doi.org/10.1016/j.csi.2019.103361
14. Dobre GC, Gillies M, Pan X (2022) Immersive machine learning for social attitude detection in virtual reality narrative games. Virtual Real 26(4):1519–1538. https://doi.org/10.1007/s10055-022-00644-4
15. Dubosc C, Gorisse G, Christmann O, Fleury S, Poinsot K, Richir S (2021) Impact of avatar facial anthropomorphism on body ownership, attractiveness and social presence in collaborative tasks in immersive virtual environments. Comput Graph 101:82–92. https://doi.org/10.1016/j.cag.2021.08.011
16. Yang Y (2021) Application of artificial intelligence technology in virtual reality animation aided production. J Phys Conf Ser 1744(3):32037. https://doi.org/10.1088/1742-6596/1744/3/032037
17. Ergen M (2019) Gaming. Anatol J Cardiol 22:5–7
18. Yoon KS, Duncan T, Lee SW-Y, Scarloss B, Shapley KL (2007) Reviewing the evidence on how teacher professional development affects student achievement. Issues Answ Rep 33:62
19. Wang G et al (2022) Development of metaverse for intelligent healthcare. Nat Mach Intell 4(11):922–929. https://doi.org/10.1038/s42256-022-00549-6
20. Kiruthika J, Khaddaj S (2017) Impact and challenges of using of virtual reality and artificial intelligence in businesses. In: Proceedings of the 2017 16th international symposium in distribution and computer application to business, engineering sciences DCABES 2017, pp 165–168. https://doi.org/10.1109/DCABES.2017.43
21. Petrovic VM (2018) Artificial intelligence and virtual worlds-toward human-level AI agents. IEEE Access 6:39976–39988. https://doi.org/10.1109/ACCESS.2018.2855970
22. Sagayam KM, Hemanth DJ (2017) Hand posture and gesture recognition techniques for virtual reality applications: a survey. Virt Real 21(2):91–107. https://doi.org/10.1007/s10055-016-0301-0
23. Augustauskas R, Kudarauskas A, Canbulut C (2018) Implementation of artificial intelligence methods for virtual reality solutions: a review of the literature. CEUR Workshop Proc 2145:68–74
24. Hwang GJ, Chien SY (2022) Definition, roles, and potential research issues of the metaverse in education: an artificial intelligence perspective. Comput Educ Artif Intell 3:100082. https://doi.org/10.1016/j.caeai.2022.100082
25. Rong Q, Lian Q, Tang T (2022) Research on the influence of AI and VR technology for students' concentration and creativity. Front Psychol 13:1–9. https://doi.org/10.3389/fpsyg.2022.767689
26. Miller E, Polson D (2019) Apps, avatars, and robots: the future of mental healthcare. Issues Ment Health Nurs 40(3):208–214. https://doi.org/10.1080/01612840.2018.1524535
27. Smit R, Smuts H (2023) Game-based learning—teaching artificial intelligence to play Minecraft: a systematic literature review. Epic Ser Comput 93:188–202
28. Lim JZ, Mountstephens J, Teo J (2020) Emotion recognition using eye-tracking: taxonomy, review and current challenges. Sensors 20(8):1–21. https://doi.org/10.3390/s20082384
29. Yang T, Zeng Q (2022) Study on the design and optimization of learning environment based on artificial intelligence and virtual reality technology. Comput Intell Neurosci 2022:909. https://doi.org/10.1155/2022/8259909
30. Bui V, Alaei A (2022) Virtual reality in training artificial intelligence-based systems: a case study of fall detection. Multimed Tools Appl 81(22):32625–32642. https://doi.org/10.1007/s11042-022-13080-y
31. Dodds TJ, Ruddle RA (2009) Using mobile group dynamics and virtual time to improve teamwork in large-scale collaborative virtual environments. Comput Graph 33(2):130–138. https://doi.org/10.1016/j.cag.2009.01.001
32. Lei Y, Su Z, He X, Cheng C (2023) Immersive virtual reality application for intelligent manufacturing: applications and art design. Math Biosci Eng 20(3):4353–4387. https://doi.org/10.3934/mbe.2023202
Chapter 7
An Extension Application of 1D Wavelet Denoising Method for Image Denoising Prasanta Kumar Sahoo, Debasis Gountia, Ranjan Kumar Dash, Siddhartha Behera, and Manas Kumar Nanda
P. K. Sahoo (B) · R. K. Dash
School of Computer Science, OUTR, Bhubaneswar, Odisha 751029, India
e-mail: [email protected]
R. K. Dash
e-mail: [email protected]
D. Gountia
IIT Roorkee, Roorkee, Uttarakhand, India
e-mail: [email protected]
S. Behera
Department of Electrical Engineering, GITA, Bhubaneswar, Odisha 752054, India
e-mail: [email protected]
M. K. Nanda
Department of Computer Application, ITER SOA University, Bhubaneswar, Odisha 751030, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_7

1 Introduction

In day-to-day life, signals play a major role in information transmission and data representation, for example in music, speech and voice recordings, pictures (images), television broadcasting, and much more. Signals take various forms and can be represented as functions of time, temperature, distance, pressure, or position. Mathematically, a signal can be expressed in terms of basis functions, either in the domain of the original independent variable or in a transformed domain [1]. Signals are generated both naturally and synthetically. The retrieval of information from signals is called signal processing, which depends upon the nature and type of the signals [2]. An image is one form of signal: a spatial representation of a 2D or 3D signal of a scene. A two-dimensional (2D) representation of an image is simply a 2D array in which the pixels are arranged in rows and columns, so the image is a matrix. A digital grayscale image is a two-dimensional
88
P. K. Sahoo et al.
image that can be stored in a digital computer in the form of a pixel matrix. Each pixel of such a stored image maps to exactly one matrix element, and each matrix element is an integer value ranging uniformly from 0 to 255. The value zero (0) represents a black pixel (no information is available), and 255 represents a white pixel (information is present). A digital color image is a three-dimensional (3D) image: compared with a two-dimensional image, it has one more dimension to store the color information using the RGB or CMY model. Since computer memory (RAM) is a 1D array, both 2D and 3D images are stored in a 1D array with different numbers of components [3].

Image processing is a method of converting an image into digital form and performing certain operations on it to retrieve valuable information from the processed image. Usually, image processing treats every image as a 2D signal when applying predetermined signal processing methods, so image processing is a subset of signal processing [4, 5]. Variations in the color information or brightness of an image or signal are called image noise, a concept borrowed from electronic noise. Besides electronic noise, image noise may originate from film grain and from the unavoidable noise of the photon detector. The process of removing grainy spots and discoloration from images is called denoising [6, 7].

In this paper, we use the 1D wavelet denoising method to remove noise from noisy signals. Since a signal is a one-dimensional quantity, the 1D wavelet denoising method is sufficient to remove noise from noisy signals, and its experimental implementation is presented in the following sections. However, the 1D wavelet denoising method is not sufficient for denoising 2D signals such as images [8, 9].
To overcome this difficulty, we extend the denoising capability of the 1D wavelet denoising method in the proposed algorithm described in the remaining sections of this work; its schematic diagram is shown in Fig. 1.
Fig. 1 Schematic diagram of converting 1D signal to 2D signal [10]
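Since a grayscale image is a matrix of 0–255 intensities while RAM is linear, the row-major storage described above can be sketched in Python (the helper names are illustrative, not from the paper):

```python
def to_flat(image):
    """Flatten a 2D pixel matrix (list of rows) into a 1D array, row by row."""
    return [p for row in image for p in row]

def pixel_at(flat, n_cols, r, c):
    """Recover pixel (r, c) from the flat array: index = r * n_cols + c."""
    return flat[r * n_cols + c]

image = [[0, 128, 255],
         [64, 32, 16]]        # a 2 x 3 grayscale image, intensities in 0..255
flat = to_flat(image)         # [0, 128, 255, 64, 32, 16]
```

The same index arithmetic, with one more factor for the channel, covers the 3D color case mentioned above.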
7 An Extension Application of 1D Wavelet Denoising Method for Image …
89
The study is organized as follows. Section 2 reviews the performance of the existing method on 1D signals. Section 3 describes the implementation and the different modalities of the proposed algorithm for 2D signals. Section 4 discusses performance evaluation and compares the existing and proposed methodologies on 1D and 2D signals. Section 5 summarizes the outcomes of the proposed algorithm and outlines future work.
2 The Mathematical Model and Materials

In this section, we give a brief introduction to the mathematical model for the 1D wavelet denoising method using nonparametric function estimation; details are available in [7, 8]. The frequently used model is

X(n) = f(n) + σe(n), (1)

where
X(n) = discrete-time signal,
f(n) = wavelet coefficient function,
σ = wavelet coefficient variance,
e(n) = Gaussian random variables,
and n = 0, 1, 2, …, N − 1. In practice, X(n) is considered a discrete-time signal with identical time steps, corrupted by noise, and we denoise it to recover the original signal. Written as N-dimensional random vectors, the model takes the form

\[
\begin{pmatrix} f(0) \\ f(1) \\ f(2) \\ \vdots \\ f(N-1) \end{pmatrix}
+
\begin{pmatrix} \sigma e(0) \\ \sigma e(1) \\ \sigma e(2) \\ \vdots \\ \sigma e(N-1) \end{pmatrix}
=
\begin{pmatrix} f(0) + \sigma e(0) \\ f(1) + \sigma e(1) \\ f(2) + \sigma e(2) \\ \vdots \\ f(N-1) + \sigma e(N-1) \end{pmatrix}.
\tag{2}
\]
In Eq. (2), we replace (N × 1) random vector with (N × M) random matrices to extend the 1D wavelet denoising method for recovering an image in the proposed algorithm.
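The model of Eqs. (1)–(2) can be simulated directly; a minimal Python sketch (the sine test signal and the parameter values are illustrative choices, not the paper's):

```python
import math
import random

random.seed(0)  # reproducible run

N = 8
sigma = 0.5
# Clean signal f(n): one cycle of a sine wave.
f = [math.sin(2 * math.pi * n / N) for n in range(N)]
# Gaussian noise e(n) ~ N(0, 1).
e = [random.gauss(0.0, 1.0) for _ in range(N)]
# Observed signal X(n) = f(n) + sigma * e(n), as in Eq. (1).
X = [f[n] + sigma * e[n] for n in range(N)]
```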
90
P. K. Sahoo et al.
2.1 1D Wavelet Method for Denoising the Signal—A Review

A wavelet is a mathematical function used for the processing and compression of digital signals. The idea of the 1D wavelet method is a level-by-level, time-dependent threshold used to increase the denoising capability under a nonstationary variance noise model. Since signals are transmitted over some distance through transmission media, there is a chance of contamination by noise [11]. The contaminated signal is mathematically defined as

S′(n) = S(n) + T(n),
(3)
where
S′(n) = contaminated signal,
S(n) = original signal,
T(n) = noise,
and the noise is a random, adaptive, high-frequency signal. This existing denoising method has been implemented by considering a signal X(n) = f(n) + e(n) and applying the 1D wavelet denoising method to a 1000 × 1 vector drawn from N(0, 1). The modified signal thus takes the form

X(n) = f(n) + e(n), e(n) ∼ N(0, 1). (4)
With f(n) = 0, this is called a noisy signal. The noisy signal is denoised by the following steps, producing the results shown in the following figures.

Step 1: Choose a wavelet with level N and perform wavelet decomposition of the signal X(n) down to level N.
Step 2: Choose the threshold as √(2 ln N) (the 'sqtwolog' rule) for each level from 0 to N − 1, where N is the signal length.
Step 3: Compute the wavelet reconstruction using the original approximation coefficients of level N and the modified detail coefficients of levels 1 to N.
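The three steps can be sketched in Python with a single-level Haar transform standing in for a full N-level decomposition (a deliberate simplification of the procedure above, with soft thresholding at the universal threshold √(2 ln N)):

```python
import math

def haar_forward(x):
    """One-level Haar DWT: orthonormal pairwise sums (approximation)
    and differences (detail). Length of x must be even."""
    a = [(x[2*i] + x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    """Invert the one-level Haar DWT exactly."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / math.sqrt(2))
        x.append((ai - di) / math.sqrt(2))
    return x

def soft(c, t):
    """Soft thresholding: shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def denoise(x):
    a, d = haar_forward(x)                # Step 1: decompose
    t = math.sqrt(2 * math.log(len(x)))   # Step 2: universal 'sqtwolog' threshold
    d = [soft(c, t) for c in d]           # threshold the detail coefficients
    return haar_inverse(a, d)             # Step 3: reconstruct
```

With the small detail coefficients suppressed, each reconstructed pair collapses toward its local average, which is the smoothing effect the thresholding is meant to produce.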
2.2 Existing Method Implementation and Result Description

In this section, we repeat the existing 1D wavelet denoising method using our own MATLAB code for experimental analysis of a random signal. We created a random signal in MATLAB as follows:

thresh = 0.4;
Y = linspace(-1,1,100);
Ythreshhard = wthresh(Y,'h',thresh);
Ythreshsoft = wthresh(Y,'s',thresh);
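A hand-rolled Python equivalent of the hard and soft thresholding performed by wthresh above (a sketch of the two rules, not MATLAB's implementation):

```python
def wthresh_hard(y, t):
    """Hard thresholding: zero values with |v| <= t, keep the rest unchanged."""
    return [v if abs(v) > t else 0.0 for v in y]

def wthresh_soft(y, t):
    """Soft thresholding: zero small values, shrink the rest toward zero by t."""
    return [(1.0 if v > 0 else -1.0) * (abs(v) - t) if abs(v) > t else 0.0
            for v in y]

thresh = 0.4
y = [i / 50.0 - 1.0 for i in range(100)]   # rough analogue of linspace(-1, 1, 100)
y_hard = wthresh_hard(y, thresh)
y_soft = wthresh_soft(y, thresh)
```

Hard thresholding keeps surviving values intact, while soft thresholding additionally shrinks them, trading a small bias for a smoother result.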
Fig. 2 Created original signal
Creation of Original Signal

Figure 2 shows the creation of the original signal. This signal was generated using the MATLAB value init = 2.0556e+09 and the rng structure, characterized as follows:

Type: 'twister'
Seed: 2055615866
State: [625 × 1 uint32]

The created signal is 1D wave data.

Addition of Noise to Original Signal

Figure 3 shows the creation of a noisy signal, obtained by adding white Gaussian noise to the original signal using the MATLAB function wnoise() with a signal-to-noise ratio of 4. The blue trace shows the white Gaussian noise with its high frequencies. The resulting signal is called a contaminated signal. Contamination causes low resolution, which in turn makes feature extraction and signal analysis difficult, so the signal needs to be denoised to detect the original signal.

Denoising of Signal

Figure 4 shows the output of the denoised signal. In science and technology, denoising a signal is essential for feature extraction and synthesis; it plays a vital role in data security, signal improvement, and alteration. But has the signal actually been denoised? To answer this, we calculated the peak signal-to-noise ratio (PSNR) of the denoised signal with respect to the noisy signal, as described in the result analysis section of this paper [12, 13]. From the experimental work and result analysis, the PSNR value of the 1D noisy signal is −0.0951 dB and the PSNR value of the denoised signal is
Fig. 3 Created noisy signal

Fig. 4 Denoising signal using 1D wavelet
0.1165 dB, corresponding to a denoising accuracy improvement of 22.502% for the denoised signal. The 1D wavelet denoising method therefore performs very well for synthetic signals, owing to the small number of large coefficients in the original synthetic signals. To move beyond this small-number-of-large-coefficients case, we next considered electrical signals, applying the 1D wavelet denoising method to a highly perturbed part of an electrical signal [14, 15].
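The PSNR comparisons quoted above can be reproduced for any pair of signals; a Python sketch (the peak-of-reference convention used here is an assumption, since the chapter does not state its exact PSNR formula):

```python
import math

def psnr_db(reference, estimate):
    """PSNR in dB: 10*log10(peak^2 / MSE), peak taken from the reference."""
    n = len(reference)
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n
    peak = max(abs(r) for r in reference)
    return 10 * math.log10(peak ** 2 / mse)

clean = [0.0, 1.0, 0.0, -1.0]
noisy = [0.1, 0.9, -0.1, -1.1]   # clean signal with a uniform 0.1 error
```

With a peak of 1 and an MSE of 0.01, this toy pair gives 20 dB; a higher PSNR after denoising, as in the figures above, indicates that the estimate is closer to the reference.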
Fig. 5 Creation of original electrical signal
Figure 5 represents the original electrical signal with noise, created using the following MATLAB code:

load leleccum
ind_x = 2000:3450;
x = leleccum(ind_x);

Here we suspect non-white noise; when non-white noise is suspected, the threshold is rescaled by a level-dependent estimation of the noise level, so we restricted the analysis to the segment ind_x = 2000:3450. We then denoised the signal using the MATLAB function wdenoise(X), which denoises the signal X by the empirical Bayesian method with a Cauchy prior. The recorded signal is a contaminated signal: it contains high frequencies, i.e., noise, and the presence of noise causes loss of information. The signal should therefore be denoised to avoid this loss, which is especially important for medical signals such as the electrocardiogram (ECG) [16–18].

Figure 6 shows the noise removed from the original contaminated signal using the 1D wavelet denoising method. The PSNR value of the original contaminated electrical signal is −67.7592 dB, and the PSNR value of the denoised signal is −15.3320 dB; the denoising accuracy improved by 77.39%. The result is quite good despite the time-heterogeneous behavior of the noise before and after the beginning of the sensor failure [19–22].
Fig. 6 Denoising electrical signal using 1D wavelet
3 1D Wavelet Denoising Method Extension to Image Denoising

In this section, we present the proposed algorithm extending the 1D wavelet denoising method: the signal is converted to energy, and that energy acts as another dimension of the image. The algebraic mean method is then applied for compression, decompression, and denoising of the image.
3.1 Proposed Methodology

The block diagram of the proposed model shows how the proposed algorithm converts a 1D signal into a 2D grayscale image. Since we take synthetic signals for our experiment, there is no need to apply decomposition into intrinsic mode functions (IMFs) to describe the signal: the lower IMFs, corresponding to the lower frequencies considered as noise, have been removed from the start, which makes the signal smoother. We therefore calculate the energy of the signal directly and use it to construct 2D images; i.e., we construct 2D images from the energy spectrum. In our experimental work, we calculated the energy of the signal using the following formula:
\[
E = \int_{-\infty}^{\infty} |x(t)|^2 \, dt. \tag{5}
\]
The above equation can be normalized as follows. Let v(t) be the voltage of the source where the signal is created, R the unit resistance across the signal, and i(t) the current conducted by the signal. The instantaneous power dissipated by the resistor is

\[
p(t) = v(t)\,i(t) = \frac{v^2(t)}{R} = i^2(t)\,R. \tag{6}
\]

Since R = 1 Ω, Eq. (6) can be rewritten as

\[
p(t) = v^2(t) = i^2(t). \tag{7}
\]
So, the total energy and the average power are defined as the limits

\[
E = \lim_{T \to \infty} \int_{-T}^{T} i^2(t) \, dt \quad [\mathrm{J}] \tag{8}
\]

and

\[
P = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} i^2(t) \, dt \quad [\mathrm{W}]. \tag{9}
\]

Therefore, the total energy and the average power, normalized to the unit resistance, of an arbitrary signal x(t) can be defined as

\[
E = \lim_{T \to \infty} \int_{-T}^{T} |x(t)|^2 \, dt \quad [\mathrm{J}] \tag{10}
\]

and

\[
P = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |x(t)|^2 \, dt \quad [\mathrm{W}]. \tag{11}
\]
In these equations, rather than the numeric values of the time domain, we use the energy of the arbitrary wave signal x(t). Since we construct 2D grayscale images, we normalize the energy spectrum values to the range 0–255; the resulting normalized values form the 2D grayscale image [11, 23–27].
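For a sampled signal, the energy integral becomes a sum, and the min–max normalization to 0–255 follows as described; a Python sketch with toy frame data (the values are illustrative):

```python
def energy(x, dt):
    """Discrete approximation of E = integral of |x(t)|^2 dt (rectangle rule)."""
    return sum(abs(v) ** 2 for v in x) * dt

def normalize_0_255(values):
    """Linearly map values so min -> 0 and max -> 255, rounded to integers."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant input: avoid division by zero
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

# Per-frame energies of a toy framed signal, then grayscale normalization.
frame_data = [[1.0, -2.0, 2.0, 1.0], [0.5, 0.5, -0.5, 0.5], [3.0, 0.0, 0.0, 0.0]]
energies = [energy(fr, 1.0) for fr in frame_data]   # [10.0, 1.0, 9.0]
pixels = normalize_0_255(energies)                  # [255, 0, 227]
```

Each normalized energy value then becomes one pixel intensity of the constructed image.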
3.2 Block Diagram of Proposed Method

The diagram shows the block diagram of the proposed method. First, we read a 1D wave signal. The long wave signal is decomposed into a series of intrinsic mode functions (IMFs), where each successive IMF contains oscillations of lower frequency than the preceding one. We then calculate the energy using the formula above. Finally, the resulting energy acts as a second component of the signal, forming a 2D signal in the form of an image [28, 29]. After obtaining the image, the algebraic mean method is applied for compression and decompression to obtain a denoised image, which constitutes the extension of the 1D wavelet method to image denoising [30–32]. The block diagram of the proposed model is shown in Fig. 7.

Proposed Algorithm

Step 1: Read a signal, say X(n) = f(n) + σe(n).
Step 2: Split the signal into equal frames using the segmentation formula

\[
K = 1 + \left\lfloor \frac{N - N_F}{N_H} \right\rfloor,
\]

where N is the number of signal samples, N_F the frame length (N_F < N), N_H the hop length, and K the number of frames.
Step 3: From the frame length N_F, define the matrix height and width. For example, if M is the frame size and N the number of frames, the size of the matrix will be M × N.
Fig. 7 Block diagram of proposed model
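Step 2's segmentation formula, K = 1 + ⌊(N − N_F)/N_H⌋, and the frame matrix of Step 3 can be sketched in Python (the toy signal and the frame/hop lengths are illustrative):

```python
def frame_count(n, frame_len, hop_len):
    """K = 1 + floor((N - N_F) / N_H)."""
    return 1 + (n - frame_len) // hop_len

def frames(x, frame_len, hop_len):
    """Slice the signal into K frames; each frame becomes one matrix column."""
    k = frame_count(len(x), frame_len, hop_len)
    return [x[i * hop_len : i * hop_len + frame_len] for i in range(k)]

signal = list(range(10))          # a 10-sample toy signal
cols = frames(signal, 4, 3)       # K = 1 + (10 - 4) // 3 = 3 frames
```

Because the hop length is smaller than the frame length here, consecutive frames overlap by one sample, which is permitted by the formula.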
Step 4: Calculate the energy from the signal using the formula: ∫∞ |X (n)|2 dn.
E= −∞
Step 5: Put the individual frame energy value as the corresponding element of the matrix, with the values of each frame placed into the matrix vertically. As a result, the last value of the first frame is placed into the first cell of the last column, since the height of the matrix equals the frame size, so every value fits into a column.
Step 6: Repeat Step 5 until the remaining frame values are put into the matrix sequentially.
Step 7: After constructing the matrix from the signal, normalize the frame values to the range 0–255 by mapping the minimum matrix element to 0 and the maximum matrix element to 255 and scaling all other values proportionally.
Step 8: Set the new values by replacing the old values of the matrix, which forms the 2D image, represented as X(i, j) = f(i, j) + σe(i, j).
Step 9: Apply the 1D wavelet denoising method over the constructed 2D image.
Step 10: Apply L-level scalar quantization over each pixel of X(i, j) and calculate the quotient matrix Q_L(i, j) and remainder matrix R_L(i, j) for each pixel, i.e.,
Q_L(i, j) = floor((i, j)/L) and R_L(i, j) = (i, j) % L,
where ∀ i, j ∈ I are individual integer pixel values of the image X(i, j). The matrix representation of the converted 1D signal is shown in Fig. 8.
Step 11: Compress the remainder matrix R_L(i, j) to 3 × 4 dimension and denote it as R_comp, which is computed as
R_comp = floor((i + j)/2), where adjacent elements (i, j) ∈ Column_i.
Step 12: Add noise to the compressed remainder and produce the R_L(i, j) matrix of 4 × 4 dimension.
98
P. K. Sahoo et al.
Fig. 8 Matrix representation of converted 1D signal
Step 13: Perform reverse L-level scalar quantization (Lr) on each pixel (i, j) and compute the reverse quotient matrix Q_Lr(i, j) and reverse remainder matrix R_Lr(i, j), each of 3 × 4 dimension:
Q_Lr(i, j) = floor((i, j)/L) and R_Lr(i, j) = (i, j) % L,
where ∀ i, j ∈ I are individual integer pixel values of the image X(i, j).
Step 14: Decompress the R_Lr(i, j) matrix and perform a mean algebraic operation to get a matrix of dimension 4 × 4, denoted R_Decomp:
R_Decomp = floor((i + j)/2), where the adjacent elements (i, j) ∈ Column_i.
Step 15: Obtain the denoised image (DI) by computing DI = L × Q_Lr(i, j) + R_Decomp.
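The framing, energy, normalization, and quantization steps above can be sketched in Python. This is a hypothetical sketch under one plausible reading of Steps 2–10 and 15; the function names and the exact placement of the energy values are our assumptions, not the authors' code:

```python
import numpy as np

def signal_to_image(x, frame_len, hop_len):
    # Step 2: number of frames K = 1 + floor((N - N_F) / N_H)
    n = len(x)
    k = 1 + (n - frame_len) // hop_len
    # Steps 3-6: place each frame vertically into a column of an M x K matrix,
    # weighted by its energy E = sum |x(n)|^2 (discrete analogue of the integral)
    frames = np.stack([x[i * hop_len : i * hop_len + frame_len] for i in range(k)], axis=1)
    energy = np.sum(np.abs(frames) ** 2, axis=0)   # Step 4: one value per frame
    matrix = frames * energy                       # energy acts as the 2nd dimension
    # Step 7: normalize to the 0-255 grayscale range
    lo, hi = matrix.min(), matrix.max()
    return np.round(255.0 * (matrix - lo) / (hi - lo)).astype(np.uint8)

def quantize(image, L):
    # Step 10: L-level scalar quantization into quotient and remainder matrices
    return image // L, image % L

def reconstruct(q, r, L):
    # Step 15: DI = L * Q + R
    return L * q + r
```

Note that the quantize/reconstruct pair round-trips exactly, which is the property Step 15 relies on.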
4 Implementation and Experimental Work In this section, we present the extended application of the 1D wavelet denoising method to two-dimensional image signals through the proposed algorithm. In general, the 1D wavelet denoising method is suitable for signal denoising. With the help of our proposed algorithm, we converted the 1D signal into energy, and the calculated energy acted as another dimension of the signal. So, a 1D signal is converted into a 2D signal, represented as X(i, j) = f(i, j) + σe(i, j). This is nothing but the mathematical function of an image [16, 17, 33]. Creation of Original Image The original image of a woman was read and displayed for experimental purposes using the MATLAB functions shown below. It is a 2D signal: a 2D grayscale image of dimension 256 × 256 (double). Over this image/signal, we tested the experiment through the existing 1D wavelet denoising method by making
Fig. 9 X(i, j): original image
it a noisy image. The procedure for making the noisy signal is described in the next section of this paper [12, 13, 18, 19, 34]. Figure 9 shows the original image.
image = imread('woman.png');
imshow(image);
Creation of Noisy Image
Figure 10 shows the noisy image. The original image was converted into a noisy image by using the randn() MATLAB function, which generates an array of normally distributed random numbers with mean zero and variance 1, of the same size as the matrix, over the original image using the following MATLAB code:
image_noisy(R_L(i, j)) = Q_L(i, j) + 15 × randn(size(X(i, j)));
Figure 11 is the denoised image. The denoising has been done through the extension application of the 1D wavelet denoising method using the algebraic mean algorithm. In the 1D wavelet denoising method, denoising is performed automatically for one-dimensional signals, but in the case of 2D signal or image denoising, compression and decompression are performed using the proposed mean algebraic algorithm on the MATLAB platform.
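A hedged Python/NumPy equivalent of the MATLAB noise step can be written as follows; the function name, the fixed seed, and the clipping to the 0–255 range are our assumptions, not part of the original code:

```python
import numpy as np

def add_gaussian_noise(image, sigma=15.0, seed=0):
    # Analogue of image + 15*randn(size(image)): additive white Gaussian noise
    # with mean 0 and standard deviation sigma, clipped to the grayscale range.
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + sigma * rng.standard_normal(image.shape)
    return np.clip(noisy, 0, 255)
```

On a mid-gray image the clipping is essentially inactive, so the sample standard deviation of the added noise stays close to sigma.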
Fig. 10 RL (i, j): noisy image [experimental work]
Fig. 11 Denoised image [experimental work]
4.1 Experimental Work and Result Analysis Table 1 shows the calculated peak signal-to-noise ratio (PSNR) values of the original signal, noisy signal, and denoised signal. After the calculation of PSNR, we compared the PSNR values of the noisy signal and the denoised signal to find the percentage of the image recovered, which is called the accuracy of the denoised signal. The PSNR value has been calculated via the mean square error (MSE) using the following logarithmic decibel scale [24, 35–40]:

MSE = (1/(M × N)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} [NI(i, j) − DI(i, j)]².   (12)

Here, M × N = image dimension, NI(i, j) = noisy image, and DI(i, j) = denoised image.
PSNR = 10 × log₁₀(MAX_I²/MSE) = 20 × log₁₀(MAX_I/√MSE) = 20 × log₁₀(MAX_I) − 10 × log₁₀(MSE).

Here, MAX_I = 255, which is the grayscale image's maximum possible pixel value. From Table 1, we conclude that our proposed extension of the 1D wavelet denoising method also helps to denoise 2D signals, since the PSNR value of the denoised signal is higher than that of the noisy signal.
Graphical Representation
Figures 12, 13, and 14 show the graphical representations.
Pie Chart Representation
See Fig. 12.
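The MSE and PSNR computation of Eq. (12) and the formula above can be sketched as a minimal helper (the function name is ours, not from the paper):

```python
import numpy as np

def psnr(noisy, denoised, max_i=255.0):
    # Eq. (12): MSE over an M x N image, then PSNR on the decibel scale
    a = np.asarray(noisy, dtype=np.float64)
    b = np.asarray(denoised, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 20.0 * np.log10(max_i) - 10.0 * np.log10(mse)
```

For two maximally different constant images (all 0 vs. all 255) the MSE equals MAX_I², so the PSNR is 0 dB, which is a quick sanity check of the formula.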
Table 1 Percentage of denoised comparison between noisy signal and denoised signal

Signal type | Model (algorithm) | Original signal SNR/PSNR (dB) | Noisy signal (dB) | Denoised signal (dB) | Accuracy (in % of denoised)
1D signal (WAVE) | Existing model | −0.0951 | 0.1165 | 22.502 | 32.78
2D signal (IMAGE) | Existing model | 7.4892 | 12.5329 | 20.4572 | 63.20
2D signal (IMAGE) | Proposed model | 32.7834 | 14.1879 | 25.3545 | 78.11

Fig. 12 Pie chart representation of denoising comparison between existing method and proposed method of 2D signal (image)
Fig. 13 Bar Chart representation of denoising comparison between existing method and proposed method of 2D signal (image)
Fig. 14 Line Chart representation of denoising comparison between existing method and proposed method of 2D signal (image)
Bar Chart Representation See Fig. 13.
Line Chart Representation See Fig. 14.
5 Conclusion and Future Work From the above experimental work, it can be said that the extension application of the 1D wavelet denoising method is helpful in denoising images, and it can be done through the proposed mean algebraic algorithm. In the proposed algorithm, although replacing the original frame energy values of the matrix with normalized numeric values may cause a slight loss in the original signal, it helped to minimize the noise in the original signal and still produced a very good accuracy ratio, i.e., 78%. Furthermore, the frequency-domain 2D representation of the signal is efficient for useful feature extraction, as the output image resolution and quality estimated with PSNR are very good. So, in the future, the quality of denoised images can be improved from 78% accuracy toward 100% accuracy through further research.
References
1. Kim S, Bose N, Valenzuela H (1990) Recursive reconstruction of high-resolution image from noisy undersampled multiframes. IEEE Trans Acoust Speech Signal Process 38:1013–1027
2. Stéphane M (1998) A wavelet tour of signal processing. Academic Press Inc., xxiv+577 pp
3. Tekalp A, Ozkan M, Sezan M (1992) High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. In: IEEE international conference on acoustics, speech, and signal processing, vol III. IEEE Press, Piscataway, NJ, pp 169–172
4. Bose N, Boo K (1998) High-resolution image reconstruction with multisensors. Int J Imaging Syst Technol 9:294–304
5. Capel D, Zisserman A (2000) Super-resolution enhancement of text image sequences. In: International conference on pattern recognition. International Association for Pattern Recognition
6. Ng M, Chan R, Tang W-C (1999) A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J Sci Comput 21:851–866
7. Antoniadis A, Oppenheim G (1994) Wavelets and statistics. In: Lecture notes in statistics, vol 103. Papers from the conference held in Villard de Lans, 16–18 Nov 1994. Springer-Verlag, 1995, ii+411 pp
8. Donoho D (1995) De-noising by soft-thresholding. IEEE Trans Inform Theory 41:613–627
9. Daubechies I (1992) Ten lectures on wavelets. In: CBMS-NSF regional conference series in applied mathematics, vol 61. Society for Industrial and Applied Mathematics (SIAM), xx+357 pp
10. Tuan DV, Chong UP (2011) Signal model-based fault detection and diagnosis for induction motors using features of vibration signal in two-dimension domain. J Mech Eng 57(9):655–666
11. Sahoo P, Nanda M (2023) A modified partitioning algorithm for classification of e-waste. J Stat Manag Syst 26:53–65. https://doi.org/10.47974/JSMS-947
12. Ephraim Y, Malah D (1984) Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans Signal Process 32(6):1109–1121
13. Yi H, Philipos CL (2004) Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans Speech Audio Process 12(1)
14. Orhan AE, Pitkow X (2018) Skip connections eliminate singularities. In: International conference on learning representations, Vancouver, BC
15. Gao T, Du J, Dai LR (2018) Densely connected progressive learning for LSTM-based speech enhancement. In: 2018 IEEE international conference on acoustics, speech and signal processing, Calgary, Canada. IEEE
16. Dragomiretskiy K, Zosso D (2014) Variational mode decomposition. IEEE Trans Signal Process 62(3):531–544. https://doi.org/10.1109/TSP.2013.2288675
17. Boll SF (1979) Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Signal Process 27(2):113–120
18. Lim JS, Oppenheim AV (2005) Enhancement and bandwidth compression of noisy speech. Proc IEEE 67(12):1586–1604
19. Wang ZQ, Wang DL (2020) Deep learning based target cancellation for speech dereverberation. IEEE Trans Signal Process 28:941–950
20. Williamson DS, Wang DL (2017) Time-frequency masking in the complex domain for speech dereverberation and denoising. IEEE Trans Signal Process 25:1492–1501
21. Mayer F, Williamson DS, Mowlaee P, Wang DL (2017) Impact of phase estimation on single-channel speech separation based on time-frequency masking. JASA 141:4668–4679
22. Attabi Y, Champagne B, Zhu WP (2021) DNN-based calibrated filter models for speech enhancement. Circuits Syst Signal Process
23. Van DN (2013) Two dimension representation approach for developing the fault diagnosing system based on vibration signal. University of Ulsan, Republic of Korea
24. Sahoo PK, Nanda MK, Mohammed J (2019) A modified framework for reversible digital image watermarking. In: 2019 international conference on applied machine learning (ICAML), Bhubaneswar, India, pp 228–234. https://doi.org/10.1109/ICAML48257.2019.00049
25.
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Angus N, Liu B, Yu PS, Zhou ZH, Steinbach M, Hand DJ, Steinberg D (2007) Top 10 algorithms in data mining. J Knowl Inf Syst 14(1):1–37
26. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen N-C, Tung CC, Liu HH (1998) The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A Math Phys Eng Sci 454(1971):903–995
27. Rilling G, Flandrin P, Goncalves P (2003) On empirical mode decomposition and its algorithms. In: IEEE-EURASIP workshop on nonlinear signal and image processing, NSIP-03, Grado (I), vol 3, pp 8–11
28. Chiang Y, Sullivan BJ (1989) Multi-frame image restoration using a neural network. In: Proceedings of the 32nd midwest symposium on circuits and systems. IEEE, pp 744–747
29. Hu J, Wang X, Shao F, Jiang Q (2020) TSPR: deep network based blind image quality assessment using two-side pseudo reference images. Digit Signal Process 106:102849
30. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: 2009 IEEE 12th international conference on computer vision, pp 2146–2153
31. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
32. Liang J, Liu R (2015) Stacked denoising autoencoder and dropout together to prevent overfitting in deep neural network. In: 2015 8th international congress on image and signal processing (CISP), pp 697–701
33. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155
34. Gholami-Boroujeny S, Fallatah A, Heffernan BP et al (2016) Neural network-based adaptive noise cancellation for enhancement of speech auditory brainstem responses. SIViP 10:389–395. https://doi.org/10.1007/s11760-015-0752-x
35.
Diwakar M, Kumar M (2018) A review on CT image noise and its denoising. Biomed Signal Process Control 42:73–88. https://doi.org/10.1016/j.bspc.2018.01.010
36. Buades A, Coll B, Morel JM (2005) A review of image denoising algorithms, with a new one. Multiscale Model Simul 4(2):490–530
37. Awad A (2019) Denoising images corrupted with impulse, Gaussian, or a mixture of impulse and Gaussian noise. Int J Eng Sci Technol 22(3):746–753
38. Bingo WKL, Charlotte YFH, Qingyun D, Reiss JD (2014) Reduction of quantization noise via periodic code for oversampled input signals and the corresponding optimal code design. Digit Signal Process 24:209–222
39. Ramos S, Gehrig S, Pinggera P et al (2017) Detecting unexpected obstacles for self-driving cars: fusing deep learning and geometric modeling. In: 2017 IEEE intelligent vehicles symposium (IV). IEEE
40. Wu H, Liu Y, Liu Y, Liu S (2019) Efficient facial expression recognition via convolution neural network and infrared imaging technology. Infrared Phys Technol 102:103031
Chapter 8
Improving Navigation Safety by Utilizing Statistical Method of Target Detection on the Background of Atmospheric Precipitation M. Stetsenko, O. Melnyk, O. Onishchnko, V. Shevchenko, V. Sapiha, O. Vishnevska, and D. Vishnevskyi
1 Introduction Today, about 80% of cargo around the world is transported by sea-going vessels. As the merchant fleet numbers about 100 thousand units, safety of navigation imposes certain technical requirements and construction rules. According to these rules, the navigation radar on board a sea-going vessel is part of the standard equipment that ensures the safety of navigation at all times. The operator’s decision to control the position of the vessel to avoid collision, grounding, etc., largely depends on the accuracy of the data received from the radar. Unfortunately, the quality of the received data is not always high. The main problems faced by operators arise due to unfavorable weather conditions (heavy sea, rain, snow, etc.), as the principle of radar operation is affected by interference from the sea and precipitation. M. Stetsenko · O. Onishchnko · V. Shevchenko · V. Sapiha National University “Odessa Maritime Academy”, Odesa, Ukraine e-mail: [email protected] V. Shevchenko e-mail: [email protected] O. Melnyk (B) · O. Vishnevska · D. Vishnevskyi Odesa National Maritime University, Odesa, Ukraine e-mail: [email protected] O. Vishnevska e-mail: [email protected] D. Vishnevskyi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_8
107
108
M. Stetsenko et al.
To date, there are several methods to suppress interference from rain and snow. The hardware method utilizes a gain function that allows the operator to filter out low-amplitude signals by manually adjusting the gain. However, this often leads to undesirable conditions in which useful signals from small navigational targets (e.g., small boats and other similar vessels) are also filtered out. To solve this problem, modern radars use a rain and snow interference suppression algorithm. As a rule, interference from rain and snow is evenly distributed. The heavier the rain and snow, the stronger the interference, but it can be suppressed using a constant false alarm rate (CFAR) detector. In low-resolution radars, the interference probability density follows a Rayleigh distribution. In high-resolution radars, the probability density can be expressed by a Weibull distribution. CFAR can adjust the interference control threshold, thereby providing a certain false alarm rate and improving the target control capability. When handling a constant false alarm rate, if large noise fluctuations occur, the false alarm probability increases. To maintain a constant false alarm rate, it is necessary to increase the detection threshold accordingly and improve the input signal-to-noise ratio; otherwise, the constant false alarm rate property of CFAR is lost [1]. The literature review covered a wide range of sources contributing to the field of radar technology and applications. The book Absorption and Scattering of Light by Small Particles is a seminal work for understanding the scattering of light by small particles; its authors' earlier 1983 paper is also referenced, indicating their long-standing contributions to the field. The paper [2] introduces a signal processing algorithm for ship navigation radar, emphasizing azimuth distance monitoring. Scientific work [3] studied polarization invariants of the scattering matrix and their stability in the context of the aperture synthesis problem.
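The CFAR thresholding described above can be illustrated with a textbook cell-averaging CFAR sketch; this is not the implementation of any specific navigation radar, and the parameter names are ours:

```python
import numpy as np

def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    # Cell-averaging CFAR: estimate the local interference level from training
    # cells on both sides of a guard band, and scale it so the false alarm
    # probability stays constant for exponentially distributed noise power.
    n = len(power)
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)  # CA-CFAR scaling factor
    half = num_train // 2 + num_guard
    detections = np.zeros(n, dtype=bool)
    for i in range(half, n - half):
        left = power[i - half : i - num_guard]
        right = power[i + num_guard + 1 : i + half + 1]
        noise = np.mean(np.concatenate([left, right]))
        detections[i] = power[i] > alpha * noise
    return detections
```

Because the threshold adapts to the local average, a heavier (but even) rain-clutter level raises the threshold everywhere rather than flooding the display with false alarms.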
The contribution of paper [4] focuses on the proper polarimetric scattering matrix formulation in radar systems, drawing a critical distinction between forward propagation and backscattering. The work [5] introduces a small ship target detection method based on polarimetric synthetic aperture radar (SAR), demonstrating the practical application of polarimetric radar technology in target detection. The source [6] provides a foundational understanding of radiolocation principles, potentially delving into key concepts and theoretical frameworks. The paper [7] focuses on antenna miniaturization for radiolocation, suggesting advancements in antenna technology for improved radiolocation applications. The source [8] introduces a novel radiolocation method applicable to depth estimation, with potential applications in groundwater level analysis. The paper [9] discusses beamforming techniques in the context of radio-telescope technology, indicating its relevance for radiolocation applications. The study [10] provides insights into radiolocation experiments conducted within urban environments, potentially addressing challenges specific to this setting. The papers [11, 12] explore the directional radio response of a specific device guided by radiolocation, potentially contributing to advancements in device localization, and introduce an algorithm for simultaneous radiolocation of multiple sources, indicating advancements in source localization techniques. The works [13, 14] presented a lightweight radar-based ship detection framework, potentially offering
innovative methods for ship detection, and proposed a unified approach to ship detection combining optical and radar data, potentially contributing to advancements in ship detection methods. The source [15] introduces a novel approach for estimating ship speed and heading using radar sequential images, offering potential advancements in ship tracking technology. The articles [16–18] discuss inshore ship detection based on multi-modality saliency, present a network designed for small ship detection in synthetic aperture radar imagery, and introduce a method for judging the motion state of ships and selecting appropriate radar imaging algorithms, all potentially contributing to improved ship detection accuracy. The source [19] discusses calibration methods involving ground-based, ship-based, and spaceborne radars, potentially contributing to improved radar accuracy. In [20], the authors focused on estimating ship berthing parameters using a fusion of multi-LiDAR and MMW radar data, potentially offering advancements in ship docking technology. The articles [21–23] highlighted the importance of situational awareness for ship safety, focused on optimizing ship speed for safe transportation of heavy cargo under varying weather conditions, and discussed the concept of autonomous ships and their steering control using mathematical models. Works [24–27] presented an assessment methodology based on Markov models for navigational safety, provided a comprehensive method to evaluate the vulnerability of critical ship equipment and systems, examined information security risks in shipping for ensuring safety in maritime transportation, and investigated the environmental impact of ship operation in relation to efficient freight transportation.
[28–30] discussed the use of fuzzy controllers in ship motion control systems for automation, identified energy-efficient operation modes for propulsion electric motors in autonomous swimming apparatus, and presented a simple technique for identifying parameters of vessel models. Papers [31–33] introduced a decision support system concept for designing combined propulsion systems, studied challenges in creating energy-efficient positioning systems for multipurpose sea vessels, and explored risk management mechanisms in higher education institutions using innovative project information support. Work [34] discussed a method for managing human resources in educational projects of higher education institutions. The article [35] focused on modeling the creation of organizational energy entropy. A model for creating a road map for enterprise development was constructed and analyzed in [36], and [37] examined the dynamics of project portfolio structure in organizational development considering information entropy resistance. A model depicting the energy entropy dynamics of organizations was constructed and investigated in [38]. Studies of various forms of cooperation among participants in inland waterways cargo delivery in the Dnieper region and the development of a strategy for modernizing passenger ships by optimizing fund distribution are presented in [39, 40]. The researches [41, 42] provided insights into predicting centrifugal compressor instabilities for internal combustion engines and introduced a concept for vibroacoustic diagnostics of fuel injection and electronic cylinder lubrication systems in marine diesel engines.
Considering the above, the purpose of this research is to improve the method of radar detection of navigation objects against the background of atmospheric precipitation regardless of its intensity. As an alternative to the described methods, another approach was proposed to increase the potential information capacity of an electromagnetic wave by recognizing and classifying the polarization parameters of partially polarized waves. Recognition and classification of polarization parameters can be performed using specialized radar and sensor systems as well as data processing techniques. Ultimately, the efficiency of recognition and classification of polarization parameters depends on the quality of the collected data, the selected features, and the machine-learning model used. Such statistical methods have been successfully used in polarization selection of rain clouds and atmospheric precipitation [4]. This allows us to extend these studies to the problem of detection and targeting of complex objects. By a complex object, we mean a navigational object located on a background of spatially distributed reflectors that remain unchanged at any moment of radar observation time. Hence, the main task of the study is to establish a statistical relationship between the invariant parameters of a polarized electromagnetic wave and an object on the sea surface surrounded by spatially distributed reflectors in the form of rain or other precipitation. The solution of this problem involves finding the Stokes parameters of the scattered wave when irradiating a complex object. This will make it possible to pass from the energy characteristics of the scattered wave to such informative parameters as the degree of polarization, azimuth, and ellipticity of the polarized wave.
2 Materials and Methods
Let us express |E_x^inc|² and |E_y^inc|² through the components I₀, Q₀ of the incident wave Stokes vector. Since I₀ = |E_x|² + |E_y|² and Q₀ = |E_x|² − |E_y|², then

|E_x^inc|² = (I₀ + Q₀)/2 and |E_y^inc|² = (I₀ − Q₀)/2.

Hence,
I = |Φ₁|² (I₀ + Q₀)/2 + |Φ₂|² (I₀ − Q₀)/2 = (1/2) I₀ (|Φ₁|² + |Φ₂|²) + (1/2) Q₀ (|Φ₁|² − |Φ₂|²).

The second component of the Stokes vector of the scattered radiation is expressed through the components of the Stokes vector of the incident radiation as follows:
Q = |Φ₁E_x^inc|² − |Φ₂E_y^inc|² = |Φ₁|² |E_x^inc|² − |Φ₂|² |E_y^inc|²
= (1/2) I₀ (|Φ₁|² − |Φ₂|²) + (1/2) Q₀ (|Φ₁|² + |Φ₂|²).
We express the third component U of the Stokes vector through the components of the Stokes vector of the incident radiation:

U = Re(Φ₂Φ₁*) U₀ + Im(Φ₂Φ₁*) V₀.

Finally, for the fourth component of the Stokes vector of the reflected radiation, we write

V = −Im(Φ₂Φ₁*) U₀ + Re(Φ₂Φ₁*) V₀.

Thus, for the Stokes vector of the scattered radiation, the following expression is obtained:

S = ( (1/2) I₀(|Φ₁|² + |Φ₂|²) + (1/2) Q₀(|Φ₁|² − |Φ₂|²),
      (1/2) I₀(|Φ₁|² − |Φ₂|²) + (1/2) Q₀(|Φ₁|² + |Φ₂|²),
      Re(Φ₂Φ₁*) U₀ + Im(Φ₂Φ₁*) V₀,
      −Im(Φ₂Φ₁*) U₀ + Re(Φ₂Φ₁*) V₀ ).   (1)

We compose the Mueller scattering matrix for reflective metal surfaces:

M = ( M₁₁ M₁₂ 0   0
      M₂₁ M₂₂ 0   0
      0   0   M₃₃ M₃₄
      0   0   M₄₃ M₄₄ ),   (2)

where

M₁₁ = M₂₂ = (|Φ₁|² + |Φ₂|²)/2;  M₁₂ = M₂₁ = (|Φ₁|² − |Φ₂|²)/2;
M₃₃ = M₄₄ = Re(Φ₂Φ₁*);  M₃₄ = −M₄₃ = Im(Φ₂Φ₁*).
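The Mueller matrix (2) and its action S = M S₀ can be checked numerically with a short sketch (variable names are ours). For a deterministic scattering matrix, the degree of polarization of a fully polarized input must be preserved:

```python
import numpy as np

def mueller(phi1, phi2):
    # Build the Mueller matrix of Eq. (2) from the complex amplitudes Phi1, Phi2
    m11 = (abs(phi1) ** 2 + abs(phi2) ** 2) / 2.0
    m12 = (abs(phi1) ** 2 - abs(phi2) ** 2) / 2.0
    c = phi2 * np.conj(phi1)
    m33, m34 = c.real, c.imag
    return np.array([[m11, m12, 0.0, 0.0],
                     [m12, m11, 0.0, 0.0],
                     [0.0, 0.0, m33, m34],
                     [0.0, 0.0, -m34, m33]])

# Fully polarized incident wave (I0, Q0, U0, V0) with I0^2 = Q0^2 + U0^2 + V0^2
s0 = np.array([1.0, 0.0, 1.0, 0.0])
s = mueller(1.0, 0.5j) @ s0
dop = np.sqrt(s[1] ** 2 + s[2] ** 2 + s[3] ** 2) / s[0]  # degree of polarization
```

For Φ₁ = Φ₂ the matrix reduces to the identity scaled by |Φ|², i.e., the scatterer does not change the polarization state.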
2.1 Mueller Scattering Matrix for a Spherical Raindrop
It is convenient to decompose the vector of the incident electric field E^inc into parallel E_∥^inc and perpendicular E_⊥^inc components and present the relationship between the incident and scattered fields in matrix form:

(E_∥^refl, E_⊥^refl)ᵀ = [e^{ik(r−z)}/(−ikr)] (S₂ S₃; S₄ S₁) (E_∥^inc, E_⊥^inc)ᵀ,   (3)
where k is the wave vector, r is the path traveled by the wave, and S_j (j = 1, 2, 3, 4) are the elements of the amplitude scattering matrix that depend on the scattering angle θ and azimuth γ. Then, for scattering of an electromagnetic wave by a spherical particle, taking into account the principle of reciprocity, expression (3) can be written as

(E_∥^refl, E_⊥^refl)ᵀ = [e^{ik(r−z)}/(−ikr)] (A₂ 0; 0 A₁) (E_∥^inc, E_⊥^inc)ᵀ,   (4)

where

A₁ = (1/2) Σ_{n=1}^{∞} (2n + 1)(aₙ + bₙ),   (5)

A₂ = (1/2) Σ_{n=1}^{∞} (2n + 1)(aₙ + bₙ) cos θ.   (6)

The interaction processes are shown in Fig. 1.
Fig. 1 Interaction processes
In formulas (5), (6), the coefficients of the scattering series aₙ and bₙ are found using the following expressions:

aₙ = [m ψₙ(mx) ψₙ′(x) − ψₙ(x) ψₙ′(mx)] / [m ψₙ(mx) ξₙ′(x) − ξₙ(x) ψₙ′(mx)],   (7)

bₙ = [ψₙ(mx) ψₙ′(x) − m ψₙ(x) ψₙ′(mx)] / [ψₙ(mx) ξₙ′(x) − m ξₙ(x) ψₙ′(mx)],   (8)

where ψₙ(ρ) and ξₙ(ρ) are the Riccati–Bessel functions, and x and m denote the diffraction (size) parameter and relative refractive index, respectively, wherein

x = ka = 2π m₂ a / λ;  m = m₁/m₂,   (9)
where a is the particle radius, and m₁ and m₂ are the refractive indices of the particle and the medium (air), respectively. We expand the functions included in the coefficients aₙ and bₙ in power series and keep only the terms up to order x⁶. The first four obtained coefficients are as follows:

a₁ = −(i2x³/3) (m² − 1)/(m² + 2) − (i2x⁵/5) (m² − 2)(m² − 1)/(m² + 2)² + (4x⁶/9) [(m² − 1)/(m² + 2)]²;

b₁ = −(ix⁵/45) (m² − 1);  a₂ = −(ix⁵/15) (m² − 1)/(2m² + 3);  b₂ ≈ 0.
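Equations (7)–(9) can be cross-checked numerically with SciPy's spherical Bessel functions (a sketch; for |m|x ≪ 1 the full coefficient a₁ must approach the leading Rayleigh term of the expansion above):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def psi(n, rho):
    # Riccati-Bessel function psi_n(rho) = rho * j_n(rho)
    return rho * spherical_jn(n, rho)

def psi_p(n, rho):
    # Derivative of psi_n via the product rule
    return spherical_jn(n, rho) + rho * spherical_jn(n, rho, derivative=True)

def xi(n, rho):
    # xi_n(rho) = rho * h_n^(1)(rho)
    return rho * (spherical_jn(n, rho) + 1j * spherical_yn(n, rho))

def xi_p(n, rho):
    h = spherical_jn(n, rho) + 1j * spherical_yn(n, rho)
    hp = spherical_jn(n, rho, derivative=True) + 1j * spherical_yn(n, rho, derivative=True)
    return h + rho * hp

def mie_ab(n, x, m):
    # Scattering coefficients a_n, b_n of Eqs. (7)-(8)
    mx = m * x
    a = (m * psi(n, mx) * psi_p(n, x) - psi(n, x) * psi_p(n, mx)) / \
        (m * psi(n, mx) * xi_p(n, x) - xi(n, x) * psi_p(n, mx))
    b = (psi(n, mx) * psi_p(n, x) - m * psi(n, x) * psi_p(n, mx)) / \
        (psi(n, mx) * xi_p(n, x) - m * xi(n, x) * psi_p(n, mx))
    return a, b
```

For x = 0.01 and m = 1.33 (a water droplet in air) the full a₁ agrees with the leading term −i(2x³/3)(m² − 1)/(m² + 2) to well within a percent, and |b₁| is several orders of magnitude smaller, as the expansion predicts.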
If |m|x

Threshold frequency F > 600 | The bridge is not lifted
Dry, threshold frequency F < 600 | Bridge is lifted
Fig. 3 Water level analysis under the bridge
Table 2 Truth table of overall project

SR | Water sensor | Flex sensor | Vibration sensor | IR sensor | Result
1 | 1 | 0 | 0 | 0 | Vehicles can go
2 | 1 | 1 | 1 | 1 | Prohibition of vehicles
3 | 0 | 0 | 1 | 0 | Go slow
4 | 1 | 0 | 1 | 0 | Prohibition of vehicles
When the water sensor is activated (1) and the flex sensor, vibration sensor, and IR sensor are not activated (0), the result is that vehicles can go, which implies vehicles can travel even if the water level under the bridge rises to a certain level. When all sensors (water, flex, vibration, and IR) are activated (1), the result is a prohibition of vehicles, meaning that in situations of calamities like tsunamis or earthquakes it is not advisable to use the bridge, as it can be fatal. When only the vibration sensor is activated (1) and the other sensors are not activated (0), the result is that vehicles should go slow, meaning the earthquake is minor and has not yet created cracks on the road, so vehicles can travel, but slowly. When the water sensor and vibration sensor are activated (1) while the flex sensor and IR sensor are not activated (0), the result is a prohibition of vehicles, meaning that if the water level rises and damages the bridge, vehicles should not travel (Fig. 4).
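The truth table can be expressed as a simple lookup. This is a sketch of the decision logic only; the function name and the conservative default for sensor states not listed in Table 2 are our additions:

```python
def bridge_action(water, flex, vibration, ir):
    # Sensor states: 1 = activated, 0 = not activated (Table 2)
    table = {
        (1, 0, 0, 0): "Vehicles can go",
        (1, 1, 1, 1): "Prohibition of vehicles",
        (0, 0, 1, 0): "Go slow",
        (1, 0, 1, 0): "Prohibition of vehicles",
    }
    # States outside the table default to the safe choice
    return table.get((water, flex, vibration, ir), "Prohibition of vehicles")
```

Defaulting unknown states to prohibition reflects the fail-safe bias that a safety-critical controller on a bridge would normally take.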
462
K. Ingale et al.
Fig. 4 Smart bridge implementation
The above is the implementation of the system; it shows the circuitry of the system, where IR sensors at the corners of the bridge maintain the count of vehicles, piezo plates generate electricity for the self-powering system, and soil moisture sensors below send a signal to the actuator if the water level increases, while the flex and vibration sensors sense bending and cracks in the structural health of the bridge (Table 3).

Table 3 Comparison of outcomes with existing systems

SR no. | Features/Technology used in our system | Existing systems
1 | Water level monitoring using soil moisture sensors | Soil moisture sensors, ultrasonic sensors, pressure transducers
2 | Vehicle traffic management using IR sensors as counters | Cameras
3 | Structural health monitoring with flex sensor for bending and vibration sensors for crack detection | Combination of sensors like accelerometers and strain gauges, data analysis algorithms, wireless sensor networks in papers [3, 11, 14, 22]
4 | Self-powering system using piezoelectric plates for the generation of streetlights | Solar-based and salt-based piezoelectric voltage sensors for energy-efficient purposes are discussed in [12]
5 | Alert notification (for informing drivers via GSM module) | SMS, emails, push notifications
6 | Actuator control using Arduino UNO | Arduino, Raspberry Pi
31 A Smart Solution for Safe and Efficient Traffic Flow for Bridge
463
4 Conclusion In this study, a thorough understanding of existing systems and the applications of IoT for smart bridges was obtained. The integration of sensors, and the choice of sensors for maintenance, safety, and traffic management on the bridge, was informed by previous studies and existing systems. The smart IoT bridge enhances the self-powering aspect through piezoelectric plates. The IR sensors are used to control the vehicles by counting them and allowing only a certain number of vehicles when the bridge's structure is compromised. The lifting of the bridge is also one of the safety measures included in this system. If there is any problem with the bridge's health, it is detected by the flex and vibration sensors, and the drivers are notified via the GSM module. The data from the sensors are uploaded to the cloud for future processing and analysis, and to make inferences about things like environmental conditions, traffic hours, and the approach times of disasters and their corresponding effects on the bridge. This also helps in predictive maintenance. The results of this study demonstrate the effectiveness and reliability of the proposed system, showcasing its potential to preemptively address issues and significantly reduce downtime for maintenance. Moreover, the predictive maintenance notifications can play a pivotal role in optimizing the bridge's lifespan and minimizing long-term repair costs. From the user-experience perspective, the SMS alert via GSM gives advance warning of any incoming disaster or problem with the structural health of the bridge; such applications can be useful in flood-prone areas. The data collected from the sensors can be used to draw inferences. The counters situated on the bridge can be used to control traffic by curbing the number of vehicles passing through. As the system is comprehensive and shows results, there are still many areas and aspects where it can be enhanced.
Future work can make the system more robust so that it can be used in real time. The sustainability and durability of bridges in metropolitan cities can be enhanced. The IoT smart bridge has a wide scope for innovation, and once certain standards and protocols are laid out regarding data privacy and processing, well-defined strategies for the maintenance of structural health and algorithms or approaches for traffic management on the bridge would be revolutionary in the field of IoT in transportation.
Acknowledgements We express our gratitude to Vishwakarma Institute of Technology, Pune, and our head of department for supporting and encouraging us in fostering solutions for the benefit of society with innovative ideas and technologies. We aim to keep finding effective solutions for the development of our society.
464
K. Ingale et al.
Chapter 32
Recognition of Aircraft Maneuvers Using Inertial Data
Margarita Belousova, Stepan Lemak, and Ilya Kudryashov
1 Introduction The problem of aircraft maneuver recognition has many applications, such as detecting aircraft control anomalies and predicting the movement trajectory; engineers also use it when building simulators for teaching aircraft control. This paper considers the problem of recognizing the maneuvers of light aircraft using data collected by inertial sensors and presents a comparison of classification quality for different types of classifiers. The aircraft maneuver classifier is intended for use in simulators for training aircraft pilots. It is important to create conditions for the pilot that are as close to real flight as possible, so simulators often include motion cueing systems. Such systems can be based on industrial robot manipulators with the pilot's seat installed on the end effector [1]. The manipulator must move in such a way that the accelerations acting on the person at the end effector are equal in magnitude and/or direction to those that would act during a real flight [2, 3]. High-quality motion cueing algorithms often rely on the nature of the imitated motion (as was done, for example, in [2]), so for this purpose it is necessary to obtain real flight maneuver recordings with appropriate markup. The challenge is to obtain the necessary data without the complex step of embedding equipment into aircraft systems. It is also important that the data acquisition equipment does not interfere with the pilot during the flight. Therefore, a smartphone was used to collect the data. The problem of flight maneuver identification was solved in [4] using machine learning methods. Six types of maneuvers were identified: bleed off turn; wind up turn; Supported by the Ministry of Science and Higher Education of the Russian Federation within the Program of "Supersonic" (agreement N 075-15-2020-923). M. Belousova (B) · S. Lemak · I.
Kudryashov Lomonosov Moscow State University, Moscow 119991, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_32
slow down turn; steady turning sideslip; a full 360° turn around the longitudinal axis of the aircraft in any direction (360° roll); barrel roll. Records of 121 aircraft flights with a total of 7560 maneuvers were used; validation was carried out on data from 24 flights with a total of 1512 maneuvers. The representation of 210 features used in that work consisted mainly of statistical parameters. Binary and multiclass classifiers were considered: among binary classifiers, support vector machines (SVM) often performed best, while an artificial neural network (ANN) was recognized as the best multiclass classifier. In [5] a flight maneuver recognition method based on multi-strategy affine canonical time warping (MACTW) was presented. The method uses an improved success-history-based adaptive differential evolution with linear population size reduction (LSHADE), multi-strategy LSHADE (a combination of the LSHADE and CMA-ES algorithms), and canonical time warping, which combines dynamic time warping with canonical correlation analysis (CCA). The authors used 20 datasets from the University of California, Riverside (UCR) time series database [6]. The authors of [7] present a variant of using ensemble classifiers to detect aircraft malfunctions. The article proposes a method in which the discrepancy between the maneuver actually performed and the maneuver predicted by the classifier is a strong indicator of a malfunction. The working dataset was created from data collected in a controlled test environment. The article [8] was devoted to the problem of separating plausible flight trajectories from unfeasible ones using data collected from pilots practicing in flight simulators. The dataset used by the authors is hosted at [9] and provides thousands of trajectories, descriptions of maneuvers, and examples of maneuvers conducted by experienced pilots.
The problem of predicting aircraft maneuver trajectories using machine learning methods was stated long ago [10] and remains relevant to this day. In [11] a solution to the aircraft flight path prediction problem using deep learning models was presented. Flight path data is used for one-step-ahead prediction with a deep feedforward neural network (FFNN) and for long-term forecasting with long short-term memory (LSTM) neural networks; the results are then combined into a single prediction: after quantifying the discrepancy between the two predictions, the FFNN prediction is used to correct the LSTM prediction of the flight trajectory at subsequent time instants. This approach is applied to multiple flights to assess safety based on the horizontal and vertical separation distances between flights. Many researchers work on the similar problems of detecting maneuvers and their anomalies in car motion. For example, [12] describes building an autoencoder over a space-time graph to detect anomalies in driving behavior, and constructing a synthetic dataset of normal and abnormal maneuvers using a driving simulator. Autoencoders can also be used for maneuver recognition [13]. The authors of [14] describe a car maneuver detection system based on deep learning classifiers that uses smartphone inertial sensor data collected while driving. The article presents a framework for classifying and clustering maneuvers. Three deep learning classifiers were selected, each of which showed good results in recognizing the considered
set of events, and their combination into an optimal set of classifiers was investigated. This approach was tested on a real dataset and showed a good detection rate for 7 maneuvers, with a balanced accuracy of 0.90 and an average F1-score of 0.71. The article [15] presents a process, based on an algorithm described by its authors, for extracting various maneuvers from flight data without deep learning methods. The method can extract 28 different types of maneuvers, but performs the selection only in post-processing and requires a fairly large amount of data of different types. A similarly large amount of data was used in [16], which presents the results of k-means and k-medoids clustering of flight data from a large number of civil aircraft flights. Four main clusters were identified, corresponding to takeoff, landing, straight flight, and turns; an attempt to identify more clusters was unsuccessful. Clustering methods for automatic flight maneuver detection are also used in [17]. The paper [18] provides a brief overview of aircraft tracking and position prediction systems that use machine learning methods for subsequent use in airspace management systems. The authors of [19] describe the use of LSTM networks to determine maneuver types: the network input is a time series of flight data, pre-interpolated if necessary, and the presented network is trained to recognize 13 main types of maneuvers. A possible use of automatic systems for determining and evaluating maneuvers in pilot training is described in [20]: the researchers present software that detects the maneuvers performed on a simulator using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. In [21] a review of flight maneuver recognition studies carried out since the 1980s was presented.
All the reviewed studies that use deep learning for flight maneuver recognition were divided into vision-based and sensor-based ones. The authors demonstrated the differences between approaches to flight maneuver recognition and to human action recognition. Flight data can be used not only for recognizing maneuvers but also for determining the aircraft type: in [22] a novel machine-learning-based classification model for aircraft type recognition, using motion features extracted from aircraft flight track information, was proposed. However, almost all of these articles rely on a fairly large amount of flight data of various types, including the position and orientation of the aircraft, velocities, and accelerations; sometimes aircraft parameters are used as features. The goal of this work is to create a classifier that recognizes the type of maneuver being performed using data that can be collected without integrating into the aircraft control systems or interfering with the pilot. Often for these purposes, including in the works described above, a large number of expensive special sensors are used, which is impossible in this case. The automatic markup system proposed in this work uses data that can be recorded using only inertial sensors; manual data markup additionally requires a video recording of the aircraft's external environment. Collecting this kind of data is not difficult and does not require special permits or conditions for installing sensors in
the aircraft cockpit. At the same time, the collected data can be used to create motion cueing algorithms for aircraft flight and for their subsequent use in training systems.
2 Method This work used data from 24 aircraft flights with a total duration of 12 h, performed on An2, Zlin142, Piper22, and Vilga aircraft. The data was recorded at a 50 Hz frequency by a smartphone running the Mobile AR (augmented reality) sensor logger software [23], which synchronously records the video stream and the data from the smartphone's inertial sensors. The split into training and test datasets was carried out by flight recordings, and timestamp data was removed to avoid data leakage. Initially the data was divided by maneuvers, and a binary classifier was trained to determine whether a given data patch belongs to the training or the test dataset. It achieved about 80% accuracy, indicating a data leakage, so it was decided to divide the data by flight recordings instead. The test dataset was built from the recordings of 4 flights conducted on various aircraft and containing maneuvers of all the distinguished classes. In the initial manual marking of the flight data, the following classes were distinguished:
1. smartphone is in hand;
2. aircraft engine is off;
3. engine is on, aircraft doesn't move;
4. aircraft moves on the ground;
5. takeoff;
6. cruise;
7. landing;
8. turn left with an inclination angle of 15°;
9. turn right with an inclination angle of 15°;
10. turn left with an inclination angle of 30°;
11. turn right with an inclination angle of 30°;
12. turn left with an inclination angle of 45°;
13. turn right with an inclination angle of 45°;
14. aerobatics.
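The split-by-recording strategy described above can be sketched with scikit-learn's GroupShuffleSplit; the feature array, labels, and flight ids below are hypothetical stand-ins for the real data:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical stand-in data: feature vectors, 7 maneuver classes, and the
# id of the flight recording each sample came from.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))
y = rng.integers(0, 7, size=1000)
flight_id = rng.integers(0, 24, size=1000)

# Splitting on group ids keeps whole flight recordings in one fold,
# which avoids the leakage observed when splitting by maneuver.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=flight_id))
```

Because whole flights go to one side of the split, no flight id can appear in both the training and the test indices.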
The created dataset was unbalanced and contained very similar maneuvers, so it was decided to unite some classes. The starts and ends of recordings were cropped to remove the "smartphone is in hand" samples. Thus the following classes are present in the dataset:
1. aircraft engine is off;
2. engine is on, aircraft doesn't move;
3. aircraft moves on the ground;
4. takeoff, cruise, landing;
5. turn left;
6. turn right;
7. aerobatics.
During the flight, video and inertial measurements were collected. The values of the angular velocity and acceleration vectors, as well as their projections onto the axes of the trihedron associated with the aircraft, were used to obtain input features for the classifiers. The corresponding projections were obtained by transforming the vector projections onto the smartphone's sensitivity axes. It was assumed that the device was always installed so that the camera was directed along the longitudinal axis, and the vertical direction was restored from the acceleration values during the period of their smallest change, which corresponds to the aircraft standing still on the ground. These time series were filtered using a third-order Butterworth filter with a cutoff frequency of 2 Hz; the filter parameters were selected manually. It was not obvious that a classifier able to distinguish maneuvers using only inertial data could be constructed, so the Dynamic Time Warping method [24] was used to calculate the proximity of different segments within each class, and it confirmed that the maneuvers are easy to distinguish from each other. To build the training set, each maneuver was split into 4 pieces and the first and last pieces were removed; thus models were fitted using only the central parts of maneuvers. Manual markup of maneuver boundaries is inaccurate: even the pilots who conducted these maneuvers cannot always say at what moment one maneuver gave way to another. The data was transformed into patches using the sliding window method. One patch includes the data collected over 10 s of flight. Patches of 1, 2, and 5 s of flight were also considered, but the results obtained with them were only slightly better than random class prediction, while increasing the time interval from 5 to 10 s improved the results. This can be explained by the fact that flight parameters do not change much during a few seconds of flight.
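The low-pass filtering step described above can be reproduced with SciPy. The order (3), cutoff (2 Hz), and sampling rate (50 Hz) come from the text; zero-phase filtering via `filtfilt` is an assumption of this sketch:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50.0     # smartphone sampling rate, Hz (from the text)
CUTOFF = 2.0  # cutoff frequency, Hz (from the text)
ORDER = 3     # third-order Butterworth (from the text)

b, a = butter(ORDER, CUTOFF, btype="low", fs=FS)

# Synthetic test signal: a slow maneuver component plus 12 Hz vibration.
t = np.arange(0, 10, 1 / FS)
raw = np.sin(2 * np.pi * 0.3 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# filtfilt applies the filter forward and backward (zero phase shift);
# the 12 Hz vibration is strongly attenuated, the 0.3 Hz component passes.
smooth = filtfilt(b, a, raw)
```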
As for the "aerobatics" class recordings, they contain short periods of straight flight, so samples that include such data may be classified entirely as "cruise". Further processing of the data depends on the type of classifier used. The following options were tried: using the entire time series for classification, or taking a set of statistic values for the flight interval. The set of statistic values was formed by taking, for each time series, the average value, variance, minimum, maximum, and median, giving a total of 40 features. The time series were fed to the input of all the classifiers considered in this work, as well as to a classifier based on the ResNet18 architecture (a residual convolutional neural network 18 layers deep). The statistic values were used for classifiers based on classical machine learning methods, as well as for fully connected neural networks and convolutional networks with a small number of layers. The accuracy of classification using the raw time series did not exceed the accuracy obtained on the set of statistic values but required more resources, so it is not considered further.
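The 40-feature statistic vector described above (five statistics for each of the eight channels: three angular-velocity components, three acceleration components, and the two vector norms) can be sketched as follows; the column layout of the patch is an assumption:

```python
import numpy as np

def patch_features(patch):
    """patch: (500, 6) array holding 10 s of gyroscope (first 3 columns)
    and accelerometer (last 3 columns) samples recorded at 50 Hz.
    Returns the 40-dimensional statistic vector."""
    gyro, acc = patch[:, :3], patch[:, 3:]
    # Two extra channels: Euclidean norms of the two vectors -> 8 channels.
    channels = np.column_stack([
        gyro, acc,
        np.linalg.norm(gyro, axis=1),
        np.linalg.norm(acc, axis=1),
    ])
    # Five statistics per channel: 5 x 8 = 40 features.
    return np.concatenate([
        channels.min(axis=0), channels.max(axis=0),
        channels.mean(axis=0), channels.var(axis=0),
        np.median(channels, axis=0),
    ])

features = patch_features(np.random.default_rng(0).normal(size=(500, 6)))
```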
The data was standardized using the robust standardization method. Identifying the most significant features was unsuccessful: removing any feature led to a tangible deterioration of the result, so all the features were used. The test set was formed without preliminary division into maneuvers; each patch was assigned the class index of its middle part. For each time point, the prediction is formed by taking the most frequently occurring class across all patches that include it. The ratios of the numbers of patches per class for the training and test datasets are presented in Fig. 1: the training dataset includes 75,783 patches and the test dataset 52,252 samples. This work used records of flights of aircraft with similar maneuverability characteristics. To classify the maneuvers of aircraft with different parameters, it would probably be necessary to introduce additional features corresponding to the aircraft type or some of its characteristics; since there was insufficient data of other types, such work was not carried out. To characterize the quality of the obtained results, the following metrics were considered:
1. accuracy;
2. weighted average of the precision coefficients;
3. weighted average of the F1-scores;
4. weighted average of the Intersection over Union (IoU) metrics (Jaccard index), calculated for every class;
5. minimum of the IoU metrics.
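The weighted and minimum Jaccard (IoU) scores can be computed directly with scikit-learn; the labels below are purely illustrative:

```python
import numpy as np
from sklearn.metrics import jaccard_score

# Toy multiclass labels over 3 classes.
y_true = np.array([0, 0, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2])

per_class = jaccard_score(y_true, y_pred, average=None)       # IoU per class
weighted = jaccard_score(y_true, y_pred, average="weighted")  # weighted IoU
minimum = per_class.min()                                     # min IoU

# per_class: class 0 -> 1/2, class 1 -> 2/3, class 2 -> 1/1
```

For each class, the Jaccard score is the size of the intersection of the true and predicted index sets divided by the size of their union, and the weighted average uses the true class supports as weights.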
For comparison, the following types of classifiers were taken:
1. Decision Tree (DT);
2. Random Forest (RF);
3. Histogram-Based Gradient Boosting (HBGB);
4. Quadratic Discriminant Analysis (QDA);
5. K-Nearest Neighbors algorithm (k-NN);
6. Support Vector Machine with linear kernel type (SVM);
7. Multi-layer Perceptron classifier (MLP);
8. Feedforward Neural Network (FNN);
9. Convolutional Neural Network (CNN);
10. Neural Forest (NF) [25].
Initially, other types of classifiers were also considered, including neural networks with rather complex architectures such as Residual Networks (ResNet) and Long Short-Term Memory (LSTM) networks. However, there was insufficient data to obtain acceptable results with such networks, so classifiers based on classical machine learning algorithms were mainly considered; some of the initially considered classifiers whose performance and classification quality were worse than the others are not presented in this work. It was also suggested that classification by a set of statistical characteristics of the time series using classical machine learning algorithms would give results no worse than classification by the original time series, and this assumption was confirmed.
Fig. 1 Number of samples for train (left) and test (right) datasets
This study was implemented in Python using the scikit-learn [26], SciPy [27], and pandas [28] libraries and the PyTorch framework [29]. Hyperparameters of the classifiers were chosen with Optuna [30].
3 Results and Discussion To summarize the preliminary processing of the data used to build the classifiers, the training and test datasets were formed as follows. During the flight, video, timestamps, and inertial data (projections of the angular velocity and linear acceleration vectors onto the sensitivity axes of the smartphone's inertial sensors) were synchronously recorded; the video is used only for manual data markup. For the angular velocity and acceleration time series, the Euclidean norms of the vectors are calculated. From the resulting tabular data, patches containing the inertial data recorded during 10 s of flight were formed, and the timestamps were removed. For each patch, the minimum, maximum, average value, variance, and median of all components of the angular velocity and linear acceleration vectors and of the magnitudes of these vectors were calculated; in total, a vector of length 40 is obtained for each patch. These vectors are fed to the inputs of the considered classifiers. The flow diagram of this process is shown in Fig. 2. Table 1 shows the values of the classification quality metrics calculated on the 4 flights of the test dataset. The best results are obtained using histogram-based gradient boosting and convolutional or feedforward neural networks with a small number of layers (in this case 4). Most often, classifiers make mistakes when determining the "aerobatics" class. This is partly because it is the rarest class in the dataset. There is also the problem that, for example, during a turn with an inclination angle of 15°, the values of the angles, velocities, and accelerations can be similar to those obtained in straight flight with small swings. Almost all classifiers do their best job
Fig. 2 Flow diagram of data preparation and classification process
at identifying the moments of standing on the ground with the engine turned off, which is quite expected. Most of the discrepancies between the predicted class and the manual markup occur at the boundaries between the intervals corresponding to different maneuvers. This is quite logical: the measured values change smoothly, and even pilots cannot clearly indicate at what moment one maneuver gave way to another. It can also be assumed that the manual markup itself contains errors near some maneuver boundaries.
Table 1 Accuracy, weighted average of the precisions of each class, weighted average of the F1-scores, weighted average of the Jaccard scores, and minimum of the Jaccard scores of each class given by each classifier using 40 features

Classifier | Acc | Pres | F1 | WIoU | MIoU | Class with min IoU | Train time (s)
DT | 0.737 | 0.783 | 0.748 | 0.614 | 0.256 | Engine is on, aircraft doesn't move | 2.75
RF | 0.817 | 0.837 | 0.814 | 0.693 | 0.524 | Aerobatics | 3.95
HBGB | 0.848 | 0.870 | 0.842 | 0.735 | 0.529 | Aerobatics | 5.57
QDA | 0.709 | 0.749 | 0.678 | 0.543 | 0.230 | Turn right | 0.07
k-NN | 0.701 | 0.732 | 0.698 | 0.543 | 0.434 | Takeoff, cruise, landing | 0.02
L SVM | 0.812 | 0.832 | 0.808 | 0.683 | 0.474 | Aerobatics | 18.49
MLP | 0.795 | 0.823 | 0.797 | 0.668 | 0.473 | Engine is on, aircraft doesn't move | 5.89
FNN | 0.849 | 0.862 | 0.850 | 0.744 | 0.473 | Aerobatics | 123.80
CNN | 0.827 | 0.844 | 0.820 | 0.702 | 0.397 | Aerobatics | 16.72
NF | 0.803 | 0.820 | 0.798 | 0.670 | 0.263 | Aerobatics | 176.49
If the transition patches between maneuvers are removed, so that only the central parts of each maneuver are present in the test dataset, then the best result is obtained with HBGB, whose classification accuracy exceeds 90%. For the subsequent automatic data markup, a prediction for every time moment was made as follows:
1. a set of patches was made from every flight record;
2. for each time moment, the predictions for all patches that contain its data are collected;
3. the final prediction for the time moment is determined by majority voting over this set of predictions.
Figure 3 shows the prediction results and the initial class values for all flights of the test dataset using histogram-based gradient boosting.
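The per-time-moment voting in steps 1–3 above can be sketched as follows; the window length and stride in the example are assumptions of this sketch:

```python
import numpy as np
from collections import Counter

def vote_per_timestep(n_steps, window, stride, patch_preds):
    """patch_preds[i] is the class predicted for the patch that starts at
    sample i * stride and covers `window` samples. Returns the
    majority-vote class for every time step."""
    votes = [[] for _ in range(n_steps)]
    for i, cls in enumerate(patch_preds):
        start = i * stride
        for t in range(start, min(start + window, n_steps)):
            votes[t].append(cls)
    # Most frequent class among all patches containing the time step.
    return np.array([Counter(v).most_common(1)[0][0] for v in votes])

# Tiny example: 3-sample patches with stride 1 over 5 time steps.
pred = vote_per_timestep(5, window=3, stride=1, patch_preds=[0, 0, 1])
```

The first time step is covered only by patches predicting class 0 and the last only by the patch predicting class 1, so the vote follows those predictions at the ends.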
4 Conclusion The article presents a solution to the problem of classifying light aircraft maneuvers using machine learning methods. For classification, vectors consisting of the minimum, maximum, mean, variance, and median, calculated for the components and Euclidean norms of the angular velocity and linear acceleration vectors, are used.
Fig. 3 Results of classification using HBGB classifier for 4 flights of the test dataset
This study used data from 24 flights performed on An2, Zlin142, Piper22, and Vilga aircraft. These aircraft have similar maneuverability characteristics; to classify the maneuvers of aircraft with different parameters, it would be necessary to introduce additional aircraft characteristics. It was found that classification based on a short period of flight time (1–2 s) is unsatisfactory. In this work, samples covering a time period of 10 s are generated for the classifiers, which gives good classification quality, exceeding 90% on the central parts of maneuvers. A comparison of classification quality for 10 types of classifiers is presented. The best results are obtained using histogram-based gradient boosting (HBGB) and feedforward (FNN) or convolutional (CNN) neural networks. Most often, the classifiers' predictions disagree with the manual markup at the boundaries of maneuvers and when determining "aerobatics", the rarest type of maneuver. The classifiers almost perfectly determine the moments of standing motionless with the engine turned off. Using the constructed classifier, it is possible to determine in real time the maneuvers being performed by an aircraft or by its mathematical model implemented in a simulator. The results of this work are intended for use in aircraft control simulators to construct high-quality motion cueing algorithms that take into account the characteristics of the movement performed by the pilot.
References
1. Giordano P, Masone C, Tesch J, Breidt M, Pollini L, Bülthoff H (2010) A novel framework for closed-loop robotic motion simulation—part II: motion cueing design and experimental validation, pp 3896–3903. https://doi.org/10.1109/ROBOT.2010.5509945
2. Alexandrov VV, Lemak SS (2021) Algorithms of dynamic piloted flight simulator stand based on a centrifuge with a controlled cardan suspension. J Math Sci 253(6):768–777. https://doi.org/10.1007/s10958-021-05268-8
3. Kabanyachyi V, Hrytsan S (2022) Problem of motion cueing along linear degrees of freedom on flight simulators. In: Mechanics of gyroscopic systems, pp 108–116. https://doi.org/10.20535/0203-3771422021268892
4. Bodin C (2020) Automatic flight maneuver identification using machine learning methods
5. Wei Z, Ding D, Zhou H, Zhang Z, Xie L, Wang L (2020) A flight maneuver recognition method based on multi-strategy affine canonical time warping. Appl Soft Comput 95:106527. https://doi.org/10.1016/j.asoc.2020.106527
6. Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/
7. Oza NC, Tumer K, Huff EM (2003) Classification of aircraft maneuvers for fault detection. In: 4th international workshop on multiple classifier systems, MCS, Guilford, UK, p 10. https://doi.org/10.1007/3-540-44938-8_38
8. Samuel K, Gadepally V, Jacobs D, Jones M, McAlpin K, Palko K, Paulk B, Samsi S, Siu HC, Yee C, Kepner J (2021) Maneuver identification challenge. CoRR. https://arxiv.org/abs/2108.11503
9. Maneuver Identification Challenge. https://maneuver-id.mit.edu. Last accessed 21 Sept 2023
10. Rodin EY, Massoud Amin S (1992) Maneuver prediction in air combat via artificial neural networks. Comput Math Appl 24(3):95–112
11. Zhang X, Mahadevan S (2020) Bayesian neural networks for flight trajectory prediction and safety assessment. Decis Support Syst 131(113246):17. https://doi.org/10.1016/j.dss.2020.113246
12. Wiederer J, Bouazizi A, Troina M, Kressel U, Belagiannis V (2021) Anomaly detection in multi-agent trajectories for automated driving. In: 5th Conference on Robot Learning (CoRL 2021)
13. Tian W, Zhang H, Li H, Xiong Y (2022) Flight maneuver intelligent recognition based on deep variational autoencoder network. EURASIP J Adv Signal Process. https://doi.org/10.1186/s13634-022-00850-x
14. Ramah SE, Bouhoute A, Boubouh K, Berrada I (2021) One step further towards real-time driving maneuver recognition using phone sensors. IEEE Trans Intell Transp Syst 22(10):13
15. Wang Y, Dong J, Liu X, Zhang L (2015) Identification and standardization of maneuvers based upon operational flight data. Chinese J Aeronautics 28(1):133–140. https://doi.org/10.1016/j.cja.2014.12.026
16. Blanks Z, Sedgwick A, Bone B, Mayerchak A (2017) Identification of flight maneuvers and aircraft types utilizing unsupervised learning with big data. In: 2017 Systems and Information
476
17.
18.
19.
20.
21. 22.
23. 24. 25. 26. 27. 28. 29. 30.
M. Belousova et al. Engineering Design Symposium (SIEDS), pp 180–185. https://doi.org/10.1109/SIEDS.2017. 7937712 Lu J, Chai H, Jia R (2022) A general framework for flight maneuvers automatic recognition. Mathematics 10(7). https://doi.org/10.3390/math10071196, https://www.mdpi.com/ 2227-7390/10/7/1196 Emir çakıcı M, Okay FY, Özdemir S (2021) Real-time aircraft tracking system: a survey and a deep learning based model. In: 2021 International Symposium on Networks, Computers and Communications (ISNCC), pp 1–6. https://doi.org/10.1109/ISNCC52172.2021.9615681 HanYang F, Hongming F, RuiYuan G (2020) Research on air target maneuver recognition based on LSTM network. In: 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), pp 6–10. https://doi.org/10.1109/IWECAI50956.2020. 00009 Socha V, Hanakova L, Socha L, Van den Bergh S, Lalis A, Kraus J (2018) Automatic detection of flight maneuvers with the use of density-based clustering algorithm. In: 2018 XIII international scientific conference—New Trends in Aviation Development (NTAD), pp 132–136. https://doi. org/10.1109/NTAD.2018.8551680 Lu J, Pan L, Deng J, Chai H, Ren Z, Shi Y (2023) Deep learning for flight maneuver recognition: a survey. Electronic Res Arch 31(1):75–102 Huang H, Huang J, Feng Y, Liu Z, Wang T, Chen L, Zhou Y (2018) Aircraft type recognition based on target track. J Phys Conf Ser 1061(1):012015. https://doi.org/10.1088/1742-6596/ 1061/1/012015 Mobile AR Sensor Logger (2022). https://www.cs.ucr.edu/~eamonn/time_series_data/. Last accessed 21 Sept 2023 Toni G (2009) Computing and visualizing dynamic time warping alignments in R: the dtw package. J Stat Softw 31. https://doi.org/10.18637/jss.v031.i07 Frosst N, Hinton G (2017) Distilling a neural network into a soft decision tree. https://doi.org/ 10.48550/ARXIV.1711.09784, https://arxiv.org/abs/1711.09784 Scikit-learn library. https://scikit-learn.org/stable/. Last accessed 21 Sept 2023 Scipy library. 
https://scipy.org/. Last accessed 21 Sept 2023 Pandas—python data analysis library. https://pandas.pydata.org/. Last accessed 21 Sept 2023 Pytorch framework. https://pytorch.org/. Last accessed 21 Sept 2023 Optuna—a hyperparameter optimization framework. https://optuna.org/. Last accessed 21 Sept 2023
Chapter 33
A Deep Learning Framework for Sleep Apnea Detection A. Sathiya, A. Sridevi, and K. G. Dharani
A. Sathiya (B), Research Scholar, Department of ECE, M.Kumarasamy College of Engineering, Karur, Tamil Nadu, India. e-mail: [email protected]
A. Sridevi, Professor, Department of ECE, M.Kumarasamy College of Engineering, Karur, Tamil Nadu, India. e-mail: [email protected]
K. G. Dharani, Department of ECE, Akshaya College of Engineering & Technology, Coimbatore, Tamil Nadu, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6_33

1 Introduction

According to the American Academy of Sleep Medicine (AASM), brief pauses in breathing are a defining feature of sleep apnea. The presence and severity of apnea are best determined from the number of apnea occurrences per hour of sleep, as measured by the Apnea–Hypopnea Index (AHI). An estimated 200 million people around the world are afflicted. Fewer than 2% of adult women and over 4% of adult men have this disorder, and an estimated 93 percent of middle-aged women and 82 percent of middle-aged men with moderate to severe sleep apnea have never been diagnosed. Three percent of preschool-aged children had sleep apnea, according to a study published in Paediatrics. Hypertension, excessive cholesterol, a weakened immune system, and an increased risk of developing type 2 diabetes are among the conditions that have been linked to sleep apnea, and drowsiness brought on by lack of sleep is a potential cause of vehicular accidents. When it comes to diagnosing sleep apnea, nothing beats a full night of polysomnography (PSG) in a sleep lab. Electroencephalogram, electrooculogram, electromyogram, and electrocardiogram (ECG) are just some of the physiological signals recorded in PSG, which
total at least eleven channels. However, PSG is costly and inaccessible to many people throughout the world because of the numerous cables and sensors that must be attached to the subject's body. The analysis also takes a lot of time and effort and, as a result, is prone to mistakes, and long wait times occur because of the shortage of sleep apnea specialists in medical facilities. By bolstering the validity of data about the incidence of sleep apnea derived from single-channel ECG recordings alone, this work aims to contribute to a simplified diagnostic strategy needing fewer physiological markers for the detection of OSA. The study advances an automated method for diagnosing OSA by analyzing data from a single-lead ECG with a deep learning framework built on a convolutional neural network (CNN). CNNs use layers similar to those of a typical ANN: convolutional layers, subsampling layers, and fully connected layers. CNNs are simpler to train because they have fewer total parameters than fully connected neural networks with the same number of hidden units, and the CNN architecture makes full use of the structure of its input signals: locally connected nodes with tied weights, followed by pooling, yield features that are insensitive to translation. The proposed method is computationally intensive, but it consistently generates results about 9 percentage points better than anything else in the literature.
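The parameter-sharing argument can be made concrete with a back-of-the-envelope count; the layer sizes here are illustrative (a 1-min, 6000-sample segment and 20 units), not a claim about the exact network:

```python
# Parameter counts: a 1-D convolutional layer with tied weights versus a
# dense layer, both producing 20 feature maps/units from a 6000-sample input.
conv_params = 20 * 60 + 20        # 20 shared kernels of length 60, plus biases
dense_params = 6000 * 20 + 20     # fully connected: every input to every unit
print(conv_params, dense_params)  # 1220 120020
```

The roughly hundredfold reduction is why the convolutional variant is easier to train on limited data.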
2 Literature Survey

High blood pressure and heart problems are only two of the major health issues that can result from leaving sleep apnea untreated. Snoring and daytime fatigue despite a full night's sleep are symptoms of sleep apnea, in which breathing temporarily ceases during sleep. People of any age or body composition can suffer from sleep apnea, which is categorized as obstructive, central, or mixed. Polysomnography (PSG) is used to identify this disorder in sleep labs in several nations, but this detection technique is expensive and time-consuming. Many studies employ electromyographic, electrocardiographic, and electroencephalographic signals to diagnose sleep apnea. Because it records information about sleep respiratory disruptions, especially frequent arousals, EEG analysis has lately gained appeal as a method for detecting sleep apnea, although deep learning (DL) methods for identifying sleep apnea from an EEG signal are still underutilized; for instance, a convolutional network has been employed to identify apnea episodes from EEG time-frequency images. Many sleep apnea classification methods have been published in recent years. To identify the segments of a patient's time series that are important for sleep apnea detection, [1] presented a deep learning method based on a convolutional neural network. That article compares the accuracy and signal degradation rates of many different approaches for identifying sleep apnea using the SpO2 signal. Subject
validation yields an overall accuracy of 91.3085% across the entire sample, with the testing distribution outperforming the training distribution. To detect SA, [2] suggested a time-dependency cost-sensitive (TDCS) classifier and an unsupervised feature extraction technique. The FSSAE allows for the unsupervised learning of a richer feature set, and including a time component helped the system's HMM classifier perform better. The classifier bias problem, which certain conventional SA detection approaches ignore, was taken into account, and a cost-sensitive algorithm was examined as a potential solution. Experiments showed the proposed unsupervised feature extraction approach and the time-dependency cost-sensitive classifier to be effective for SA detection, although some restrictions must be removed before the approach can reach its full potential. A weighted-loss time-dependent (WLTD) classification model and a multiscale dilation attention one-dimensional convolutional neural network (MSDA-1DCNN) were presented by [4] as a means of detecting OSA. Dilated convolution was implemented, and the parameter-performance trade-off was significantly reduced. After the multiscale characteristics were fused, an attention mechanism adjusted the feature weights of the most relevant channels. In the final classification stage, a weighted cross-entropy loss function combined with a hidden Markov model addressed the issue of data imbalance and enhanced classification accuracy. The method achieves an accuracy of 100% when identifying individuals, along with accuracies of 89.4%, 89.8%, and 89.1% when identifying segments, confirming its feasibility for detecting sleep apnea. In [5], the self-attention technique is applied to the development of a hybrid transformer model for OSA detection using a single-lead ECG.
A new strategy for constructing the raw inputs is offered as a means of reducing the quantity of additional data that must be provided. Input data from an electrocardiogram (ECG) typically consist of sequences depicting the R-peak amplitude, the RR interval, and the first-order difference of the RR interval. An MPCA block is constructed to automatically focus on what is most crucial among the four input signals and to extract the resulting fused multiperspective features. Using a self-attention strategy, the most important details were encoded when transformer blocks were given the characteristics and their position encodings, and a linear layer produced the diagnosis. The proposed method was tested on the apnea-ECG database. Within the context of the IoMT, [3] describes a novel OSA detection technique that makes use of hand-crafted features and a parallel heterogeneous deep learning model. This methodology takes an ECG-based approach to diagnosing OSA by using short-term HRV indicators. Heart rate variability (HRV) signals and linear and nonlinear HRV properties are initially combined into a 1-D sequence. Simultaneously, a spectral representation of HRV over time is produced. Separate nodes encode the 1-D data sequences and the 2-D images in the deep learning network proposed for OSA diagnosis. The authors in [6] proposed a set of features for measuring the amplitude and direction of tracheal motion during breathing. To estimate the AHI and detect respiratory episodes, a unique method is created based on a deep learning classifier.
This suggested accelerometer- and deep learning-based technology can be incorporated into a practical, portable, and trustworthy wearable device for at-home monitoring of sleep apnea, and it may help with the problem of sleep apnea being underdiagnosed. In [8], the authors detail the development of SleepFCN, a fully convolutional neural network capable of dividing sleep into five stages from a single EEG recording. Multiscale feature extraction (MSFE) and residual dilated causal convolutions (ResDC) are the two main building blocks of SleepFCN, responsible for feature extraction and temporal sequence encoding, respectively. Convolutional layers with kernels of size 1 are then used in place of dense layers to complete the fully convolutional network. The Sleep-EDF and SHHS datasets are used to evaluate SleepFCN's effectiveness. A novel architecture called AttnSleep was proposed in [7] to characterize sleep stages using only a single raw EEG channel. Adaptive feature recalibration and a multilayer perceptron are used to extract features from the EEG data; afterward, a multi-head attention technique is used in the temporal context encoder to encode the temporal connections between the retrieved features. Using the respiratory signal directly, a unique method for training LSTM networks is proposed in [9]. This method eliminates the requirement for time-consuming feature engineering, and OSA, CSA, and hypopnea can all be diagnosed with it. By first preprocessing the signals to extract respiratory information and then making optimal use of the data via the balanced bootstrapping technique, LSTM networks can be trained on longer sequences of respiratory signals, resulting in a more robust and accurate model for evaluating new cases.
CNN-based models, which integrate an RNN model, were trained using single-lead electrocardiogram (ECG) data to accurately, inexpensively, and non-invasively identify obstructive sleep apnea in [10]. The proposed model outperforms the current best practices, and the procedure runs just as well, and much faster, on low-power devices. The diagnostic system for sleep apnea proposed in [11] is portable, low-cost, and simple to operate. A photoplethysmography (PPG) optical sensor measures both the pulse wave signals and the blood oxygen saturation of the human body at the same time. Multiscale entropy and random forest methods are applied to the PPG signal for sleep apnea analysis and diagnosis. By analyzing both the PPG signal and the blood oxygen saturation signal, SAS can be assessed, with the latter used to eliminate confounding factors. Using a CNN to evaluate data from a single-channel ECG, [12] proposed a sleep-monitoring model for use in portable OSA monitor devices. The first convolution layer uses three different filter types to probe features of varying sizes. To learn long-term dependencies, such as the OSA transition rules, long short-term memory (LSTM) is employed, and the softmax function connected to the final fully connected layer makes the final decision. Partitioning the raw ECG data with an overlapping sliding window of 10 s enables the search for a full OSA event. After training on the segmented raw signals, the proposed model's event recognition performance is assessed.
Using a 1-D deep convolutional neural network (CNN) model, [13] demonstrated a sleep apnea diagnosis method based on signals from a standard one-dimensional electrocardiogram (ECG). The proposed CNN model consists of a flattening layer, four highly similar classification layers built mostly from fully connected networks, and a softmax classification layer. Using a separate collection of ECG recordings, the effectiveness of the proposed CNN model for detecting apnea occurrences was verified; 35 public and 35 private ECG records from the MIT PhysioNet Apnea-ECG Database were used to train the suggested model. An automatic feature extraction method was developed by [14] by integrating a CNN with a long-term recurrent network. Apnea episodes can also be differentiated from ordinary ones using the fully connected layers as a diagnostic tool. The apnea–hypopnea index (AHI) is then utilized to differentiate between those who have apnea and those who do not. To determine whether the suggested method is effective, experiments are performed using publicly accessible datasets such as the apnea-ECG and the UCDDB. In [15], the authors detailed a novel method for diagnosing OSA by converting signals from a single ECG lead and running them through a composite deep convolutional neural network model. Transforming the signal into images of a heart rate variability (HRV) scalogram and a Gramian angular field (GAF) matrix allows the time dimension of the ECG to be considered. The composite model consists of three separate components: a convolutional neural network with five voted-on residual blocks; two convolutional neural networks employing AlexNet and ResNet models; and some fine-tuning. Authors in [16] presented a method for OSA recognition using a convolutional neural network. The method first computes Mel-frequency cepstral coefficients (MFCC).
The convolutional neural network then provides an estimate of the likelihood of developing OSA. Extensive tests were conducted on a reference dataset to empirically evaluate the effectiveness and robustness of the proposed method. Based on the data collected, this approach not only holds its own against recently published systems but vastly outperforms related baselines. A technique for automatically detecting sleep apnea events in research studies was proposed in [17]. An accurate diagnosis of sleep apnea requires a polysomnogram, which is expensive, time-consuming, and stressful for patients. Signals that can be conveniently gathered by a smart shirt containing sensors and a fingertip pulse oximeter should therefore be prioritized; the expense of polysomnography can be reduced by making do with less expensive yet appropriate equipment. The study's scientific value thus resides in the fact that it may help standardize the methods used by other sleep specialists. Authors in [18] proposed a novel approach that makes use of deep learning to detect and quantify sleep apnea by utilizing statistical elements of ECG data. According to the findings, the proposed method demonstrates a significant improvement in accuracy compared with state-of-the-art methods. Because it is non-invasive and relatively inexpensive, this algorithm has a lot of untapped promise in the field of sleep medicine. Authors in [19] offered two novel techniques for the diagnosis of obstructive sleep
apnea and snoring. The MobileNet V1 model alone achieves the highest accuracy; converging MobileNet V1 with LSTM yields an accuracy of 90%, and converging MobileNet V1 with GRU yields an accuracy of 90.29%. The results obtained provide evidence that the suggested strategy is superior to the best practices currently in place. As an illustration of how the authors' ideas may be implemented in clinical settings, they developed a wearable device that can continually monitor ECG signals and classify them as either indicative of apnea or normal. Authors in [20] presented RAFNet, a restricted attention fusion network for detecting sleep apnea. A new restricted attention mechanism is proposed that uses the target segment as the query vector and cascades morphological and temporal attention to efficiently learn feature information while suppressing redundant feature information from the adjacent segments. By combining features of the target segment with those of neighboring segments using a channel-wise stacking strategy, SA detection performance is improved. An MCFN model is recommended by [21] for identifying OSA, hypopnea, and normal sleep. To maximize information extraction and make use of shallow features, the model uses a multilevel feature concatenation block. An attention mechanism is then used to combine the data from the three different respiratory signals (airflow, abdominal activity, and thoracic activity). The fusion block gives the three respiratory signals different weights based on their attributes, amplifying the important one while decreasing the influence of the others. The results showed how SAS identification was affected by using a variety of channel characteristics, the fusion of numerous channels, features at different levels, and multiple respiratory signals.
The MCFN model's use of signal complementarity and feature completeness improves SAS detection performance; the method outperforms the competition on the MESA dataset, with a detection accuracy of 87.3%. A method for diagnosing Sleep Apnea–Hypopnea Syndrome (SAHS) is essential for monitoring a patient's condition. Using only SpO2 data, researchers [22] created a CNN model for SAHS identification, which detects SAHS more cheaply and easily than the standard approach. The study evaluated the effectiveness of the proposed models using data from the Apnea-ECG and UCD databases and compared the results of the proposed CNN model with those of similar methods. The results showed that this model produced results on par with or better than those seen in other studies in this field: the CNN achieved 95.5% accuracy when trained on the Apnea-ECG database and 90.2% accuracy when trained on the UCD database. The data analysis reveals that omitting the PSG test entirely and using a CNN to detect SAHS can streamline diagnostic procedures while reducing reliance on test results. The developed model for determining whether a patient has SAHS is a valuable and applicable resource that may be critical to the patient's future health. To diagnose OSA from a single-lead ECG, a CNN model was suggested in [23] to classify CQT images of apnea and non-apnea signals. First, one-minute chunks of the signals were extracted, and a first-order Butterworth band-pass filter was used to correct the baseline.
These filtered chunks are then turned into time-frequency images through the constant Q-transform and entered into the custom-built CNN model.
3 Proposed Work

The proposed method is layered, much like a traditional neural network, with each layer in charge of a different aspect of learning, and the network is trained within a deep learning framework. Compared with traditional ANNs and other machine learning algorithms, a major benefit of deep learning is its ability to uncover new characteristics that are strongly connected to the small number of features already recoverable from the data in a training dataset. Single-lead electrocardiograms (ECGs) from multiple individuals exhibiting varying degrees of obstructive sleep apnea are used. Each waveform is divided into a series of equally spaced segments, each lasting exactly one minute, and the presence or absence of an apneic episode in each segment is annotated by a medical practitioner. The subject's waveform samples are fed to a deep learning framework employing convolutional neural networks. Features of apneic and non-apneic episodes are learned from the training data and classified in the later layers of the network. Figure 1 depicts the proposed work. During testing, the trained network is given input data from segments different from those utilized during training, which allows evaluation of how well the network identifies the OSA condition. The feature extraction and classification algorithms often used to identify OSA are therefore superfluous when employing the method described in this study. Although such network variants are most commonly used for image analysis, here they are applied to time series data. The details of the proposed network are presented below, and the entire setup is shown in Fig. 2. As can be seen in Fig. 2, the CNN architecture is made up of several layers, each of which is briefly described below. a.
Input Layer
The single-lead ECG signals from patients with various sleep disorders are received at this layer. Figure 2 shows the relative sizes of the several layers. Every input contains 6000 data points per minute.
b. ReLU Layer
ReLU stands for rectified linear unit. This layer carries out the thresholding process using the nonlinear decision mapping function

max{0, s} = 0 if s < 0, and s otherwise.
Fig. 1 Flowchart of proposed methodology
c. Convolutional Layer
The convolution filtering of the input data is performed at this layer so that salient characteristics can be extracted and passed on to the succeeding layers. The convolution operation for a 1-D signal (sequence) is

c[n] = s[n] ∗ h[n] = Σ_{k=−∞}^{+∞} s[k] h[n − k]
Fig. 2 Illustration of the CNN-based technique
where s[n] is the input signal and h[n] is the impulse response (kernel) of the convolutional layer. Each convolution kernel in a convolutional neural network (CNN) stands for a different class of features, such as the type of temporal variation present, the sharpness of edges in the input, or the amplitude of small changes. Twenty convolution kernels are utilized in the network under consideration; their weights are initially seeded at random and then refined by backpropagation learning. For the first convolution layer, the filter structure's dimension is written as [1 × 60 × 20]. As with choosing how many neurons to place in a given layer of a traditional neural network, this structure is arrived at through trial and error.
d. Pooling Layer
This layer's responsibility is to reduce the dimensions. Max pooling [9] is often used for this task; average pooling and L2-norm pooling are also common. These pooling techniques work admirably on image inputs, but to better accommodate the 1-D time-domain input signals, convolutional pooling is used instead of a straightforward pooling process. It is achieved simply by adding a new convolution layer and sliding the kernel along the time axis with a larger stride. This is another novel aspect of the current scheme.
e. Fully Connected Layer
As the name suggests, this layer is functionally identical to the hidden layers of a CNN, with each neuron coupled to all activations in the layer below it. It should be emphasized that unlike the earlier layers, which focus on feature learning, this one carries out the classification task itself. The network is updated by a gradient-based backpropagation technique; however, the update
algorithm selected should be tailored to the problem at hand. The necessary level of accuracy can be achieved by repeating these layers n times. As can be seen in Fig. 2, the layers are repeated twice in the current work to attain the required precision.
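The layer stack just described can be sketched in PyTorch. This is an illustrative sketch, not the authors' exact network: only the 1-min/6000-sample input and the 20 first-layer kernels come from the text; the remaining kernel sizes, strides, and hidden widths are assumptions.

```python
import torch
import torch.nn as nn

class ApneaCNN(nn.Module):
    """Sketch: conv -> ReLU -> convolutional pooling (strided 1-D conv),
    repeated twice, then fully connected layers over {apnea, non-apnea}."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 20, kernel_size=60),            # 20 kernels of length 60
            nn.ReLU(),
            nn.Conv1d(20, 20, kernel_size=4, stride=4),  # "convolutional pooling"
            nn.Conv1d(20, 40, kernel_size=30),           # second repeat (sizes assumed)
            nn.ReLU(),
            nn.Conv1d(40, 40, kernel_size=4, stride=4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(64),                  # fully connected stage
            nn.ReLU(),
            nn.Linear(64, n_classes),           # softmax applied in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ApneaCNN()
segment = torch.randn(8, 1, 6000)   # batch of eight 1-min ECG segments
logits = model(segment)
print(logits.shape)                 # torch.Size([8, 2])
```

During training, `nn.CrossEntropyLoss` would supply both the softmax and the weighted log-loss over the two classes.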
4 Results and Discussion

The apnea-ECG dataset from PhysioNet has been used to evaluate the effectiveness of the present approach. In this study, 35 patients with OSA were recorded using a single-lead electrocardiogram. The physiological characteristics of the study population are listed below:

• Length in minutes
• AHI
• Sex
• Apnea
• Non-apnea
Each recording lasts between seven and ten hours. As stated previously, every minute of data in the dataset is meticulously labeled and rated by specialists; these annotations indicate the presence or absence of sleep apnea during that period, so each 1-min recording corresponds to a single 'episode' or 'event'. Additional information about the database can be found in the literature. Accuracy, sensitivity, and specificity are used as the performance metrics for the suggested system; since these are standard quantities, only their defining formulas are given here. In this section, [24–26] are compared with the proposed methodology to assess its efficiency.

(i) Accuracy
The accuracy of a test measures how many of the findings are actually correct. It is determined as:

Accuracy = (TP + TN)/(TP + TN + FN + FP)

(ii) Sensitivity
The sensitivity of a test measures how many of the actually positive samples are identified as positive. It is determined as:

Sensitivity = TP/(TP + FN)
Fig. 3 Accuracy analysis
(iii) Specificity
Specificity measures how many of the actually negative samples are identified as negative. It is determined as:

Specificity = TN/(TN + FP)

Figure 3 shows that the proposed method has obtained an accuracy of 97.25%, a better level of accuracy than that achieved by the other existing methods. As seen from Fig. 4, the suggested approach achieves a sensitivity of 96.58%, higher than that of other currently used methods; the sensitivity it produces is superior to that of other methods in their respective categories. As can be seen in Fig. 5, the recommended method attains a specificity of 95.25%, again a better rate than that of the other approaches already in use.
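The three metrics follow directly from the confusion-matrix counts; a minimal sketch (function name and the example counts are illustrative, not taken from the paper's experiments):

```python
def metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts: 90 detected apnea minutes, 85 detected normal minutes,
# 5 false alarms, 4 missed events
acc, sens, spec = metrics(tp=90, tn=85, fp=5, fn=4)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # 0.951 0.957 0.944
```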
Fig. 4 Sensitivity analysis
Table 1 displays the outcomes and contrasts the efficiency of the suggested scheme with that of the already-used approaches. When compared with other approaches using the same dataset, the proposed scheme’s superior performance is justified. By reducing the training data and increasing the testing data, the deep network’s feature learning capacity may be evaluated. Showing that the deep learning framework can adapt to learn the signal’s intrinsic properties, the method achieves satisfactory performance even with a smaller amount of training data. The network’s first few layers use a convolution operation, which can be thought of as a filter. Training involves updating the convolution kernels. One or more kernels in the process take on filtering properties that eliminate the stray noise. As a result, the suggested technique can also function with some degree of immunity to background noise. Therefore, the devised system can deal with noisy ECG data without severely degrading performance. This is an added merit that comes with the offered solution.
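The claim that convolution kernels can act as noise-suppressing filters can be illustrated with a plain averaging kernel on a synthetic signal. This is a toy demonstration of the filtering principle, not the network's learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(600) / 100                      # 6 s of a 100-Hz toy signal
clean = np.sin(2 * np.pi * 1.2 * t)          # slow periodic component
noisy = clean + rng.normal(0, 0.3, clean.size)

kernel = np.ones(9) / 9                       # moving-average (low-pass) kernel
smoothed = np.convolve(noisy, kernel, mode="same")

# The filtered signal is closer to the clean one than the raw noisy input
err_noisy = np.mean((noisy - clean) ** 2)
err_smoothed = np.mean((smoothed - clean) ** 2)
print(err_smoothed < err_noisy)
```

A learned kernel plays the same role, except that backpropagation, rather than a hand-picked average, decides which frequency content to keep.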
Fig. 5 Specificity analysis
Table 1 Evaluation of proposed and existing methodologies

References              Accuracy   Specificity   Sensitivity
Hassan and Haque [24]   86.32      85.65         87.48
Hassan [25]             88.35      91.87         83.25
Varon [26]              85.96      85.25         87.36
Proposed method         97.25      95.68         96.58
5 Conclusion

In this work, a convolutional neural network (CNN)-based deep learning system is presented for automatic OSA identification from a single electrocardiogram (ECG) lead. The primary result of this study is a greater degree of accuracy in OSA identification than was previously possible with existing methods. The current method removes the need for separate feature extraction and classification software: one part of the network handles feature learning while another handles supervised feature classification. Comparing the results of the current scheme with those of other methods
reveals that it increases classification accuracy by more than 9 percentage points on average. Using the proposed technique, inherent features can be learned with less training data, and random noise in the data is suppressed to some extent.
33 A Deep Learning Framework for Sleep Apnea Detection
Author Index
A
Aditya Warke, 449
Akhmedova, Nozima, 341
Akshay Gajanan Bhosale, 329
Akshay, R., 407
Almezhghwi, Khaled, 275

B
Bacanin, Nebojsa, 51
Belhaj, Fairouz, 275
Belousova, Margarita, 465
Bhagya, R., 169
Bharthi, R., 169
Booba, B., 259
Buchatska, Iryna, 139

C
Chandrika Morthala, 1
Chetan J. Shelke, 357

D
Debasis Gountia, 87
Deepa, S., 259
Dhakshith, N. K., 199
Dharani, K. G., 477
Dinesh Kumar Anguraj, 13
Dobrojevic, Milos, 51
Dubovyk, Tatiana, 139

E
Eg Su, Goh, 243

F
Fadzli, Fazliaty Edora, 243
Filimonova, Tetiana, 139

G
Ghadedo, Adel, 275
Gordienko, Yuri, 391

H
Hamid, Hrimech, 67
Hassan, Morad Ali, 275
Huertas-García, Álvaro, 373

I
Ismail, Ajune Wanis, 243

J
Jason D’souza, 299
Jovanovic, Luka, 51

K
Kambhampati Kodanda Sai Harshitha, 155
Kapil Wavale, 449
Karnam Akhil, 155, 231
Ketan Yalmate, 423
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 S. Lanka et al. (eds.), Trends in Sustainable Computing and Machine Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-9436-6
Kiran Ingale, 423, 449
Kudryashov, Ilya, 465
L
Lakshmi Priya Swaminatha Rao, 123
Lemak, Stepan, 465

M
Maezo, Rubén García, 373
Maganti Hemanth Baji, 1
Mahek Karia, 299
Manan Mehta, 299
Manas Kumar Nanda, 87
Mano Paul, P., 357
Manukumaar, O. G., 437
Martí-González, Carlos, 373
Medipally Abinay, 155
Melnyk, O., 107
Mihir Lad, 299
Mohamed, Hanine, 67
Mohamed, Lachgar, 67
Muralidhar Billa, 357

N
Nayana Shetty, 13

O
Omkar Rane, 215
Onishchnko, O., 107
Oumaima, Derouech, 67

P
Parameswari, R., 313
Param Kothari, 215
Patha Srija, 155
Petrovic, Aleksandar, 51
Pidhorna, Tetiana, 139
Prabhuraj Metipatil, 437
Praharsha Sirsi, 199
Pramod Kumar, P., 407
Prasanta Kumar Sahoo, 87
Pursky, Oleg, 139

R
Radha, K., 313
Raghavendra Reddy, 437
Rahul Wagh, 423
Ranjan Kumar Dash, 87
Rathnakar Achary, 357
Rey, Alejandro Echeverría, 373
Riya Tambe, 449
Rohan Puchakayala, 1
S
Sadimon, Suriati, 243
Sagar, K., 407
Sai Akhil Kakumanu, 155
Sai Dhanush, K., 1
Sankaran Vaibhav, 123
Sanket Thite, 449
Santhana Lakshmi, V., 185
Sapiha, V., 107
Sarasa-Cabezuelo, Antonio, 287
Saraswathi, K., 33, 199
Sathiya, A., 477
Selivanova, Anna, 139
Shaik Afreen, 231
Shanthi Therese, S., 299
Shanti Konda, 357
Sharan Giri, 123
Shevchenko, V., 107
Shubham Thakur, 449
Shulha, Maksym, 391
Shwehdi, Rabei, 275
Siddhartha Behera, 87
Snehapriya Murugan, 123
Sreedhar Namratha, 169
Sridevi, A., 477
Stetsenko, M., 107
Stirenko, Sergii, 391
Sudha, S. V., 1
Sudhir Dhage, 215
Supriya Sameer Nalawade, 329
Suresh Jaganathan, 123
Swati Sonone, 423

T
Tamal Kumar Kundu, 13
Tanmay Yadav, 423
Tashev, Komil, 341
Toskovic, Ana, 51
Trisha Shishodiya, 215
Tugui, Alexandru, 27

U
Uppala Reshmitha, 231
V
Vachana, C., 33
Vedant Kadam, 299
Vijaya, M. S., 185
Virendra Kumar Shrivastava, 357
Vishnevska, O., 107
Vishnevskyi, D., 107

Y
Yash Lahoti, 423

Z
Zivkovic, Miodrag, 51