Lecture Notes in Networks and Systems 670
Luigi Troiano · Alfredo Vaccaro · Nishtha Kesswani · Irene Díaz Rodriguez · Imene Brigui · David Pastor-Escuredo Editors
Key Digital Trends in Artificial Intelligence and Robotics Proceedings of 4th International Conference on Deep Learning, Artificial Intelligence and Robotics, (ICDLAIR) 2022 - Progress in Algorithms and Applications of Deep Learning
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors
Luigi Troiano, Department of Management and Innovation Systems, University of Salerno, Salerno, Italy
Alfredo Vaccaro, Department of Engineering, University of Sannio, Benevento, Italy
Nishtha Kesswani, Department of Computer Science, Central University of Rajasthan, Ajmer, Rajasthan, India
Irene Díaz Rodriguez, Department of Computer Science, University of Oviedo, Gijón, Spain
Imene Brigui, EMLYON Business School, Écully, France
David Pastor-Escuredo, University College London, London, UK
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-30395-1 ISBN 978-3-031-30396-8 (eBook) https://doi.org/10.1007/978-3-031-30396-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Although artificial intelligence (AI) has generated a great deal of hype over the past ten years, its consequences for how we live, work, and play are still in their infancy and will likely have a significant impact in the future. The strength of AI in areas such as speech and image recognition, navigation apps, smartphone personal assistants, ride-sharing apps, and many others is already well established. The book “Key Digital Trends in Artificial Intelligence and Robotics” (proceedings of the 4th International Conference on Deep Learning, Artificial Intelligence and Robotics (ICDLAIR) 2022) introduces key topics in artificial intelligence algorithms and systems and explains how they contribute to health care, manufacturing, law, finance, retail, real estate, accountancy, digital marketing, and various other fields. The book is primarily meant for academics, researchers, and engineers who want to employ AI applications to address real-world problems. We hope that businesses and technology creators will also find it useful in industry.
Luigi Troiano
Alfredo Vaccaro
Nishtha Kesswani
Irene Díaz Rodriguez
Imene Brigui
David Pastor-Escuredo
Organization
PC Chairs
Troiano, Luigi (University of Salerno, Italy)
Vaccaro, Alfredo (University of Sannio, Italy)
Kesswani, Nishtha (Central University of Rajasthan, India)
Díaz Rodriguez, Irene (University of Oviedo, Spain)
Brigui, Imène (Emlyon Business School, Ecully, France)
Pastor-Escuredo, David (UNHCR, Geneva)
Program Committee Members
Abdul Kareem, Shaymaa Amer (Al-Mustansiriya University, Baghdad)
Bencheriet, Chemesse Ennehar (University of Guelma, Algeria)
Brigui, Imène (Emlyon Business School, Ecully, France)
Das, Piyali (North Eastern Regional Institute of Science and Technology, Electrical Engineering, Nirjuli, India)
Ghosh, Rajib (NIT Patna)
Lo, Man Fung (The University of Hong Kong, Hong Kong)
Misra, Rajiv (IIT Patna)
Yesmin, Farzana (Daffodil International University, Bangladesh)
Reviewers
Abdul Kareem, Shaymaa Amer (Al-Mustansiriya University, Baghdad)
Brigui, Imène (Emlyon Business School, Ecully, France)
Chapi, Sharanappa (B.M.S. College of Engineering, Engineering Physics, Bengaluru, India)
Das, Piyali (North Eastern Regional Institute of Science and Technology, Electrical Engineering, Nirjuli, India)
Islam, Mirajul (Daffodil International University, Bangladesh)
Iyer, Sailesh (Rai University, Ahmedabad)
Jyotiyana, Monika (CURAJ, Kishangarh (Ajmer))
Kumar, Krishan (NIT Kurukshetra)
Lonkar, Bhupesh (Datta Meghe Institute of Engineering & Tech., Wardha)
Mondal, Ashim (Aliah University, Electrical Engineering, Kolkata, India)
Majumder, Swanirbhar (Tripura University, Information Technology, Agartala, India)
Masum, Abu Kaisar Mohammad (Daffodil International University, Bangladesh)
Moon, Nazmun Nessa (Daffodil International University, Bangladesh)
Nahian, Jabir Al (Daffodil International University, Bangladesh)
Ramesh, Krithiga (Microsoft Research India)
Ria, Nushrat (Daffodil International University, Bangladesh)
Salehin, Imrus (Dongseo University, Busan, South Korea)
Shetu, Syeda Farjana (BJIT Limited, SQA Department, Dhaka)
Contents
A Study Review of Neural Audio Speech Transposition over Language Processing . . . 1
Sharun Akter Khushbu, Moshfiqur Rahman Ajmain, Mahafozur Rahman, and Sheak Rashed Haider Noori

DCNN Based Disease Prediction of Lychee Tree . . . 13
Saiful Islam, Shornaly Akter, Mirajul Islam, and Md. Arifur Rahman

MEVSS: Modulo Encryption Based Visual Secret Sharing Scheme for Securing Visual Content . . . 24
Parul Saini, Krishan Kumar, Shamal Kashid, and Alok Negi

Analysis of Bangla Transformation of Sentences Using Machine Learning . . . 36
Rajesh Kumar Das, Samrina Sarkar Sammi, Khadijatul Kobra, Moshfiqur Rahman Ajmain, Sharun Akter Khushbu, and Sheak Rashed Haider Noori

Towards a Novel Machine Learning and Hybrid Questionnaire Based Approach for Early Autism Detection . . . 53
Sid Ahmed Hadri and Abdelkrim Bouramoul

Bi-RNN and Bi-LSTM Based Text Classification for Amazon Reviews . . . 62
Shamal Kashid, Krishan Kumar, Parul Saini, Abhishek Dhiman, and Alok Negi

Resource Utilization Tracking for Fine-Tuning Based Event Detection and Summarization Over Cloud . . . 73
Alok Negi, Krishan Kumar, Prachi Chauhan, Parul Saini, and Shamal Kashid

Automatic Fake News Detection: A Review Article on State of the Art . . . 84
Karim Hemina, Fatima Boumahdi, and Amina Madani

Cascaded 3D V-Net for Fully Automatic Segmentation and Classification of Brain Tumor Using Multi-channel MRI Brain Images . . . 94
Maahi Khemchandani, Shivajirao Jadhav, and Vinod Kadam
Computing Physical Stress During Working Shift with Deep Neural Networks . . . 111
Vincenzo Benedetto, Francesco Gissi, and Marialuisa Menanno
Trend Prediction in Finance Based on Deep Learning Feature Reduction . . . 120
Vincenzo Benedetto, Francesco Gissi, Elena Mejuto Villa, and Luigi Troiano

A Preliminary Study on AI for Telemetry Data Compression . . . 134
Gioele Ciaparrone, Vincenzo Benedetto, and Francesco Gissi

On the Use of Multivariate Medians for Nearest Neighbour Imputation . . . 144
Francesco Gissi, Vincenzo Benedetto, Parth Bhandari, and Raúl Pérez-Fernández

Decision Making by Applying Machine Learning Techniques to Mitigate Spam SMS Attacks . . . 154
Hisham AbouGrad, Salem Chakhar, and Ahmed Abubahia

Voronoi Diagram-Based Approach to Identify Maritime Corridors . . . 167
Mariem Masmoudi, Salem Chakhar, and Habib Chabchoub

Author Index . . . 179
A Study Review of Neural Audio Speech Transposition over Language Processing Sharun Akter Khushbu, Moshfiqur Rahman Ajmain(B) , Mahafozur Rahman, and Sheak Rashed Haider Noori Daffodil International University, 1341 Dhaka, Bangladesh {sharun.cse,moshfiqur15-14090,mahafozur15-2955}@diu.edu.bd, [email protected]
Abstract. Natural language processing (NLP) is an advancement of artificial intelligence in the modern technological era, and machine translation covers the vast majority of language transformation between human languages and computer interaction. The NLP domain builds a sequential analysis path in which the neural network foundations are mathematically and theoretically strong. For unwritten languages, the aim is multimodal language transformation. Spoken language offers several avenues of progress, which motivates the development of linguistic CNN models for everyone who speaks in their mother tongue. Advances in English speech processing have therefore had a great impact on language transformation, and in a sense other languages can follow the same speech-to-language computational path given suitably constructed corpora; the models implemented are probabilistic. Consequently, this is a benchmark study of recent work on modeling unwritten languages, in which summarization or transformation becomes faster. It describes how current research performs with CNN and attention models and reports their statistical deployment. Finally, the paper outlines solutions and discusses the impact of results, observations, challenges, and limitations of those solutions; in particular, voice identification without noise remains challenging.
Keywords: NLP
· CNN · AI · Attention Mechanism

1 Introduction
Machine translation opened neural network possibilities by creating a wide research field within natural language processing. NLP further explores the power of visualization across large amounts of data and is capable of processing various types of problems. NLP was introduced [1] long ago, but recently many repositories have been created through the invention of knowledge. Moreover, researchers at present relate their own languages to this work as instinctive, modernized language support.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 1–12, 2023. https://doi.org/10.1007/978-3-031-30396-8_1

Depending on the field of specification, many sectors have emerged within NLP, such as automatic speech recognition (ASR) [2–4], speech emotion recognition [10,15], speech coding [5], NER [6,7] for text annotation, and assistive technology for continuous sequences [8,9]. The study of natural language processing helps sustain productive research output. This study is a review of unwritten languages, rendered along the current NLP path driven by the convolutional neural network (CNN). The review is a comparative analysis that specifically examines how reliable machine translation is and how it coexists with audio speech synthesis; it is a sequential model analysis of deep learning [10–15]. The CNN layer is expert at noise reduction on any speech. Large integrated data in heterogeneous modules informs both the solutions and the limitations of speech recognition. Natural language processing attentively allocates the computational resources with which speech is detected from its utterances. Moreover, input data is corrected by hand before the word phrases are preprocessed. Language processing determines the accurate speech output, fully dividing the words and converting them into text; it gives the machine the power to replace keyboard input with voice recognition. There are certainly limitations to gathering large volumes of data, but there are more advantages: synthesizing voice into text can remove large time costs. State-of-the-art work on different corpora builds language models that can suggest words of the same category. On the contrary, there are also difficulties in segmenting different corpora, due to language utterances, word selection, sentence length, word relativity, speaker fluency, and noisy data. Furthermore, preprocessing is also challenging across countries with different languages.
From this analysis, the outcomes mostly concern several explorations such as emotion, hate speech, sarcasm, ASR, and crime detection. The initial input in this research is largely sound activity, recorded with microphones, smartphone recorders, and similar instruments. Language processing has provided a tremendous benefit for deaf people who cannot hear by ear. This discussion reviews research that relates to unwritten languages and makes comprehensive use of sequential CNN learning in neural architectures; machine translation models with attention mechanisms are developed on top of RNN models.
2 Brief Background
Speech transposition is a recent trend in various subcontinental and other countries, and many researchers have worked on it. A brief background alone can analyse and give feedback on dataset availability, model efficiency, and the comparative study delivered in the next advancement. Odette Scharenborg et al. [16] explored deep learning for sequence-to-sequence language processing. The accepted audio files, addressed as unwritten language, admitted not one but three representations, defeating the otherwise necessary difficulties of language formulation. As mentioned, speech translation [17,18] required LSTM, while speech-to-image and the reverse image-to-speech retrieval
required PyTorch. An unwritten language [19] needs speech-to-meaning mappings, and vice versa, for its utterances, denoted by images, translation, documentation, and automatic text generation. Deep neural networks [17,20,21] convert the signal [22] of any natural language into text form [23,24]. The speech translation work reports BLEU scores for English-English, English-Japanese, Mboshi-Mboshi, and Mboshi-French, while the speech-image work reports BLEU scores and PER (phone error rate). In [25], the authors proposed speech emotion recognition on top of the ASR technique, in which the audio speech produces a signal to which an attention mechanism is then applied, enhancing speech emotion recognition. This study developed a transfer learning scheme that aligns speech frames with speech text. The researchers built an attention mechanism model on RNN and LSTM models and showed comparative performance for LSTM+Attention, CNN+LSTM, and TDNN+LSTM. The speech dataset was collected from the IEMOCAP source and trained with a bidirectional LSTM model. With ideal parameter settings, the machine learned multimodal features through a sequential model with fully connected hidden layers, a learning rate of 0.001, and 16 kHz utterances of up to 20 s duration. In the reported comparison, speech emotion recognition with Oracle text accuracy adds value over the other compared approaches. In [26], the authors addressed distorted speech signals using a Transformer MT system and LSTM. Noisy backgrounds were removed to obtain clean speech, using broad phonetic classes (BPC) and broad phonetic posteriorgrams (BPPG) to develop an SNR system. Trained on the TIMIT dataset, it was evaluated on BPSE rate, ground-truth rate, noise ratio, and Transformer/LSTM performance. The SNR system maps acoustic signals into symbolic sequences.
The study of incorporating broad phonetic information for speech enhancement outperformed baselines under different SNR criteria. In [27], the authors surveyed assistive technology covering many aspects of COVID-19 through audio speech analysis: classifying cough sound samples as COVID or non-COVID, detecting face mask wearing from voice, analysing breathing speed-ups and slow-downs, COVID speech-to-text analysis, and mental health sensitivity analysis from Twitter and Instagram sound clips. All are unwritten-language signals for detecting the COVID-19 situation, customized through attention mechanisms and natural language transformation. In [28], the authors compared virtual speaker speech with real speaker speech and showed the possibility of noise reduction. In [29], the authors worked on voice recordings for word count estimation (WCE) using six corpora of several languages; French, Swedish, Canadian, and Spanish recordings are among the daylong recordings from children. The audio speech produced consistent model performance over all corpora. The collected speech served two tasks: speech activity detection, which detects words from their utterances and also removes noise, and syllabification of speech, which checks the phonetic models. The study also illustrated the limitations of working on speech
recognition synthesis. In [30], the authors demonstrate emotion recognition over natural language processing, another speech transposition procedure built on enhanced LSTM-based models; they also improve the preprocessing stage, where speech is processed word by word. Based on the coefficient rate, the authors show a visual representation of the audio speech, and the working sections present the functionality of the CNN and LSTM, which produce outperforming results. In [31], the authors show that an LSTM with 2D convolutional layers performs speech transformation analysis on speech signals, where the signals can drive brain signals; graphical representations reproduce clean signals from the originals, and after separating the clean signals, noisy speech is cleaned using the theoretical implementation. In [32], the authors applied a spectrogram convolutional neural network to a dataset of 2240 speech samples combining depressed and non-depressed data. For the speech signal processing they developed a convolutional model with a 256-unit hidden layer, a dense layer, max pooling, and softmax. The end-to-end convolutional neural network model reached 80% accuracy, with validity checked by F-score. The study also argued for speech-based depression detection over the alternative LSTM model, with all necessary parameters justified.
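Several of the reviewed systems ([30–32] in particular) operate on spectrogram representations of the waveform. As an illustrative sketch, not drawn from any reviewed paper, the following numpy snippet frames a signal with a Hann window and takes magnitude FFTs; the frame length and hop size are assumed values:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: Hann-windowed frames -> |rFFT|.
    frame_len and hop are illustrative choices."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # keep non-redundant half

sr = 16000                      # 16 kHz sampling, as in the utterances cited above
t = np.arange(sr) / sr          # one second of audio
spec = spectrogram(np.sin(2 * np.pi * 440 * t))   # a 440 Hz tone
print(spec.shape)               # (frames, frame_len // 2 + 1)
```

A CNN then treats the resulting frames-by-frequency-bins array much like an image.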
3 Methodology

3.1 Statistical Approach
The dataset is the greatest effort in end-to-end unwritten language transformation models, where neural networks are recast as sequence-to-sequence machine conversion. Building a larger dataset is quite tough for audio speech collection: because audio voice is collected from different inputs, there are many difficulties in the dataset, and collected unwritten-language phrase datasets include noisy input. SNR processing fixes these issues, reduces the noise ratio, and reforms the speech into the clean speech that is standard for speech processing in a model. The input signal is converted into a wav signal, and the wav signal is then mapped to counted word vectors. CNN multimodal language modeling is trained on large volumes of data, so the dataset statistics are very high volume. Sequence-to-sequence segmentation is the part of the dataset review where models generate effective output. The ASR technique with a CNN model has been tested on publicly known larger datasets, e.g. LRS2 [33,34], with around thousands of sentences from the news. Separately by utterances, another large dataset, VoxCeleb2 [35], was built from the spoken input of 6000 speakers. Typically, the dataset is fed into the model in two splits, a training set and a testing set. A speech review work of this kind is not otherwise available; the discourse review and the discussion of conflicting results distinguish this work from others.
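The SNR processing mentioned above boils down to a power ratio expressed in decibels. A minimal sketch on synthetic arrays (no dataset from the review is assumed):

```python
import numpy as np

def snr_db(clean, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

clean = np.ones(1000)        # unit-power stand-in for speech
noise = 0.1 * np.ones(1000)  # noise at one-tenth the amplitude
print(round(snr_db(clean, noise), 6))  # 20.0 dB
```

Noise reduction raises this ratio; a "clean" recording is simply one with a high SNR.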
3.2 Experimental Setup and Evaluation Protocol
CNN models transform into multimodal two-language output under the experimental setup. All types of input signal difficulties are measured in theoretical terms, in collaboration with data preprocessing, which must cover the challenge of removing spectral noise from the signal. In the preprocessing method, the raw dataset yields a significant word vector output by applying lemmatization, after which a parser parses the data with word-level annotation. Any research study has a justification protocol; here, machine translation in NLP by CNN represents an optimal, effective solution. Removing unavoidable noise from the input raises speech quality, where the signal-to-interference ratio detects low voice and unclear speech. Furthermore, other measures such as SAR, SDR, and STOI assess wav signal quality ratios and the intelligibility of the transformed sequential signal. Phase distortion is calculated from matrix parameters reformed by character or word counts [36]. A great number of research domains apply the ASR technique, and a few studies report enhanced WER.
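WER, the metric named above, is the word-level Levenshtein edit distance normalized by the reference length; a dependency-free sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(ref)

print(wer("the cat sat down", "the cat sat"))  # 0.25: one deleted word
```

Substitutions, insertions, and deletions each count as one error, which is why WER can exceed 100% on badly misrecognized speech.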
3.3 CNN Model
Given the analysed input framework, the CNN model is the right approach to centralize the actual accuracy path. The CNN model is fully connected with 5 layers, each layer centrally connected with 8 hidden layers [37,38]. Convolutional neural networks are developed on fast GPU modules with highly powerful NVIDIA graphics, and the CNN layers convert the signal processing output into 512-unit or 256-unit representations. Nevertheless, the CNN reaches its optimum output in each unit with the Adam optimizer, and the ReLU activation [39] drives the dense layers, each unit maintaining the hidden dense layers from the start to the fully connected stage. Mostly, large volumes of data give compatibly high accuracy with a CNN; popular datasets such as PASCAL VOC run very efficiently in a CNN. Speech data requires a CNN model in which a recurrent neural network is used, adjusted with an LSTM model (Fig. 1).
Fig. 1. ASR of language transformation acoustic model.
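The convolution-plus-ReLU step that such layers stack can be sketched in numpy; the input and kernel below are illustrative assumptions, not parameters of any reviewed model:

```python
import numpy as np

def conv1d_relu(x, w, b=0.0):
    """'Valid' 1-D convolution (cross-correlation, as in deep learning
    frameworks) followed by a ReLU activation."""
    n = len(x) - len(w) + 1
    out = np.array([np.dot(x[i:i + len(w)], w) + b for i in range(n)])
    return np.maximum(out, 0.0)   # ReLU zeroes negative activations

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([-1.0, 0.0, 1.0])    # an edge-detector-like kernel
print(conv1d_relu(x, w))          # [2. 2.]
```

A full CNN layer runs many such kernels in parallel and feeds their outputs to pooling and dense layers.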
Table 1. Summary of the reviewed papers.

Paper: Speech technology for unwritten languages
Authors: Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux
Method: Speech-to-text conversion; three speech translation approaches for speech-to-image and image-to-speech
Dataset: FlickR real dataset
Size: 6,000 images; 30,000 speech files
Accuracy: BLEU score 7.1%; PER score 14.7%
Year: 2020

Paper: Learning Alignment for Multimodal Emotion Recognition from Speech
Authors: Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li
Method: ASR-based acoustic model on speech-to-word; attention alignment with a bidirectional LSTM
Dataset: IEMOCAP dataset of speech files
Size: NA
Accuracy: WA 70.4; UA 69.5
Year: 2020

Paper: Incorporating Broad Phonetic Information for Speech Enhancement
Authors: Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao
Method: Broad-phonetic-class-based model (Transformer and LSTM); two-language modeling
Dataset: TIMIT dataset
Size: 3,696 utterances of speech
Accuracy: LSTM score 78%; BPG score 0.824
Year: 2020

Paper: An Overview on Audio, Signal, Speech, & Language Processing for COVID-19
Authors: Gauri Deshpande, Björn W. Schuller
Method: Speech-to-text and emotion detection; CNN-based method
Dataset: Self-collected features of speech signal
Size: NA
Accuracy: 0.69
Year: 2020

Paper: The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
Authors: Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal
Method: Baseline ASR systems on network models
Dataset: Self-collected features of speech signal
Size: 200k utterances
Accuracy: (LF-MMI TDNN) 81.3; end-to-end 94.7
Year: 2020

Paper: The Conversation: Deep Audio-Visual Speech Enhancement
Authors: Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
Method: CNN encoder-decoder based network; loss function in SNR ratio, measured with PESQ
Dataset: LRS2 dataset of noisy audio speech
Size: 1,000 sentences; 6,000 different speakers
Accuracy: WER score 98.9%; ground-truth signal 8.8%
Year: 2020

Paper: Phoneme-Specific Speech Separation
Authors: Zhong-Qiu Wang, Yan Zhao, DeLiang Wang
Method: NMF estimation in DNN; CNN-based networks
Dataset: Self-collected dataset on ASR
Size: 4,026 speech files
Accuracy: ASR performance, WER 13.46%
Year: 2020
3.4 LSTM
Language modeling difficulties arise from a language's phonetics, morphology, and anaphora. For audio speech analysis, an LSTM [40] transfers a voice signal into sequential processing, where the model can convert it to text or words, or break it into sentences by labeling. LSTM provides regularization, word embeddings [41], and optimization. Recurrent neural networks encode the input source and decode the reformed text, then produce output with a BLEU score. The encoded output forms new sequential text as matrices, measured with categorical cross-entropy and the Adam optimizer. Sequential processing moderates the model's parameter settings by counting matrix-weighted positions. Typically, LSTM two-language modeling is implemented with highly modulated 3D-shaped matrices [42–44].
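A single LSTM time step, the building block of the sequence models discussed here, can be sketched in numpy; the gate ordering and the random weights below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. Gates are stacked [input, forget, cell, output]
    in W (4h x d), U (4h x h) and b (4h,), an assumed but common layout."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)                 # candidate cell content
    c_new = f * c + i * g          # forget some state, write the candidate
    h_new = o * np.tanh(c_new)     # exposed hidden state
    return h_new, c_new

d, hid = 3, 2
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4 * hid, d))
U = 0.1 * rng.normal(size=(4 * hid, hid))
b = np.zeros(4 * hid)
h = c = np.zeros(hid)
for x in rng.normal(size=(5, d)):  # run a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (2,)
```

A bidirectional LSTM, as used on IEMOCAP above, simply runs this recurrence once forward and once backward over the sequence and concatenates the two hidden states.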
3.5 Attention Mechanism
Many unwritten-language translations involve another model under analysis: attention. This structured model, enriched by researchers, follows a multi-head mechanism in which context is doubly linked through self-attention-based multi-head transfer. On top of it, the transformer model develops feed-forward networks with encoded input signals and decoded output signals as word or character sequences or summary representations. Furthermore, attention mechanisms take on positional encoding, where the structured sequential sentence being encoded is converted into words. After that, positional encoding in
terms of the transferred word-count signal acts as a head; these word heads are concatenated with each other and decoded into a sequential output. NMT requires output estimation via BLEU scores computed by plotting n-gram keys; the n-gram keys compute the final output sequence with the BLEU score [45–52].
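The scaled dot-product attention at the core of such multi-head mechanisms can be sketched in numpy (all shapes below are illustrative):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))   # one value vector per key
out, w = attention(Q, K, V)
print(out.shape, w.shape)     # (4, 8) (4, 6)
```

In multi-head attention this computation is repeated with separate learned projections of Q, K, and V, and the head outputs are concatenated, matching the head-concatenation step described above.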
3.6 Findings and Limitation
The review of neural network approaches to unwritten languages yields many findings concerning phonetics, morphology, and the high similarity of input utterances with output. CNN, RNN, LSTM, and attention mechanisms are driving rapid advancement. The findings measure those models' efficiency and inform upcoming related research, in which these models can contribute a large part to improving intelligibility and quality. These models handle noisy conditions and give noiseless output signals. Another study explained that LSTM networks are good as a phonetic module for speech capturing. SNR processing takes audio data as input, after which the fresh data is ready for parsing and lemmatization. An acoustic model has some validity in addressing the development of corpora, and NLP processes different languages toward multimodal transformation. The review in Table 1 includes some specifics about the higher accuracy of CNN models; all popular datasets, and a few self-collected ones, responded with very high accuracy. According to this review, the derived dataset volume and a healthy dataset will make an impact on the model, with the dataset also depending on its own waveform's high-to-low frequency calculation. This paper's technical contribution is identifying the different types of statistical analysis, alongside an exploratory analysis of recent trends in speech work.
4 Impact of Result
To the best of our knowledge, the dataset plays a strong role when the input source is adequate in volume and free of noise. A wav signal can easily be misinterpreted, and everything downstream depends on it: with wrong input, preprocessing results degrade, the model cannot be used for transfer learning, and further problems arise. All of this misguides the multimodal language transformation process for sequential learning. To address these limitations, the study tabulates the approaches with good input compatibility: WER analysis, the SNR technique for producing a noise-free waveform, and the BPC technique for articulation by place and manner. Furthermore, a larger dataset also requires effective accuracy and loss-rate calculation that can detect and address computational error (Fig. 2).
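The WER analysis cited above is the word-level Levenshtein edit distance (substitutions, insertions, deletions) normalized by the reference length; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: minimum edit distance between the word sequences,
    divided by the number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions only
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / max(len(r), 1)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution plus one deletion over four reference words.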
A Study Review of Neural Audio Speech Transposition
Fig. 2. ASR of language transformation acoustic model.
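The SNR measure used above to judge noise removal compares signal power to noise power on a decibel scale. A minimal sketch, where the sine-plus-noise example is illustrative and not taken from any reviewed study:

```python
import numpy as np

def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB between a clean reference signal and a
    noisy copy of it; the noise is taken as the residual noisy - clean."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

A lightly corrupted waveform yields a high SNR (tens of dB), while heavy noise pushes the value toward or below 0 dB.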
5 Conclusion
This paper has reviewed model flexibility over voice signals against automatic speech recognition benchmarks. The dataset studies suggest that unwritten languages constitute a huge domain, depending on the available language corpora and the volume of data. Summarizing the review, two models stand out: CNNs and the attention mechanism. These two models demonstrate the possibilities of ASR techniques with very low loss rates, and their accuracy across all datasets is very effective. Audio speech collected by microphone and recorder, after noise removal, feeds machine translation that estimates a sequential output; this translation develops into a multimodal language transformation. Many language domains related to unwritten languages are still not involved in computational multimodal transformation. Various subcontinental and national languages have created this recent research domain, and a small number of studies have begun sequential analysis of unwritten languages. To advance, it is urgently necessary to increase dataset volume, identify and fix the gaps in the corpora, and refill them, which can lead to a truly comprehensive module. Acknowledgment. The DIU NLP & Machine Learning Research Lab supported this research. We are thankful for its facilities and guidance.
S. A. Khushbu et al.
References
1. Jones, K.S.: Natural Language Processing: A Historical Review, vol. 9-10 (1994)
2. Li, J., Deng, L., Gong, Y., Haeb-Umbach, R.: An overview of noise-robust automatic speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 745–777 (2014)
3. Erdogan, H., Hershey, J.R., Watanabe, S., Le Roux, J.: Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks. In: Proceedings of ICASSP 2015 (2015)
4. Weninger, F., et al.: Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. In: Proceedings of LVA/ICA (2015)
5. Accardi, A.J., Cox, R.V.: A modular approach to speech enhancement with an application to speech coding. In: Proceedings of ICASSP (1999)
6. Jin, Y., Li, F., Yu, H.: BENTO: a visual platform for building clinical NLP pipelines based on CodaLab (2020)
7. Dai, X., Karimi, S., Hachey, B., Paris, C.: An effective transition-based model for discontinuous NER. arXiv:2004.13454v1 [cs.CL] (2020)
8. Wang, D.: Deep learning reinvents the hearing aid. IEEE Spectrum 54(3), 32–37 (2017)
9. Lai, Y.-H., Chen, F., Wang, S.-S., Lu, X., Tsao, Y., Lee, C.-H.: A deep denoising autoencoder approach to improving the intelligibility of vocoded speech in cochlear implant simulation. IEEE Trans. Biomed. Eng. 64(7), 1568–1578 (2016)
10. Wang, Y., Narayanan, A., Wang, D.: On training targets for supervised speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 22(12), 1849–1858 (2014)
11. Lu, X., Tsao, Y., Matsuda, S., Hori, C.: Speech enhancement based on deep denoising autoencoder. In: Proceedings of Interspeech (2013)
12. Xu, Y., Du, J., Dai, L.-R., Lee, C.-H.: A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 7–19 (2014)
13. Kolbæk, M., Tan, Z.-H., Jensen, J.: Speech intelligibility potential of general and specialized deep neural network based speech enhancement systems. IEEE/ACM Trans. Audio Speech Lang. Process. 25(1), 153–167 (2016)
14. Tan, K., Zhang, X., Wang, D.: Real-time speech enhancement using an efficient convolutional recurrent network for dual-microphone mobile phones in close-talk scenarios. In: Proceedings of ICASSP (2019)
15. Qi, J., Du, J., Siniscalchi, S.M., Lee, C.: A theory on deep neural network based vector-to-vector regression with an illustration of its expressive power in speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 27 (2019)
16. Scharenborg, O., et al.: Speech technology for unwritten languages (2020)
17. Berard, A., Pietquin, O., Servan, C., Besacier, L.: Listen and translate: a proof of concept for end-to-end speech-to-text translation. In: NIPS Workshop on End-to-End Learning for Speech and Audio Processing (2016)
18. Weiss, R.J., Chorowski, J., Jaitly, N., Wu, Y., Chen, Z.: Sequence-to-sequence models can directly transcribe foreign speech. arXiv preprint arXiv:1703.08581 (2017)
19. Besacier, L., Zhou, B., Gao, Y.: Towards speech translation of non written languages. In: Spoken Language Technology Workshop, pp. 222–225. IEEE (2006)
20. Duong, L., Anastasopoulos, A., Chiang, D., Bird, S., Cohn, T.: An attentional model for speech translation without transcription. In: Proceedings of NAACL-HLT, pp. 949–959 (2016)
21. Fer, R., Matejka, P., Grezl, F., Plchot, O., Vesely, K., Cernocky, J.H.: Multilingually trained bottleneck features in spoken language recognition. Comput. Speech Lang. 46(Supplement C), 252–267 (2017)
22. Malfrere, F., Dutoit, T.: High-quality speech synthesis for phonetic speech segmentation. In: Proceedings of Eurospeech, pp. 2631–2634 (1997)
23. Shen, J., et al.: Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In: Proceedings of ICASSP (2018)
24. Wu.
25. Chan, W., Jaitly, N., Le, Q., Vinyals, O.: Listen, attend and spell. CoRR abs/1508.01211 (2015)
26. Xu, H., Zhang, H., Han, K., Wang, Y., Peng, Y., Li, X.: Learning alignment for multimodal emotion recognition from speech. arXiv:1909.05645v2 [cs.CL] (2020)
27. Lu, Y.-J., Liao, C.-F., Lu, X., Hung, J., Tsao, Y.: Incorporating broad phonetic information for speech enhancement. arXiv:2008.07618v1 [eess.AS] (2020)
28. Deshpande, G., Schuller, B.W.: An overview on audio, signal, speech, & language processing for COVID-19. arXiv:2005.08579v1 [cs.CY] (2020)
29. Nirme, J., Sahlen, B., Ahlander, V.L., Brannstrom, J., Haake, M.: Audio-visual speech comprehension in noise with real and virtual speakers. Speech Commun. 116, 40–55 (2020)
30. Rasanen, O., et al.: Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech. Speech Commun. 113, 63–80 (2019)
31. Uddin, M.Z., Nilsson, E.G.: Emotion recognition using speech and neural structured learning to facilitate edge intelligence. 94, 103775 (2020)
32. Ceolini, E., et al.: Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception. 223, 117282 (2020)
33. Srimadhur, N.S., Lalitha, S.: An end-to-end model for detection and assessment of depression levels using speech. 171, 12–21 (2020)
34. Chung, J.S., Senior, A., Vinyals, O., Zisserman, A.: Lip reading sentences in the wild. In: Proceedings of CVPR (2017)
35. Chung, J.S., Senior, A., Vinyals, O., Zisserman, A.: Lip reading sentences in the wild. In: Proceedings of BMVC (2017)
36. VoxCeleb2: deep speaker recognition. arXiv preprint arXiv:1001.2267 (2018)
37. Mowlaee, P.: On speech intelligibility estimation of phase-aware single-channel speech enhancement. In: ICASSP (2015)
38. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. arXiv:1405.3531v4 [cs.CV] (2014)
39. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)
40. Yang, L., Song, Q., Wang, Z., Jiang, M.: Parsing R-CNN for instance-level human analysis (2019)
41. Merity, S., Keskar, N.S., Socher, R.: An analysis of neural language modeling at multiple scales. arXiv:1803.08240v1 [cs.CL] (2018)
42. Inan, H., Socher, R.: Tying word vectors and word classifiers: a loss framework for language modeling. arXiv:1611.01462v3 [cs.LG] (2017)
43. Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling (2012)
44. Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models. arXiv:1708.02182v1 (2017)
45. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
46. Vaswani, A., et al.: Attention is all you need. In: NIPS, pp. 5998–6008 (2017)
47. Niehues, J., Cho, E.: Exploiting linguistic resources for neural machine translation using multi-task learning. In: Proceedings of the Second Conference on Machine Translation, pp. 80–89 (2017)
48. Raganato, A., Tiedemann, J.: An analysis of encoder representations in transformer-based machine translation (2018)
49. Caglayan, O., Barrault, L., Bougares, F.: Multimodal attention for neural machine translation. arXiv:1609.03976v1 [cs.CL] (2016)
50. Zhou, Q., Zhang, Z., Wu, H.: NLP at IEST 2018: BiLSTM-attention and LSTM-attention via soft voting in emotion classification. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 189–194 (2018)
51. Han, K.J., Huang, J., Tang, Y., He, X., Zhou, B.: Multi-stride self-attention for speech recognition (2019)
52. McCann, B., Bradbury, J., Xiong, C., Socher, R.: Learned in translation: contextualized word vectors. In: 31st Conference on Neural Information Processing Systems (NIPS) (2017)
DCNN Based Disease Prediction of Lychee Tree Saiful Islam, Shornaly Akter, Mirajul Islam(B) , and Md. Arifur Rahman Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh {shiful15-11700,shornaly15-11732,merajul15-9627, arifur.cse}@diu.edu.bd
Abstract. Tree disease classification is needed to identify affected leaves, since disease affects the economic value of trees and their products and diminishes their ecological standing. The lychee tree is affected by diseases such as Leaf Necrosis, Stem Canker, and Leaf Spots. Classifying lychee tree leaves is therefore essential to separate healthy leaves from affected ones, and economic growth would benefit considerably if this classification could be done adequately. In this paper, we perform lychee tree disease classification to make things easier for farmers, who often cannot correctly distinguish healthy and diseased leaves at an early stage. We created a new dataset for training the architectures, collecting about 1400 images covering three pre-harvest diseases: “Leaf Necrosis”, “Leaf Spots”, and “Stem Canker”. Of the 1400 images, 80% are used for training and 20% for testing; the dataset contains both fresh and affected leaves and stems. For lychee tree disease classification, we chose a pre-trained CNN and transfer-learning-based approach to classify the 2D images layer by layer. This method can efficiently classify images of diseased leaves and stems, identifying disease from the images and determining specific pre-harvest diseases. Keywords: Lychee tree disease · Leaves · Stems · MobileNet · NASNetMobile · InceptionV3 · VGG16 · CNN
1 Introduction

Fruits are essential for our bodies and provide many health benefits. The tropical fruits of Bangladesh are great sources of antioxidant vitamins and minerals. One tropical fruit found in Bangladesh is lychee, among the fruits in highest demand and with the best flavor in our nation. It is beneficial to health and offers nutritional value. Although somewhat pricey, it is one of the most sought-after and essential table fruits in Bangladesh, generally available in the market in May–June every year. Diseases of the lychee tree include Leaf Necrosis, Leaf Spots, Stem Canker, Anthracnose, Root Rot, and Red Algae, among others. These diseases are caused by pathogens such as bacteria, viruses, fungi, and parasites, as well as unfavorable environmental factors. Environmental stress is the direct or indirect root of the majority of plant issues. The type of disease is determined by the symptoms and the areas of the leaves that are affected. In the past, identifying plant diseases was typically done by specialist farmers inspecting plants on a constant basis. Unfortunately, this requires a great deal of effort and money when producing a significant number of plants on large farms. Therefore, finding an automated, precise, quick, and less expensive system for identifying plant diseases is crucial. The most widely used technologies for this are image processing and machine learning.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 13–23, 2023. https://doi.org/10.1007/978-3-031-30396-8_2

According to Bangladeshi tradition, these months are called “Madhu Mash”, and 46% of fruits are harvested during this period. Despite the market's abundance of other fruit varieties, unpreserved lychee is in high demand due to its special mouthfeel, flavor, and amazing color. The availability of lychee is sadly limited to 60 days due to insufficient supply. Lychee yields average roughly 2.5 MT per acre, which is quite low compared with other nations. Lychee is mostly grown in courtyards (two to three plants) or small fruit gardens (15 to 20 plants) next to houses. It is grown all over the country, but the most important places are the districts of Dinajpur, Khulna, Jessore, Rajshahi, Kushtia, Sylhet, Rangpur, Dhaka and Chittagong. Bangladesh is an agricultural country and this sector plays a vital role in our GDP; more than 47% of people's livelihoods depend on agriculture. The fruit sector is the biggest and most diverse economic area of Bangladesh's agriculture. Beyond meeting domestic demand, litchi can also contribute to Bangladeshi remittances by being exported to several countries with the UN's Food and Agriculture Organization (FAO). Commercial production and trade of litchi are increasing day by day at a very high rate. A significant part of the GDP, around 13%, comes from agriculture. Failure to manage our agriculture sector properly causes unexpected losses in our economy. Therefore, we must properly guide our farmers and maintain a supportive agricultural system to ensure the long-term security of the food supply. If we can make this happen, our GDP will also grow and we can offer more employment to people in our country. We demonstrate a technique for automatically classifying lychee tree diseases using four pre-trained convolutional neural networks (CNNs). The structure of the paper is as follows: Sect. 2 reviews related work on disease classification methods. The method and materials used are illustrated in Sect. 3. The experimental analysis, including performance and results, is presented in Sect. 4. Section 5 concludes the article.
2 Literature Review

In this section we survey the related work. Research previously done on plant disease classification and prediction, and its methodology, is briefly presented here. Hossain et al. [1] used a framework built on two models: a proposed light model with six layers of CNN, and a fine-tuned pre-trained deep learning model, VGG-16 (Visual Geometry Group, 16 layers). They used two datasets: the first contains clear fruit photos, with an accuracy of 99.49%, while the second contains fruit images, with an accuracy of 96.75%. Dandavate et al. [2] proposed a system for classifying four fruits into three stages using convolutional neural networks. For
this study, they put together a dataset of local fruit images. In 8 epochs, the accuracy was 97.74%, and the validation accuracy was 0.9833. CNN and AlexNet architectures were used by Arya et al. [3] to identify leaf diseases of mango and potato, and the performance metrics of the architectures were then compared; AlexNet achieves greater accuracy than the plain CNN architecture. Using deep learning, Jayakumar et al. [4] classified diseases and made predictions after processing and segmenting images of leaf surfaces. The prediction and identification were made using image acquisition, processing, and segmentation. They achieved higher accuracy in predicting and classifying the leaf diseases, and better computational precision, compared with the other models. Lakshmanarao et al. [5] predicted plant disease by applying transfer learning. They took the Plant Village dataset, collected from Kaggle, and divided it into three parts for three different plants. They used three transfer-learning models, VGG16, ResNet50, and Inception, and obtained accuracies of 98.7%, 98.6%, and 99%, respectively. Their proposed model achieved good accuracy when compared with other models. Machine learning models were used by Qasrawi et al. [6] to cluster, predict, and classify tomato plant diseases. For clustering, they used image embedding and hierarchical clustering algorithms. The accuracy of the clustering model was 70%, while the accuracies of the neural network model and the logistic regression model were 70.3% and 68.9%, respectively. Beikmohammadi et al. [7] presented a transfer learning technique to identify plant leaves, which first uses a pre-trained deep CNN model and then takes the input data representation. Two familiar botanical datasets were used to evaluate the proposed method: on Flavia, which has 32 classes, and Leafsnap, which has 184 categories, it achieved accuracies of 99.6% and 90.54%, respectively. Gosai et al. [8] made an effort to construct a model that categorizes harvest leaves into healthy and unhealthy categories in order to better detect plant diseases. The researchers trained a model to recognize distinct crops and 26 diseases using 54,306 images of unhealthy and healthy plant leaves taken under controlled conditions. A convolutional neural network with 13 layers was built by Zhang et al. [9]; image rotation, gamma correction, and noise injection were the three methods used for data augmentation, and with an overall accuracy of 94.94% their approach outperformed five state-of-the-art techniques. A constructive machine-vision framework for date-fruit harvesting robots is recommended by Altaheri et al. [10], who utilized pre-trained DCNNs through fine-tuning and transfer learning. Wang et al. [11] established a dataset of 3743 samples grouped into three classes, namely mature, defective, and rotten, and assessed the plausibility of automating the detection of defective surfaces of lychees. To rebalance the three categories, they use a transformer-based generative adversarial network (GAN) for data augmentation, effectively enriching the initial training set with more diverse samples. Haque et al. [12] employed convolutional neural networks to construct a model for the detection of diseases in rice plants; K-Means clustering was utilized in their study because it can process photos by limiting the total number of colors in each image. A robust UAV-DOM-based instance segmentation method for images of litchi trees is provided by Mo et al. [13]. To alleviate the lack of diversity in the primary litchi dataset, citrus data are added to the training set. The model obtained
the best Mask AP50 of 95.49% and the best Box AP50 of 96.25% on the test set with the aid of training on the litchi-citrus dataset.
3 Proposed Methodology

Methodology demands diligence: it meticulously follows agreed steps and procedures in order to achieve its goal. In this part of our study, we applied pre-trained convolutional neural networks [14, 15] to identify fresh and affected leaves and stems. Classical neural networks struggle with image recognition; CNNs ease this complication. Using a CNN, we can recognize patterns in the input image that are difficult for conventional computer vision. CNN's key advantage is that it needs only a small number of parameters, allowing a more compact model and faster results. The layers in a CNN are as follows. Convolution layer: the first layer in a CNN; it works with 32 filters over two- or three-dimensional input images and weights. Pooling layer: it preserves the essential information while lowering the total amount of information contained in each convolutional layer's features. In most cases, an activation layer is followed by a pooling layer, which reduces the number of feature maps presented. In addition, the network assigns a label to each image.

3.1 Dataset

Data is an important aspect of research work. Since our project centers on the detection of fresh and affected leaves and stems, we collected both fresh and affected leaves and stems of the lychee tree, taken from village gardens and trees. The whole dataset consists of 1400 images separated into five classes, including 1120 training and 280 testing images, as depicted in Table 1. Examples of the five categories are shown in Fig. 1.

3.2 Data Pre-processing

Preprocessing refers to the steps taken before using the data to improve the consistency of the information produced.
The training procedure for the CNN models was optimized by applying two standard pre-processing techniques. 1) Resizing: the dataset contains images of varying resolutions and dimensions, so we rescaled each image to 224 × 224 pixels to make all input dimensions equal. 2) Normalization: we used ImageNet mean subtraction to rescale the pixel intensity values as a pre-processing step. Applying min–max normalization [16], we standardized the intensity values of all images from the range [0, 255] into the range [0, 1].
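In outline, the two steps might look as follows. The nearest-neighbour resize is an illustrative stand-in for the library resampling (e.g. PIL or OpenCV) presumably used in the pipeline, and only the min–max scaling is shown for normalization:

```python
import numpy as np

def resize_nearest(img, size=(224, 224)):
    """Nearest-neighbour resize of an (H, W, C) array to a fixed spatial size.
    Real pipelines would use a library resampler; this is an illustrative stand-in."""
    h, w = img.shape[:2]
    rows = (np.arange(size[0]) * h / size[0]).astype(int)
    cols = (np.arange(size[1]) * w / size[1]).astype(int)
    return img[rows][:, cols]

def minmax_normalize(img):
    """Min-max scale pixel intensities from [0, 255] into [0, 1]."""
    img = img.astype(np.float32)
    return (img - img.min()) / max(float(img.max() - img.min()), 1e-8)
```

After these steps every input is a 224 × 224 × 3 float array with values in [0, 1], ready for the pre-trained networks.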
Fig. 1. Sample images of the five classes: Fresh Leaf, Leaf Spots, Leaf Necrosis, Stem Canker, Fresh Stem Canker.
Table 1. Dataset description.

Classes           | Train Set (80%) | Test Set (20%) | Total Images
------------------|-----------------|----------------|-------------
Leaf Spots        | 160             | 40             | 200
Leaf Necrosis     | 240             | 60             | 300
Fresh Leaf        | 240             | 60             | 300
Stem Canker       | 240             | 60             | 300
Fresh Stem Canker | 240             | 60             | 300
3.3 Model Evaluation

The purpose of model evaluation is to determine how well a model generalizes to upcoming data. We used well-known evaluation metrics, namely recall, precision, accuracy, and F1-score, to evaluate the prediction performance of the algorithms investigated in this study (Fig. 2).
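These four metrics can be computed from the predicted and true labels as in the sketch below; macro averaging over the classes is assumed, with F1 here derived from the macro precision and recall:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy plus macro-averaged precision, recall, and an F1
    computed from the macro precision and recall."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        precisions.append(tp / max(tp + fp, 1))
        recalls.append(tp / max(tp + fn, 1))
    p, r = float(np.mean(precisions)), float(np.mean(recalls))
    f1 = 2 * p * r / max(p + r, 1e-12)
    acc = float(np.mean(y_true == y_pred))
    return acc, p, r, f1
```

Library implementations such as scikit-learn's `classification_report` compute the same quantities (with per-class F1 averaged directly in the macro case).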
Fig. 2. Proposed method workflow.
3.4 InceptionV3 This is the third iteration of the Inception family of deep convolutional neural architectures. The TensorFlow release of InceptionV3 [17] was trained on the original ImageNet dataset, which includes more than one million training images; it debuted in the ImageNet Large Scale Visual Recognition Challenge, where it placed second overall. Transfer learning allows us to retrain only the final layer of an existing model, which significantly reduces both the required training time and the size of the dataset needed. InceptionV3 is a well-known model used for transfer learning; as noted, it was trained on more than a million photos from one thousand classes on exceptionally powerful machines. 3.5 VGG16 VGG16 is another form of convolutional neural network; the number that follows the network's name represents the number of architectural layers. The objective was to build a very
deep convolutional neural network, by stacking layers, that would perform its tasks well. The VGG16 [18] team won one of the awards in the ImageNet 2014 competition. The output of the classification vector is evaluated over the five categories under study, and ReLU serves as the activation function for all of the hidden layers. 3.6 MobileNet The MobileNet [19] model uses depthwise separable convolutions, a kind of factorized convolution that splits a conventional convolution into a depthwise convolution and a pointwise (1 × 1) convolution. MobileNet's depthwise convolution applies one filter per input channel; the results are then combined by a 1 × 1 pointwise convolution. A standard convolution filters and combines inputs in a single step to produce a new set of outputs; with a depthwise separable convolution this is separated into two layers, one for filtering and one for combining. This factorization significantly reduces computation time and model size. 3.7 NASNetMobile Reinforcement learning is used to optimize the simple building blocks that make up this scalable CNN architecture. Convolutions and pooling are the only two operations that make up a cell, and cells are repeated numerous times depending on the desired capacity of the network. This version of NASNetMobile [20] has 12 cells, 5.3 million parameters, and 564 million multiply–accumulates (MACs). 3.8 Parameter Setting The training parameters for all convolutional neural networks are: learning rate η = 1e−5, β1 = 0.9, β2 = 0.999, ε = 1e−8, and a decay rate of 1e−5 for the adaptive moment estimation (Adam) optimizer. The softmax activation function is used at the output, and a dropout rate of 0.5 is set to prevent the model from overfitting. All models are trained for 10 epochs with a batch size of 32.
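Under these settings, one Adam update per parameter can be sketched as follows; this is the generic textbook Adam step instantiated with the paper's hyperparameters (the decay term is omitted):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates, bias correction,
    then a scaled gradient step. t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

With bias correction, the very first step moves each parameter by roughly the learning rate in the direction opposite its gradient.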
4 Result and Discussion The performance of the four employed models is compared. InceptionV3 outperforms the other three models in terms of accuracy, recall, precision, and F1-score (88.21%, 0.88, 0.88, and 0.87, respectively); VGG16 comes second with 84.64% accuracy. Table 2 reports these four evaluation metrics for each model. In the confusion matrices, the values on the main diagonal represent all instances that were correctly classified, and the rate of correct and incorrect predictions for each class is indicated in the row under each confusion matrix in Fig. 3. As can be observed, NASNetMobile claims the highest misclassified count at 78, while InceptionV3 has the lowest at 33 (Table 3).
Table 2. Classification report of the four CNNs.

Algorithms   | Accuracy | Precision | Recall | F1-score
-------------|----------|-----------|--------|---------
InceptionV3  | 88.21%   | 0.88      | 0.88   | 0.87
VGG16        | 84.64%   | 0.85      | 0.85   | 0.83
MobileNet    | 83.21%   | 0.86      | 0.81   | 0.82
NASNetMobile | 72.14%   | 0.78      | 0.68   | 0.66
Fig. 3. Confusion matrix of four DCNN models. A) NASNetMobile, B) MobileNet, C) VGG16 and D) InceptionV3.
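The misclassification counts quoted in the text come straight off the confusion matrices: everything on the main diagonal is correct, everything off it is an error. A small helper illustrating the bookkeeping (the example matrix in the test is illustrative, not taken from Fig. 3):

```python
import numpy as np

def confusion_summary(cm):
    """From a confusion matrix with true labels on rows: total correct
    (diagonal sum), total misclassified, and per-class accuracy."""
    cm = np.asarray(cm)
    correct = int(np.trace(cm))                   # diagonal = correct predictions
    misclassified = int(cm.sum()) - correct       # everything off-diagonal
    per_class_acc = np.diag(cm) / cm.sum(axis=1)  # row-normalized diagonal
    return correct, misclassified, per_class_acc
```

Applied to the matrices in Fig. 3, this reproduces the 78 (NASNetMobile) and 33 (InceptionV3) misclassification counts reported above.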
Table 3. Study comparison.

Paper                 | Method                                        | Best Accuracy                                                                          | Worst Accuracy
----------------------|-----------------------------------------------|----------------------------------------------------------------------------------------|-------------------------------------------------
Yamparala et al. [21] | Convolutional neural network (CNN) classifier | CNN gives the highest accuracy of 90%                                                  | PNN gives the lowest accuracy, 86%
Peng et al. [22]      | Feature-extraction network model              | The improved YOLOv3-Litchi model achieved the best results, with a mean average precision (mAP) of 97.07% | The YOLOv3_Tiny model has the lowest mAP, 94.48%
Wang et al. [23]      | YOLOv3-Litchi model                           | The average detection time of YOLOv3-Litchi is 29.44 ms                                | The test results show that the F1 of YOLOv3-Litchi is higher than that of the Faster R-CNN algorithm by 0.05
Miah et al. [15]      | CNN                                           | The InceptionV3 model has the highest accuracy, at 97.34%                              | NASNetMobile, 75.29%
Our Study             | CNN                                           | The highest accuracy of 88.21% was achieved by InceptionV3                             | NASNetMobile has given us the least accuracy of all
According to all of these metrics, the InceptionV3 model outperforms all others. This study showed that well-tuned deep learning models achieve better accuracy for diagnosing affected leaves and stems from images than deep learning models with automatically produced features.
5 Conclusion Distinguishing fresh from affected leaves and stems is very important in agriculture. For this classification we used four CNN models: VGG16, InceptionV3, MobileNet, and NASNetMobile. In this study we considered a wide variety of settings, including hyperparameter variation, batch size, number of epochs, optimizer, and learning rate. The models' findings show that they can discriminate between fresh and affected leaves and stems. The methodology will therefore help automate the classification work otherwise done by hand, reducing human errors when classifying fresh and affected leaves and stems of the lychee tree. The highest accuracy, 88.21%, came from the InceptionV3 model. One caveat of our study is that our dataset is quite limited. Accuracy may
decrease if there is much noise in the images. The scope of this research will be enlarged in the future to embrace more leaf pictures to be classified, giving every fruit farmer the opportunity to use the system. We will add more leaf images in the near future and focus on classifying more classes.
DCNN Based Disease Prediction of Lychee Tree
MEVSS: Modulo Encryption Based Visual Secret Sharing Scheme for Securing Visual Content

Parul Saini(B), Krishan Kumar, Shamal Kashid, and Alok Negi

Computer Science and Engineering, National Institute of Technology, Uttarakhand, India {parulsaini.phd2020,kashid.shamalphd2021,aloknegi.phd2020}@nituk.ac.in, [email protected]

Abstract. In recent years, image communication over unsecured networks has developed rapidly in the digital world. Many domains, including the military, medicine, government, and education, transmit sensitive data through images, so secrecy is a primary concern as technology advances. Traditional methods such as the Advanced Encryption Standard (AES) and the Data Encryption Standard (DES) have been widely used for image security; however, key management remains the main difficulty in protecting image data. This work uses a Visual Secret Sharing (VSS) scheme, also known as Visual Cryptography (VC), to handle this issue. A modulo encryption-based VSS scheme (MEVSS) is proposed to encrypt a secret image shared among n users. The secret image is recovered losslessly by XORing the n shares followed by modulo decryption. Experimental results show that the proposed technique outperforms the existing Binary tree-based Secret Sharing Scheme and RSA-based VSS in Unified Average Changing Intensity (UACI) and Number of Pixel Change Rate (NPCR), and outperforms the existing techniques except RSA-based VSS in terms of Peak Signal-to-Noise Ratio (PSNR). Moreover, it is faster than traditional techniques because it works directly on the color image without converting it to a binary or grayscale form.

Keywords: Image · Visual Secret Sharing Scheme · Cryptography · Encryption · Security

1 Introduction
Over the last decade, web applications that access multimedia content such as audio, video, text, and graphics have expanded significantly. It is challenging to design fast and robust security systems for high-bandwidth content such as images or video, which may undergo loss or format conversion. Image security is a critical topic in information security: secret images are disseminated online for political, social, military, corporate, and other purposes. As a result, the primary goals of image encryption are information confidentiality, integrity, authentication, and non-repudiation. Because images are frequently used to transmit information, they must be protected, and it should also be possible to retain control over their intellectual content after transmission. In the digital era, the technology for sending and storing digital images has grown, and maintaining image secrecy and reliability has become a challenge. Digital services necessitate secure storage and transfer of digital images. Protecting digital images has become crucial and has received much attention due to rising internet usage, especially during pandemic times [1,2].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 24–35, 2023. https://doi.org/10.1007/978-3-031-30396-8_3

Traditional cryptography systems such as DES and AES rely on shared secret keys (while asymmetric systems use public/private key pairs). The fundamental issue with these conventional techniques is key management, which can cause key storage, distribution, and transmission problems, reducing confidentiality and reliability [1,3]. VC, or the (k, n) VSS scheme, is an encryption approach that separates a secret image into many shares. These shares are then distributed to the appropriate parties, and all members must stack their shares together at the receiver to obtain the secret image. VC or VSS techniques have the following advantages over traditional algorithms [3,4]:

– There is no need to manage keys.
– No complicated algorithm is required to recover the secret image; stacking the shares together restores it.
– An adversary cannot recover the secret without obtaining all the shares.
– The generated shares have a noise-like look that reveals nothing about the secret image.

Because of these properties, VSS has a variety of uses, including protecting online transactions, digital watermarking, authentication, copyright protection for digital photos, steganography, electronic currency and banking applications, and many others [5].
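The last two properties are easy to see in an (n, n) XOR-style sharing of even a single byte. The sketch below is purely illustrative Python, not code from any of the cited schemes, and `split_byte` and `stack` are assumed names: the first n − 1 shares are uniformly random, so any subset smaller than n is statistically independent of the secret, while stacking all n shares recovers it.

```python
import secrets

def split_byte(secret, n):
    """(n, n) XOR sharing of one byte: n-1 shares are uniformly random;
    the last share is chosen so the XOR of all n equals the secret."""
    shares = [secrets.randbelow(256) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last ^= s
    return shares + [last]

def stack(shares):
    """'Stacking' here is XOR: every share is needed to recover the byte."""
    out = 0
    for s in shares:
        out ^= s
    return out
```

Because each of the first n − 1 shares is drawn independently of the secret, missing even one share leaves the remaining XOR uniformly distributed over all 256 byte values.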
Therefore, the authors have been inspired to employ VSS techniques for securing image data. The proposed (n, n) MEVSS approach directly encrypts the secret color image using the modulo encryption technique rather than first transforming it to a binary or grayscale form. Then n meaningless shares of the encrypted secret image are generated via an XOR operation and dispersed among the n participants, where each participant receives one share without the other participants' knowledge. In traditional VSS techniques, stacking all shares from the n participants reveals the secret image; in the proposed MEVSS method, stacking instead yields an encrypted variant, and the secret image is revealed only after applying the decryption process. The recovered secret image is lossless. The main contributions of this work are as follows:

– The proposed technique is flexible enough to work with any number of participants, where the minimum number of participants is two.
– The proposed MEVSS algorithm employs modulo encryption on the color image without converting it into another form such as binary or grayscale; therefore, the proposed technique is faster than traditional techniques.
– The proposed MEVSS uses modulo and XOR operations for encryption and diffusion of pixels in an image. Hence, its complexity is low compared to other VSS schemes that involve pixel expansion.
– The novel MEVSS approach reveals the encrypted image instead of the secret image when the shares are stacked. Further, the secret image is revealed after applying the decryption process, and the recovered secret image is lossless.
– The proposed technique outperforms the existing Binary tree-based Secret Sharing Scheme [3] and RSA (Rivest-Shamir-Adleman)-based VSS [6] in UACI and NPCR, while it outperforms the existing techniques except RSA-based VSS [6] in terms of PSNR.

The rest of the paper is organized as follows: Sect. 2 discusses the literature on visual secret sharing methodologies. The proposed MEVSS model, with a detailed description of each step, is given in Sect. 3. Section 4 details the performance metrics used in the experiments. Section 5 discusses the results and performance of the proposed scheme. Section 6 concludes the work with future directions.
2 Related Work
Moni Naor and Adi Shamir introduced visual cryptography in 1995 at the Eurocrypt conference. It decodes secret images without requiring cryptographic calculations and is safe and straightforward to apply. It is a visual version of the k-out-of-n secret sharing problem, in which a dealer gives each of the n users a transparency. Any k of them can see the image by stacking their transparencies, but any k − 1 of them learn nothing about it [7]. A secret sharing system [3] has been presented that uses a complete binary tree structure and Boolean XOR to allow the sender to divide the secret into 2^h shares, where h is the tree's height. A man-in-the-middle attack is viable only if the attacker knows how many shares the sender has produced; with fewer than 2^h shares, the attacker cannot decipher the secret. This secure approach requires the least time to share and recover a secret. An XOR-based (n, n) VCS [8] is presented based on pixel vectorization. Three algorithms are developed: construction matrix creation, division of the construction matrix into basis matrices, and the (n, n) XOR-based visual secret sharing method. The technique also resolves pixel expansion, the participant limit, and lossy retrieval of the secret image. Two visual cryptography algorithms [9] are proposed that can fully restore a secret color image: two (n, n) threshold schemes that create n − 1 meaningless shares, or both meaningful and meaningless shares. Color quick response (QR) codes with n − 1 meaningful shares can be decoded by a generic decoder instead of the standard decoder. An image encryption confusion-diffusion approach [10] is presented in which the image pixels are first disarranged, producing a scrambled image, which is then diffused by XORing its pixels with a secret key. This key is created by combining many distinct chaotic maps. Further, the Visual Secret Share (VSS)
scheme [6] is proposed in which each RGB pixel is shared separately, and the RSA algorithm encrypts and decrypts the generated secret image shares. The multiplication technique is also employed in the encryption process for key generation, with the public key used for encryption and the private key for decryption. The results are more secure, although the shared secret images are recovered lossily. An (n, n) VSS method [11] based on Mini Program codes is presented to achieve the threshold scheme for sharing a Mini Program code, based on the embedding capability of the V-36 Mini Program code. It implements Mini Program's multiple access control and obtains the secret Mini Program code by XORing all shares. Furthermore, the reconstructed Mini Program code can be successfully decoded as the same App as the secret code. A novel cheating-immune visual cryptography scheme without pixel expansion, called CIVCS, is proposed [12], where the authentication patterns are three adjacent, non-intersecting concentric solid black rings. It produces n original shares using a Random Grid-based Visual Cryptography Scheme (RG-based VCS) and marks authentication patterns on the actual shares to obtain verified shares. The authentication patterns are exposed in the authentication phase by stacking any two verifiable shares in various ways, including rotating one of the two shares 90°, 180°, or 270° counterclockwise. From the above literature, the authors decided to employ VSS schemes; the proposed technique is detailed in Sect. 3.
3 Proposed Algorithm
In this work, VC or VSS [1] is employed for securing images. In visual cryptography, a secret input image is divided into noisy images referred to as shares. The original secret image is recovered by superimposing or stacking the shares during decoding. We use the extended idea by Naor and Shamir of sharing a secret k out of n ways (the (k, n) scheme), where n is the total number of shares and k is the smallest number of shares required to recover the secret. Figure 1 shows a general representation of a (k, n) VC scheme [1,13,14]. The threshold (k, n) secret sharing scheme is illustrated as follows:

1. Given a secret S,
2. divide S among n parties as S1, S2, ..., Sn, satisfying the following properties:
3. k or more parties can recover S;
4. fewer than k parties have no information about S;
5. if k = n, every piece of the original secret S is required to reconstruct it.

MEVSS follows the (n, n) VSS scheme, where n is the total number of shares and n (in this case, k = n) is the minimum number of shares required to retrieve the secret. MEVSS is flexible with respect to the number of participants; n is decided according to the needs of the problem. In our experiments, n is set to 2 and 4, i.e., the number of shares generated in MEVSS is 2 and 4. In this work, the modulo encryption scheme is applied to the
Fig. 1. Visual Secret Sharing Scheme
Fig. 2. General architecture of MEVSS Scheme
image before generating the shares to increase the security of the secret image. The general architecture of the proposed MEVSS scheme is shown in Fig. 2. The proposed scheme follows these steps:

Step 1: Input. The input of the proposed algorithm is an image, which could be a secret for applications in the military, media, medicine, etc., where the security of the secret image is the primary concern. The proposed algorithm takes the color image [I] of "Lena" with dimensions 512 × 512 as input.

Step 2: Modulo Encryption. In this step, a random value (RandomVal) is generated and used for both the encryption and decryption processes. This value is used for
modulo operation on each image pixel on all three planes. The encryption technique, whose main objective is to scramble the image as much as possible, must be reversible so that the reverse procedure performed during decryption produces the original image. In this step, the secret image is encrypted using the modulo encryption method, which results in the encrypted secret image (EI).

Step 3: Shares Generation. Shares of the image are generated by splitting the image into shadows, as the main aim of VSS is to generate meaningless shares that do not reveal any information. In this step, (2,2) MEVSS and (4,4) MEVSS use the encrypted image generated in Step 2 to produce shares by performing an XOR operation. Here, shares are generated after the encryption process to increase the security of the secret image.

Step 4: Stacking Shares. In the share reconstruction process, multiple shares are stacked together to get the original image. In this case, however, when all n shares generated in the previous step are collected by the n parties and stacked together, only the encrypted version of the secret image is reconstructed instead of the original image. Hence, the secret image is yet to be recovered.

Step 5: Modulo Decryption. Decryption transforms encrypted data back into its original form. In this step, the reverse procedure of Step 2 is performed as the decryption process using RandomVal. The EI reconstructed in the previous step is decrypted using the modulo decryption method on each pixel on all three planes.

Step 6: Output. After applying the decryption process in Step 5, MEVSS gets the secret image [I] back: a lossless secret image is regenerated using the proposed MEVSS algorithm.

The following two algorithms are applied in MEVSS, for image encryption and share generation, and for image retrieval and image decryption. Algorithm 1 describes the image encryption and share generation steps.
It takes the secret image I as input and n as the number of shares generated.
Algorithm 1. Proposed MEVSS algorithm for image encryption and share generation

Input: Secret image I
Output: n shares (n = 2)

1. I[] ← Image[row, column, depth]
2. Modulo encryption on the input image. For each [row, column, depth]:
   (a) RandomVal[row, column, depth] = random in range (0, 255)
   (b) Encrypted image EI = mod(Image + RandomVal, 256)
3. Share generation:
   (a) Generate random EI1[] (share 1) of the size of I[]
   (b) Construct EI2[] (share 2) = EI1[] ^ EI[]
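Algorithm 1 can be sketched in a few lines of Python. This is an illustrative reading of the pseudocode under assumed names (`mevss_encrypt_and_share`, flat pixel lists instead of a row × column × depth array), not the authors' implementation:

```python
import random

def mevss_encrypt_and_share(pixels, n=2, rng=None):
    """Illustrative MEVSS Algorithm 1: modulo-encrypt a flat list of pixel
    values with per-pixel random keys, then split the result into n XOR
    shares whose XOR reproduces the encrypted image."""
    rng = rng or random.Random(0)
    # Step 2: modulo encryption -- add a per-pixel random key mod 256.
    random_val = [rng.randrange(256) for _ in pixels]
    encrypted = [(p + k) % 256 for p, k in zip(pixels, random_val)]
    # Step 3: share generation -- n-1 random shares; the last share is
    # chosen so that XORing all n shares yields the encrypted image.
    shares = [[rng.randrange(256) for _ in pixels] for _ in range(n - 1)]
    last = list(encrypted)
    for s in shares:
        last = [a ^ b for a, b in zip(last, s)]
    shares.append(last)
    return shares, random_val
```

For n = 2 this reduces exactly to the pseudocode's EI2[] = EI1[] ^ EI[].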
Before share generation, each pixel of the image I is encrypted with modulo encryption, in which each pixel is combined with a different randomly generated key value. The encrypted image is then used to construct the n shares by performing an XOR operation. Algorithm 2 describes the steps for retrieving the original image through the decryption process. It takes the n shares and XORs them to retrieve the encrypted image; reverse modulo decryption then recovers the original secret image.
Algorithm 2. Proposed MEVSS algorithm for image retrieval and image decryption

Input: n shares (n = 2)
Output: Original secret image

1. Stacking shares: Encrypted XOR output image EoutI[] = EI1[] (share 1) ^ EI2[] (share 2)
2. Image decryption: Original image[] = mod(EoutI[] − RandomVal[] + 256, 256)
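Algorithm 2 admits a similarly small sketch (again illustrative Python with assumed names; `shares` and `random_val` are the XOR shares and per-pixel keys that Algorithm 1 produces):

```python
def mevss_recover(shares, random_val):
    """Illustrative MEVSS Algorithm 2: stack (XOR) all n shares to get the
    encrypted image, then reverse the modulo encryption per pixel."""
    stacked = [0] * len(random_val)
    for s in shares:                          # Step 1: stacking shares
        stacked = [a ^ b for a, b in zip(stacked, s)]
    # Step 2: image decryption -- subtract the per-pixel key mod 256,
    # adding 256 first so the subtraction never goes negative.
    return [(p - k + 256) % 256 for p, k in zip(stacked, random_val)]
```

Because addition mod 256 is exactly undone by subtraction mod 256, the recovery is lossless, matching the paper's PSNR of 100 dB between input and recovered images.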
4 Performance Analysis
The performance of the proposed MEVSS is analyzed using differential and statistical analysis. Differential analysis examines the relationship between the original and encrypted secret when a slight alteration is made. A statistical attack exploits statistical weaknesses in a given algorithm, and statistical analysis is done to examine this type of attack. The evaluation metrics are described as follows.

In differential analysis, Unified Average Changing Intensity (UACI) and Number of Pixel Change Rate (NPCR) are used to examine differential attacks by determining the relationship between the original and modified images [15].

Unified Average Changing Intensity (UACI) calculates the average intensity of divergence between the encrypted image and the corresponding image with a one-pixel difference. It is mathematically defined as in Eq. (1):

$$\mathrm{UACI} = \frac{\sum_{l,m} |B(l,m) - B'(l,m)|}{255 \times W_i \times H_i} \times 100 \qquad (1)$$

where B(l, m) represents the encrypted image, B'(l, m) represents the modified image, and W_i and H_i are the width and height of the images, respectively.

Number of Pixel Change Rate (NPCR) compares the pixel values of the original image and the encrypted image. The resultant value is returned as a percentage; if the value is more than 99%, the analysis is positive [15]. It is mathematically defined as in Eq. (2):
$$\mathrm{NPCR} = \frac{\sum_{l,m} D(l,m)}{W_i \times H_i} \times 100 \qquad (2)$$

where

$$D(l,m) = \begin{cases} 0 & \text{if } B(l,m) = B'(l,m) \\ 1 & \text{if } B(l,m) \neq B'(l,m) \end{cases}$$

Here D(l, m) indicates whether the corresponding pixels of the original and modified image differ. NPCR ranges over [0, 100]; for an encrypted file, the NPCR should be close to 100. In the encryption process, the UACI and NPCR levels must be maximized.

In statistical analysis, an attacker tries to break an encryption approach by exploiting statistical regularities. Correlation Analysis (CA) studies adjacent pixels of the image, and the Peak Signal-to-Noise Ratio (PSNR) is used to estimate the quality of an image, to confirm the encryption approach's robustness against statistical analysis.

The Correlation Coefficient (CC) is applied to detect the similarity among corresponding pixels of the original and encrypted images [15]. Its range is −1.0 to 1.0; a value outside this range indicates an error in the correlation measurement. The CC between samples i and j, each containing n values, is mathematically defined as in Eq. (3):

$$CC(i,j) = \frac{\sum_{x=1}^{n} (i_x - \bar{i})(j_x - \bar{j})}{\sqrt{\sum_{x=1}^{n} (i_x - \bar{i})^2}\,\sqrt{\sum_{x=1}^{n} (j_x - \bar{j})^2}} \qquad (3)$$

where

$$\bar{i} = \frac{1}{n} \sum_{x=1}^{n} i_x$$

Peak Signal-to-Noise Ratio (PSNR), in decibels, is the ratio between the strength of the signal and the noise in the signal; here the signals are images. It is used as a quality measure of the image: the higher the PSNR, the higher the image quality. It depends on the MSE of the selected images; when there is less difference between the two images, the PSNR is high, and so is the image quality [15]. It is mathematically defined as in Eq. (4):

$$\mathrm{PSNR} = 10 \log_{10} \frac{R^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{\sum_{m=1}^{M} \sum_{n=1}^{N} \left[I_1(m,n) - I_2(m,n)\right]^2}{M \times N} \qquad (4)$$

where MSE is the mean squared error, M × N is the size of the image matrices I_1 and I_2, and R is the maximum pixel value, which is 255 for an 8-bit image.
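The four metrics can be computed directly from pixel data; below is a plain-Python sketch over flat pixel lists (the function names are our own, and a real evaluation would operate on W_i × H_i image matrices):

```python
import math

def npcr(b1, b2):
    """Eq. (2): percentage of pixel positions that differ."""
    return 100.0 * sum(a != b for a, b in zip(b1, b2)) / len(b1)

def uaci(b1, b2):
    """Eq. (1): mean absolute intensity difference, scaled by 255."""
    return 100.0 * sum(abs(a - b) for a, b in zip(b1, b2)) / (255.0 * len(b1))

def correlation(i, j):
    """Eq. (3): Pearson correlation coefficient between two samples."""
    n = len(i)
    mi, mj = sum(i) / n, sum(j) / n
    num = sum((a - mi) * (b - mj) for a, b in zip(i, j))
    den = (math.sqrt(sum((a - mi) ** 2 for a in i))
           * math.sqrt(sum((b - mj) ** 2 for b in j)))
    return num / den

def psnr(i1, i2, r=255):
    """Eq. (4): 10 log10(R^2 / MSE); infinite for identical images."""
    mse = sum((a - b) ** 2 for a, b in zip(i1, i2)) / len(i1)
    return math.inf if mse == 0 else 10 * math.log10(r * r / mse)
```

Identical images give an unbounded PSNR, which is why lossless recovery is reported as a PSNR of 100 dB (a capped value) in the tables that follow.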
5 Results and Discussion
The "Lena" image (RGB) of dimensions 512 × 512 is used as the secret image in the experiment. For the number of shares n, two values (n = 2 and n = 4) are considered. MEVSS with the secret image "Lena" and its shares is shown in Fig. 3.
Fig. 3. Results of Proposed Scheme MEVSS (n=2)
Figure 3 shows the input image, which is the secret image I[]. According to the algorithm, it is encrypted using the modulo operation on each pixel, and the outcome is the encrypted secret image EI[]. VSS is then applied using the XOR operation to create the n shares (EI1[], EI2[], ..., EIn[]) to be distributed among n parties; in the experiment, n = 2 shares are taken. To get the secret back, the n parties need to stack all n shares together. But this yields only the encrypted secret (EoutI[]), not the original secret. So, to recover the original secret I[], the encrypted secret image is decrypted with the modulo-decryption operation. After applying the decryption, the lossless secret image is returned as the output image.
Table 1 shows the PSNR values for different pairs of images. Table 2 shows the correlation coefficients between the input image and the shares, between the encrypted image and its shares, and between the shares themselves. Tables 3 and 4 show the differential analysis using UACI and NPCR values among the shares, measured in percent. Table 5 compares the proposed MEVSS algorithm with the Binary tree-based Secret Sharing Scheme [3] and RSA-based VSS [6], with the best results shown in bold.

Table 1. Results with PSNR value

Images                              | PSNR (dB) (n=2) | PSNR (dB) (n=4)
(Input, Share1)                     | 27.8952         | 27.9011
(Input, Share2)                     | 27.8960         | 27.8940
(Input, Share3)                     | -               | 27.8955
(Input, Share4)                     | -               | 27.8937
(Input, Encrypted)                  | 08.6184         | 08.6113
(Encrypted, Share1)                 | 07.7340         | 07.7436
(Encrypted, Share2)                 | 07.7560         | 07.7353
(Encrypted, Share3)                 | -               | 07.3630
(Encrypted, Share4)                 | -               | 07.4260
(Encrypted, Encrypted XOR Output)   | 100.000         | 100.000
(Input, Recovered Decrypted Output) | 100.000         | 100.000
Table 2. Results with correlation coefficient value

Images              | Correlation (n=2) | Correlation (n=4)
(Input, Share1)     | −0.000151         | −0.000250
(Input, Share2)     | 0.000160          | −0.000016
(Input, Share3)     | -                 | 0.000005
(Input, Share4)     | -                 | −0.000016
(Encrypted, Share1) | 0.000115          | −0.000230
(Encrypted, Share2) | 0.000118          | −0.000070
(Encrypted, Share3) | -                 | 0.000140
(Encrypted, Share4) | -                 | −0.000020
(Share1, Share2)    | 0.000117          | 0.000380
(Share1, Share3)    | -                 | 0.000018
(Share1, Share4)    | -                 | −0.000210
(Share2, Share3)    | -                 | 0.000019
(Share2, Share4)    | -                 | −0.000220
(Share3, Share4)    | -                 | −0.000219
Table 3. Differential analysis using UACI and NPCR values among the shares (n=2)

Shares           | UACI    | NPCR
(Share1, Share2) | 33.5498 | 99.60428

Table 4. Differential analysis using UACI and NPCR values among the shares (n=4)

Shares           | UACI    | NPCR
(Share1, Share2) | 33.4113 | 99.6058
(Share1, Share3) | 33.4630 | 99.5998
(Share1, Share4) | 33.4670 | 99.5980
(Share2, Share3) | 33.3960 | 99.6983
(Share2, Share4) | 33.4889 | 99.6056
(Share3, Share4) | 33.4222 | 99.6167

Table 5. Comparison between the proposed MEVSS, the Binary tree-based Secret Sharing Scheme [3], and RSA-based VSS [6]

Performance metrics     | Binary tree-based Secret Sharing Scheme [3] | RSA-based VSS [6] | Proposed MEVSS
PSNR (dB)               | 51.1725  | 156.32 | 100.00
Correlation Coefficient | 0.000825 | -      | −0.000073
UACI                    | 25.006   | 13.88  | 33.4469
NPCR                    | 50.030   | 69.44  | 99.604
The following observations are made from Tables 1, 2, 3, 4 and 5:

– From Table 1, the PSNR values between the input image and the shares are low, indicating dissimilarity between the image and its shares. The PSNR between the input and encrypted images is also very low, showing their vast dissimilarity. The PSNR is 100 for the secret input and recovered image, confirming lossless secret recovery in the proposed MEVSS.
– From Tables 3 and 4, the UACI values are around 33% and the NPCR values are above 99%, which shows that the proposed technique qualifies as an excellent encryption technique.
– Table 5 shows that the proposed technique outperforms the existing Binary tree-based Secret Sharing Scheme [3] and RSA-based VSS [6] in UACI and NPCR, while it outperforms the existing techniques except RSA-based VSS [6] in terms of PSNR.
6 Conclusion
The proposed work is based on a modulo encryption scheme. The secret image is recovered losslessly by XORing the n shares followed by modulo decryption. The performance measures analyze the difference and similarity between the secret and encrypted images, between the secret and the shares, and between the secret and recovered images. The experiments show effective results on the secret image, which supports the feasibility of the proposed MEVSS algorithm, and the proposed technique outperforms the existing techniques. The proposed algorithm can further be applied to multiple secret images, and steganography can also be used to generate meaningful shares.

Acknowledgment. The authors would like to thank the DST, GoI for sponsoring this work under DST/ICPS/General/2018.
References

1. Sajitha, et al.: Review on various image encryption schemes. Materials Today: Proceedings (2022)
2. Devade, et al.: Image encryption technique for improvement of image security. In: Ambient Communications and Computer Systems, pp. 671–682 (2018)
3. Deshmukh, M., Nain, N., Ahmed, M.: Secret sharing scheme based on binary trees and Boolean operation. Knowl. Inf. Syst. 60(3), 1377–1396 (2019)
4. Mhala, N.C., Pais, A.R.: A secure visual secret sharing (VSS) scheme with CNN-based image enhancement for underwater images. Vis. Comput. 37(8), 2097–2111 (2021)
5. Ibrahim, D.R., Teh, J.S., Abdullah, R.: An overview of visual cryptography techniques. Multimed. Tools Appl. 80(21), 31927–31952 (2021)
6. Karolin, M., Meyyappan, T.: Authentic secret share creation techniques using visual cryptography with public key encryption. Multimed. Tools Appl. 80(21), 32023–32040 (2021)
7. Naor, M., Shamir, A.: Visual cryptography. In: Workshop on the Theory and Application of Cryptographic Techniques. Springer, Heidelberg (1994)
8. Kannojia, et al.: XOR-based visual secret sharing scheme using pixel vectorization. Multimed. Tools Appl. 80(10), 14609–14635 (2021)
9. Pan, et al.: Visual cryptography scheme for secret color images with color QR codes. JVCIR 82, 103405 (2022)
10. Elkandoz, M.T., Alexan, W.: Image encryption based on a combination of multiple chaotic maps. Multimed. Tools Appl., 1–22 (2022)
11. Chen, J., et al.: Visual secret sharing scheme with (n, n) threshold based on WeChat Mini Program codes. J. Vis. Commun. Image Represent., 103409 (2022)
12. Zhao, et al.: A cheating immune (k, n) visual cryptography scheme by using the rotation of shares. Multimed. Tools Appl., 1–23 (2022)
13. Weir, et al.: A comprehensive study of visual cryptography. In: Transactions on Data Hiding and Multimedia Security V, pp. 70–105. Springer, Heidelberg (2010)
14. Kashid, et al.: Approach of a multilevel secret sharing scheme for extracted text data. In: IEEE Students Conference on Engineering and Systems, pp. 1–5 (2022)
15. Mahendiran, N., Deepa, C.: A comprehensive analysis on image encryption and compression techniques with the assessment of performance evaluation metrics. SN Comput. Sci. 2(1), 1–12 (2021)
Analysis of Bangla Transformation of Sentences Using Machine Learning Rajesh Kumar Das, Samrina Sarkar Sammi, Khadijatul Kobra, Moshfiqur Rahman Ajmain(B) , Sharun Akter khushbu, and Sheak Rashed Haider Noori Daffodil International University, 1341 Dhaka, Bangladesh {rajesh15-13032,samrina15-12532,khadijaatul15-12319,moshfiqur15-14090, sharun.cse}@diu.edu.bd, [email protected]
Abstract. Various language processing tools have been developed for many languages, and work on Bengali NLP is getting richer day by day. Sentence pattern recognition in Bangla is a subject of attention, and our motivation was to implement this pattern recognition concept in user-friendly applications. We therefore developed an approach in which a sentence (sorol, jotil, or jougik) can be correctly identified. Our model accepts a Bangla sentence as input, determines the sentence construction type, and outputs the sentence type. Six of the most popular and well-known supervised machine learning algorithms were used to classify three types of sentence formation: Sorol Bakko (simple sentence), Jotil Bakko (complex sentence), and Jougik Bakko (compound sentence). We trained and tested our model on a dataset containing 2,727 samples from various sources, and evaluated accuracy, precision, recall, F1-score, and the confusion matrix. We obtained the highest accuracy, 93.72%, with the decision tree classifier.

Keywords: Formation of sentence · Bangla NLP · Supervised Machine learning · Decision Tree Classifier · Sorol Bakko · Jotil Bakko · Jougik Bakko
1 Introduction
Bangla is the world’s eighth most pronounced language, ranking eighth among all languages. Since Bangla is a language with a very rich morphological structure, it is very difficult to implement every guideline for word construction necessary to create the stemmer that can accurately mark out all the root words in Bangla. It is difficult. Bangladeshi people speak Bangla as their mother tongue and Indian as a second language. As a result, sentence identification in Bangla texts has become a difficult problem in modern times. There are many studies on sentence classification in English, which is very different from Bangla. About 261 million people speak Bengali worldwide. In Bangla, there are three types of sentences based on sentence structure, and those are Sorol Bakko (simple sentence), Jotil Bakko (complex sentence) and Jougik Bakko(compound sentence). c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 36–52, 2023. https://doi.org/10.1007/978-3-031-30396-8_4
Sorol Bakko contains only one verb and one or more subjects. In Jotil Bakko, two sentence blocks or clauses are joined together to form one sentence; one clause is independent, called the main clause, and the other is a dependent subordinate clause. Jougik Bakko, or a compound sentence, is a combination of two or more sorol or jotil bakko. In this paper, we categorize sorol, jotil, and jougik bakko using machine learning algorithms; the model identifies whether a given sentence is a sorol, jotil, or jougik sentence. Text carries a lot of important information, but extracting that information manually from large amounts of text is very difficult. Natural Language Processing and Machine Learning make this task quite easy: a lot of important information is extracted automatically, in less time and at a lower cost. Businesses can, and do, benefit greatly by building systems that work on and extract information from text automatically. In NLP, the data in a dataset is analyzed and classified, and text categorization comes in two forms: binary classification and multi-level classification. Throughout this paper, we discuss multi-level text classification, as our dataset contains three different sorts of text: Sorol Bakko, Jotil Bakko, and Jougik Bakko. The accuracy and amount of data play a big role in building and training a model: with more data it is easier to learn the structure of the problem, and the model gives more accurate results. So it is necessary to collect junk-free and proper data through a data cleaning process. We have used six machine learning algorithms: Random Forest, Logistic Regression, Decision Tree Classifier, KNN, SVM, and SGD. From the model, we get a unique output for each input from our dataset.
In this classification, we compare input data with training data. The more accurate the dataset, the higher the accuracy we obtain, so we cleaned the dataset. In this process, we train the model and then give it input to predict a label; the model compares the input with the training data and returns the nearest result. We describe this process in detail in this paper.
2 Literature Review
Shetu et al. [1] developed an algorithm that can determine whether a sentence was written in Sadhu Basha or Cholito Basha. This helps identify Guruchandali Dosh, a typical grammatical mistake in written Bangla. Two types of data were collected, Sadhu data and Cholito data, gathered from daily newspapers and popular Bengali novels. Bijoy et al. [2] collected data from different kinds of sources such as Bangla blogs, conversations, and stories. Their workflow consisted of data processing, tokenization using CountVectorizer, and data cleaning. After processing the data and feeding it to their models, they used various classification techniques to predict the sentence type and deliver highly accurate decisions.
R. K. Das et al.
Random Forest and XGBoost produced the highest accuracy, 96.39%, in their approach to analyzing large amounts of data. Čandrlić et al. [3] focused on developing a system model and methods for converting textual knowledge into relational databases. This study used a one-way knowledge-node approach, where the links connecting the nodes matter only when they move from one node to another. The work implements the developed technology and uses the research findings to turn natural-language sentences into an enriched, formalized record. According to Dhar et al. [4], different classifiers performed differently when categorizing Bangla texts. For the experiment, 8000 Bangla text documents from eight separate domains or text categories were gathered from various web news sources. Two weighting schemes based on word association and term aggregation were used as feature extraction techniques. Over 1,000 product reviews and opinions were gathered by Shafin et al. [5]. Their goal was to ascertain consumer opinions about products from Bangla comments and reviews, classifying product assessments as favorable or unfavorable. KNN, decision trees, support vector machines (SVM), random forests, and logistic regression were among the classification methods they employed. With a maximum accuracy of 88.81%, SVM surpassed all other algorithms. Al-Radaideh et al. [6] developed and evaluated classification strategies that build rule-based classifiers for medical free texts written in Arabic using association-rule mining techniques. In addition to examining the impact of rule pruning on the rule-generation stage, the study looked at the impact of integrating categorization and association rules in the domain of Arabic medical texts. Bolaj et al. [7] proposed an effective ontology-based and supervised learning method for categorizing Marathi text.
It automatically classifies and detects patterns in various document types by combining data mining, natural language processing (NLP), and machine learning techniques. The system accepts Marathi documents as input. Preprocessing includes input validation, tokenization, stopword elimination, stemming, and morphological analysis. Dhar et al. [8] investigated methods to classify Bangla text documents from an open web corpus using both traditional features and machine learning techniques. They combine a dimensionality-reduction approach (40% of TF) with the TF-IDF feature to increase the accuracy of the overall lexical matching process and determine the domain categories or classes of text documents. Islam et al. [9] used three supervised learning strategies for Bengali document categorization: Support Vector Machines (SVM), Naive Bayes (NB), and Stochastic Gradient Descent (SGD). They applied two alternative feature-selection strategies to examine the effectiveness of these three classification algorithms.
Sen et al. [10] analyzed 75 BNLP research publications in detail and divided them into 11 categories, including word grouping, part-of-speech tagging, sentiment analysis, fraud and forgery detection, information extraction, machine translation, named-entity identification, question-answering systems, language processing and recognition, word-sense disambiguation, and summarization. They discuss the drawbacks of BNLP and present current and potential future trends while describing traditional machine learning and deep learning techniques applied to a variety of datasets. Tuhin et al. [11] proposed two machine learning methods to extract sentiment from Bangla texts. The six emotional classes were happiness, sadness, kindness, excitement, anger, and fear. The Topic Approach and the Naïve Bayes classification algorithm yielded the best results on both levels. They used a hand-compiled data corpus of over 7,500 sentences as a learning resource. Das et al. [12] presented an overall opinion-mining system and an effective feature-based automated opinion-polarity identification technique for determining the polarity of phrases in documents. In the evaluation, the precision was 70.04% and the recall was 63.02%. Hasan et al. [13] evaluated sentiment in Bangla texts using contextual valence analysis. They used SentiWordNet to prioritize each word and WordNet to determine the part-of-speech meaning of each word. They determine the overall positivity, negativity, and neutrality of a sentence in relation to its overall meaning. Using valence analysis, they developed a unique approach to determining emotions from Bangla texts. Uddin et al. [14] created a Long Short-Term Memory (LSTM) neural network to analyze negative Bangla texts. They collected objectionable tweets in Bangla from Twitter, as well as other offensive Bangla phrases gathered through Google Forms.
In this study, four hyperparameters of the LSTM model were tuned in three steps. Hassan et al. [15] collected data from YouTube, Facebook, Twitter, online news sources, and product-review websites. Their dataset contained 9337 post samples, of which 72% were Bangla text and the remaining 28% were Romanized Bangla text. Three types of fully connected neural network layers were used. Their model was based on an RNN, specifically the LSTM neural network. Sarker et al. [16] described an effort to develop a Bengali closed-ended factoid question-answering system. They looked at various sources to collect data, such as crowdsourcing, social media, and manual generation. For both the questions and the documents, they gathered information through various methods. They collected questions from Shahjalal University of Science and Technology (SUST) students and from the official website of the university, where they found a list of frequently asked applicant questions. Monisha et al. [17] used machine learning-based techniques to categorize the questions, employing four algorithms: SVM, Naive Bayes, Decision Trees, and Stochastic Gradient Descent.
Moreover, stochastic gradient descent classifiers performed best on their dataset. 25% of the dataset was used for testing and 75% for training. The overall count of SUST-related Bengali questions in their dataset is 15,355. Khan et al. [18] implemented an effective QA system with 60% accuracy. In this study, they used WordNet to experiment with extracting the exact answer from the dataset and reducing complexity by replacing pronouns with the most appropriate nouns. They used over 50 sentence pairs as their dataset. Urmi et al. [19] provided a contextual-similarity-based method, built on an N-gram language model, to identify stems or root forms in Bangla. They used a 6-gram model for their stem identification process, which achieved an accuracy of 40.18% on the corpus. The test corpus of about 1,593,398 sentences covers various topics such as news, sports, blogs, websites, business journals, and magazines. Ahmad et al. [20] described using Bengali word embeddings to address the problem of Bengali document classification. Using the K-means technique, they clustered Bengali words based on their vector representations. They completed the classification task using a Support Vector Machine (SVM) and obtained an F1 score of about 91%. Their task was completed in three steps: first, create word embeddings for each word in the corpus; second, reduce the vector dimension; finally, build word-representing vector clusters. Rahaman et al. [21] presented a two-phase automatic hand-sign and written Bangla sign recognition system using a Bangla language-modeling algorithm. They developed a Bangla language-modeling approach that finds all hidden characters. In their experiment, the system recognizes BdSL words with an average accuracy of 93.50%, composite digits with an average accuracy of 95.50%, and sentences with an average accuracy of 90.50%. Haque et al.
[22] provided the “Subject-Verb Relational Algorithm,” whose objective is to determine, from a semantic perspective, the validity of a sentence’s main verb with respect to specific subjects during machine translation. They tested the subject-verb relationship algorithm with 600 sentences; for 598 of them the system produced accurate results, so the algorithm’s accuracy during testing was 99.67%. Islam et al. [23] discussed how deep learning may be used to generate Bangla text. A special kind of RNN (Recurrent Neural Network), Long Short-Term Memory (LSTM), is used in their study for their Bangla text generator. The online Prothom Alo website provided them with a corpus of 917 days’ worth of newspaper text. Abujar et al. [24] used an extraction method to summarize Bangla text. They suggested that analytical models can be used to summarize Bangla text, and introduced a novel sentence-grading approach for this purpose. When their technique was evaluated, the system displayed good accuracy. Dhar et al. [25] set forth two hypotheses: (a) the word length of medical texts is longer than that of other texts when measured in characters, and
(b) the sentence length of medical texts, measured in word counts, is longer than that of other texts. They gathered data from a Bangla daily newspaper’s online corpus of Bangla news articles. The sample set for their investigation includes texts from five textual categories: politics, business, sports, medicine, and law. Hamid et al. [26] suggested a technique to identify interrogative sentences among transliterated Bangla sentences. They explored rule-based techniques, supervised learning approaches, and deep learning approaches. Using machine learning methods including Support Vector Machine, k-Nearest Neighbors, Multilayer Perceptron, and Logistic Regression, they attained accuracy levels of 91.43%, 75.98%, 92.11%, and 91.68%, respectively. Razzaghi et al. [27] described the parsing and classification of questions for FAQs using machine learning. They compiled over 3000 FAQ and non-FAQ questions from the Internet about sports, foods, and computers (especially the Internet). They demonstrated 80.3% accuracy with SVM combined with Naive Bayes. Wang et al. [28] suggested a multilayer sub-neural network in which a separate structural match exists for each mode; it is used to transform features of various modes into features of the same mode. Experimental data was collected from three corpus databases: SQuAD, WikiText, and NarrativeQA. Oliinyk et al. [29] successfully detected propaganda indicators in text data, developing a machine learning model along with preliminary data processing, feature extraction techniques, and a binary classification task. Grid-search cross-validation and a Logistic Regression model were used to perform the categorization. Alian et al. [30] investigated techniques for identifying paraphrases in English and Arabic texts and provided an overview of previous work.
For recognizing paraphrases in English, better results were obtained using WordNet-based measures, and deep learning with statistical features yielded the highest accuracy. Mohammad et al. [31] used a number of text processing, feature extraction, and text categorization steps. For the Arabic language, lexical, syntactic, and semantic features are extracted to overcome the shortcomings and constraints of the available technologies. They crawled Twitter data using the Twitter Streaming API and gathered more than 8000 tweets. Scikit-learn, a Python-based machine learning library, was used to create the SVR model. Lamba et al. [32] provided a survey of numerous plagiarism detection methods applied to various languages using NLP techniques. The effectiveness of plagiarism detection has been investigated using a wide variety of text-preparation approaches. Ngoc et al. [33] discovered several straightforward features that let them perform both paraphrase identification and semantic-similarity tasks on Twitter data with highly competitive efficiency. Interestingly, they also support the importance of
applying word-alignment methods from machine translation evaluation measures to the overall performance of these tasks.

2.1 Comparison
The table below compares related work by reference, dataset quantity, performance evaluation, and lack of scope.

- Md. Hasan Imam Bijoy et al. [2] introduced an automated approach for Bangla sentence classification. Dataset: Bangla data collected from different kinds of sources such as Bangla Blog, Conversation, and Story. Performance: Random Forest and XGBoost produce the highest accuracy of 96.39%.
- Md. Musfique Anwar et al. [34] implemented a technique using context-sensitive grammar rules with all types of Bangla sentences. Dataset: sentences taken as input to the parsing system. Performance: 28 decomposition rules and a 90% success rate in all cases.
- Pooja Bolaj et al. [7] presented an efficient Marathi text classification system using supervised learning methods and ontology-based classification. Dataset: a set of Marathi text documents. Performance: Naïve Bayes (NB), Modified K-Nearest Neighbor (MKNN), and Support Vector Machine (SVM) algorithms were used; the output is a set of Marathi documents classified by class label.
- Qasem A. Al-Radaideh et al. [6] proposed a rule-based classifier for Arabic medical text. Performance: the ordered decision list strategy outperformed other methods, with an accuracy rate of 90.6%.
- Lenin Mehedy et al. [35] introduced an approach to Bangla syntax analysis in which all Bangla sentences, including complicated, compound, exclamatory, and optative ones, are accepted under context-free rules. Lack of scope: sentences composed of idioms and phrases are beyond the scope of the paper, and mixed sentences are not discussed.
- Bidyut Das et al. [36] proposed a novel system for generating simple sentences from complex and compound sentences. Performance: Modified Stanford Dependency (MSD) and Simple Sentence Generation (SSG) algorithms; per the judgment of five human linguistic experts, the system's accuracy is 91.102%. Lack of scope: the system sometimes generates incomplete sentences.
- Parijat Prashun Purohit et al. [37] proposed a semantic analyzer that can semantically parse Bangla sentences. Dataset: 1120 sentences. Performance: complex sentences (word length 5) give 100% accuracy and compound sentences (word length 5) give 100% accuracy (accuracy varies with word length).
- K. M. Azharul Hasan et al. [38] described the detection of semantic errors in simple Bangla sentences. Lack of scope: the approach covers only simple sentences of the SOV form, although the classification of nouns and verbs may be applied to various types of Bangla phrases, including complex and compound sentences as well as sentences with multiple verbs.
- Ayesha Khatun et al. [39] proposed statistical parsing of Bangla sentences by the CYK algorithm. Dataset: 2025 sentences of different kinds and word lengths collected from different Bangla sites. Performance: the average accuracy for simple sentences is 92.75%, while the average accuracy for complex and compound sentences is 83.75% and 76.66%, respectively; accuracy depended on word length and the number of parsed sentences.
- Parijat Prashun Purohit et al. [40] proposed a framework for a semantic analyzer that can parse Bangla sentences semantically. Dataset: 2120 sentences of various word lengths. Performance: simple, complex, and compound sentences (word length 5) give 98.67%, 100%, and 100% accuracy, respectively (accuracy varies with word length). Lack of scope: the system's efficiency could be increased by expanding the probabilistic context-free grammar's existing vocabulary and grammar.
3 Methodology
The steps required to implement the proposed “Simple, Complex, and Compound Sentence Detection Using Machine Learning for the Bangla Language” are listed below.

3.1 Dataset Preparation
There are two key stages to the dataset preparation process:
– Data Collection
– Preprocessing of Data

3.2 Data Collection
We gathered data from several blogs, Facebook, and the SSC (Secondary School Certificate) Bangla 2nd Paper book. We collected about 2727 sentences from the above-mentioned sources; the dataset is one we built ourselves. Three separate classes, Simple, Complex, and Compound, are included in our dataset. The dataset contains two columns: the first contains the sentences collected from different resources, and the second contains the type of each sentence (Fig. 1).
Fig. 1. Simple, complex, and compound form data in Bangla.
3.3 Preprocessing of Data
Data preprocessing is the process of transforming raw, unstructured data into useful information so that data-mining analysis can operate as expected. Preprocessing is a necessary step because, even after data collection, certain mistakes might still exist, and a specific technique can be used to address these issues. The success of a data analysis project is closely correlated with how well data preprocessing was done.
To produce clean data, procedures such as removing duplicate data, removing punctuation, label encoding, dataset separation, and tokenization must be carried out. Because punctuation is common in sentences, removing it gives us a clean dataset. In label encoding, each class label in the text is converted into a numeric value so that the result can be processed over the whole dataset; for this we use the LabelEncoder of the Python scikit-learn library, which maps the categorical column to encoded values in one pass. For classifiers to work on a Bengali dataset, the simple, complex, and compound sentences must be tokenized: the text is split, and tokenization prepares it for the classifiers (Fig. 2).
Fig. 2. Preprocessing method

The totals for all categories of clean data are displayed below.
We obtained 2703 sentences after cleaning the dataset.

Type of Sentence  Cleaned Sentences
Simple            904
Complex           902
Compound          897
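The cleaning and label-encoding steps described above can be sketched with pandas and scikit-learn. This is a minimal illustration only: the column names and toy sentences below are hypothetical stand-ins, not the paper's actual dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-column dataset: "sentence" and "type" (Simple/Complex/Compound).
df = pd.DataFrame({
    "sentence": ["bakko ek", "bakko dui", "bakko ek", "bakko tin!"],
    "type": ["Simple", "Complex", "Simple", "Compound"],
})

# Clean duplicate rows, then strip punctuation, as described above.
df = df.drop_duplicates(subset="sentence")
df["sentence"] = df["sentence"].str.replace(r"[^\w\s]", "", regex=True)

# Label encoding: map each class name to an integer for the classifiers.
encoder = LabelEncoder()
df["label"] = encoder.fit_transform(df["type"])
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))
```

The same `encoder` can later invert predictions back to class names via `encoder.inverse_transform`.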
3.4 Data Vectorization or Distribution
CountVectorizer is a highly useful tool provided by the Python scikit-learn library. It converts a given sentence into a vector based on how frequently (count) each word appears across the whole text. The ngram_range parameter specifies the size of the n-grams we want to use: (1, 1) gives us unigrams (n-grams made up of only one word), while (1, 3) gives us n-grams made up of one to three words.
– Unigram: We pass the value n = 1 to the n-grams function to produce unigrams (1-grams) and calculate the word frequencies.
– Bigram: We pass the value n = 2 to the n-grams function to produce bigrams (2-grams) and calculate the word frequencies.
– Trigram: We pass the value n = 3 to the n-grams function to produce trigrams (3-grams) and calculate the word frequencies (Fig. 3).
Fig. 3. Example of n-gram distribution
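The ngram_range behavior described above can be illustrated with a minimal CountVectorizer sketch; the two transliterated toy sentences are invented for illustration and are not from the paper's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["ami bhat khai", "ami school jai"]  # toy transliterated sentences

# ngram_range=(1, 1): unigrams only.
unigram = CountVectorizer(ngram_range=(1, 1))
uni_matrix = unigram.fit_transform(corpus)
print(sorted(unigram.vocabulary_))  # ['ami', 'bhat', 'jai', 'khai', 'school']

# ngram_range=(1, 3): one- to three-word n-grams in a single vocabulary.
trigram = CountVectorizer(ngram_range=(1, 3))
tri_matrix = trigram.fit_transform(corpus)
print(len(trigram.vocabulary_))  # 11 features: 5 unigrams + 4 bigrams + 2 trigrams
```

Each row of the resulting sparse matrix is the count vector for one sentence, which is what the classifiers below consume.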
Here, we have determined the number of documents, words, and unique words for the simple, complex, and compound classes and visualized them in Fig. 4:
Fig. 4. Statistics between class name within sentence
4 Model and Performance

4.1 Proposed Model
Machine learning offers a range of supervised and unsupervised models. Using six of the most relevant supervised classification techniques, we categorized Bengali texts as simple, complex, or compound: Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, KNN (K-Nearest Neighbors), and SGD (Stochastic Gradient Descent) (Fig. 5).
Fig. 5. Working form input
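A minimal sketch of training the six classifiers named above on vectorized sentences with scikit-learn. The toy sentences and labels are invented stand-ins (the real dataset has about 2703 cleaned Bangla sentences), and the hyperparameters shown are illustrative assumptions, not the paper's settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Toy stand-in data, repeated so every class has several examples.
sentences = ["ami bhat khai",
             "ami jabo ebong tumi asbe",
             "jodi brishti hoy tobe ami jabo na"] * 4
labels = ["simple", "compound", "complex"] * 4

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(),
    "SGD": SGDClassifier(),
}

# Each pipeline vectorizes with 1- to 3-grams, then fits one classifier.
for name, model in models.items():
    pipeline = make_pipeline(CountVectorizer(ngram_range=(1, 3)), model)
    pipeline.fit(sentences, labels)
    print(name, pipeline.score(sentences, labels))
```

In practice the dataset would first be split into training and test portions (the paper uses a 75/25 split elsewhere for comparison work) rather than scored on the training data as in this toy sketch.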
4.2 Model Performance
Based on our results in classifying Bengali text, we give a short statement on each model below. Logistic Regression Classifier: Logistic regression is a supervised algorithm used to categorize a dependent variable; LR expresses the connection between dependent and independent variables. We get 84.84% accuracy on our dataset using the LR algorithm. Decision Tree Classifier: The decision tree classifier is used mostly for classification problems, and also for regression. A decision tree organizes a series of decision nodes in a tree structure. Of all the algorithms, the decision tree predicts most accurately, with a score of 93.72%. Random Forest Classifier: The random forest classifier is a classification system made up of several decision trees, and it deals well with high-dimensional data. Since each tree uses only a portion of the input, the model can easily accommodate hundreds of features, which makes training faster than training single deep decision trees. We get a good accuracy of 91.68%.
KNN Classifier: The K-nearest neighbors algorithm is used for classification and regression, including text processing in machine learning. It classifies data by measuring the similarity between new data and the available data, assigning new information to the category it most resembles. With our KNN model, we get an accuracy of 73.94%. SVM Classifier: The Support Vector Machine is a powerful algorithm because it does not require much training data to start giving accurate results. SVMs perform classification by drawing a hyperplane (a line in 2D or a plane in 3D) such that the categories are separated by it. Using SVM, our dataset gets 85.95% accuracy. SGD Classifier: The stochastic gradient descent optimization procedure is frequently used in machine learning applications to identify the model parameters that best match the expected and actual outputs; it is an approximate, iterative method. With this method, we get 87.06% accuracy. Here, we present our noteworthy results in Table 1 with accuracy, precision, recall, and F1-score (Fig. 6).

Table 1. Significant results for each classifier

Model Name           Accuracy  Precision  Recall  F1-score
Logistic Regression  84.84%    84.84      84.84   84.84
Decision Tree        93.72%    93.72      93.72   93.72
Random Forest        91.68%    91.68      91.68   91.68
KNN                  73.94%    73.94      73.94   73.94
SVM                  85.95%    85.95      85.95   85.95
SGD                  87.06%    87.06      87.06   87.06
Fig. 6. Confusion Matrix on prediction
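The identical accuracy, precision, recall, and F1 columns in Table 1 are consistent with micro-averaged scoring, where all four metrics coincide in multi-class classification; the paper does not state its averaging mode, so this is an assumption. The sketch below uses hypothetical true/predicted labels to show the coincidence and the corresponding confusion matrix.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical true and predicted labels for the three classes.
y_true = ["simple", "complex", "compound", "simple", "complex", "compound"]
y_pred = ["simple", "complex", "compound", "complex", "complex", "compound"]

acc = accuracy_score(y_true, y_pred)
# Micro-averaged precision, recall, and F1 all equal accuracy in the
# multi-class setting, since every misclassification is both a false
# positive for one class and a false negative for another.
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                  average="micro")
print(acc, prec, rec, f1)  # all four values are equal

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred,
                       labels=["simple", "complex", "compound"]))
```

With macro or weighted averaging the four columns of Table 1 would generally differ, which is why micro averaging is the most plausible reading.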
5 Result and Discussion
We implemented numerous classification methods in machine learning models, including KNN, decision trees, SGD, support vector machines (SVM), random forests, and logistic regression, on the simple, complex, and compound forms, and the results are quantifiable. All other algorithms were surpassed by the decision tree, which had an accuracy of 93.72%. There are many works related to simple, complex, and compound sentences; many of those similar to ours target other languages. Some Bangla-language works related to this exist, but none is exactly the same as ours: they worked on syntax analysis, transforming simple sentences into complex and compound ones, and generating English simple, complex, and compound sentences from their Bangla counterparts. To our knowledge, ours is the first sentence-classification (simple, complex, compound) work for the Bangla language (Fig. 7).
Fig. 7. Accuracy bar graph for classification of Bengali text
6 Conclusion
There are some analogous works in Bangla, although they differ somewhat from ours. However, we are the first to classify sentences in Bangla as simple, complex, or compound using Bangla NLP, and this study can be viewed as the first significant step toward novel extensions. In this paper, we propose a classification approach based on Decision Tree, Random Forest, Logistic Regression, KNN, SVM, and SGD using a simple, complex, and compound dataset with over two thousand data points. Input data were compared with training data. Experimental findings demonstrate the proposed strategy's effectiveness and productivity. While doing this work, we came across many suggestions that may be considered for further development of the proposed system: association rules can be created using more precise stemming algorithms, and different pruning techniques can be used to affect classification accuracy. In addition, we will work on converting these ideas into user-friendly applications. We will expand our dataset to achieve higher accuracy, and in the future we also intend to apply deep learning techniques to enhance system performance.
References

1. Shetu, S.F., et al.: Identifying the writing style of Bangla language using natural language processing. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–6. IEEE (2020)
2. Bijoy, M.H.I., Hasan, M., Tusher, A.N., Rahman, M.M., Mia, M.J., Rabbani, M.: An automated approach for Bangla sentence classification using supervised algorithms. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–6. IEEE, July 2021
3. Čandrlić, S., Katić, M.A., Pavlić, M.: A system for transformation of sentences from the enriched formalized Node of Knowledge record into relational database. Expert Syst. Appl. 115, 442–464 (2019)
4. Dhar, A., Mukherjee, H., Dash, N.S., Roy, K.: Performance of classifiers in Bangla text categorization. In: 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET), pp. 168–173. IEEE, October 2018
5. Shafin, M.A., Hasan, M.M., Alam, M.R., Mithu, M.A., Nur, A.U., Faruk, M.O.: Product review sentiment analysis by using NLP and machine learning in Bangla language. In: 2020 23rd International Conference on Computer and Information Technology (ICCIT), pp. 1–5. IEEE, December 2020
6. Al-Radaideh, Q.A., Al-Khateeb, S.S.: An associative rule-based classifier for Arabic medical text. Int. J. Knowl. Eng. Data Mining 3(3–4), 255–273 (2015)
7. Bolaj, P., Govilkar, S.: Text classification for Marathi documents using supervised learning methods. Int. J. Comput. Appl. 155(8), 0975–8887 (2016)
8. Dhar, A., Dash, N.S., Roy, K.: Application of TF-IDF feature for categorizing documents of online Bangla web text corpus. In: Intelligent Engineering Informatics, pp. 51–59. Springer, Singapore (2018)
9. Islam, M., Jubayer, F.E.M., Ahmed, S.I.: A comparative study on different types of approaches to Bengali document categorization (2017). arXiv preprint arXiv:1701.08694
10. Sen, O., et al.: Bangla natural language processing: a comprehensive analysis of classical, machine learning, and deep learning based methods. IEEE Access (2022)
11. Tuhin, R.A., Paul, B.K., Nawrine, F., Akter, M., Das, A.K.: An automated system of sentiment analysis from Bangla text using supervised learning techniques. In: 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), pp. 360–364. IEEE, February 2019
12. Das, A., Bandyopadhyay, S.: Phrase-level polarity identification for Bangla. Int. J. Comput. Linguistics Appl. 1(1–2), 169–182 (2010)
13. Hasan, K.A., Rahman, M.: Sentiment detection from Bangla text using contextual valency analysis. In: 2014 17th International Conference on Computer and Information Technology (ICCIT), pp. 292–295. IEEE, December 2014
14. Uddin, A.H., Dam, S.K., Arif, A.S.M.: Extracting severe negative sentence pattern from Bangla data via long short-term memory neural network. In: 2019 4th International Conference on Electrical Information and Communication Technology (EICT), pp. 1–6. IEEE, December 2019
15. Hassan, A., Amin, M.R., Al Azad, A.K., Mohammed, N.: Sentiment analysis on Bangla and Romanized Bangla text using deep recurrent models. In: 2016 International Workshop on Computational Intelligence (IWCI), pp. 51–56. IEEE, December 2016
16. Sarker, S., Monisha, S.T.A., Nahid, M.M.H.: Bengali question answering system for factoid questions: a statistical approach. In: 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–5. IEEE, September 2019
17. Monisha, S.T.A., Sarker, S., Nahid, M.M.H.: Classification of Bengali questions towards a factoid question answering system. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–5. IEEE, May 2019
18. Khan, S., Kubra, K.T., Nahid, M.M.H.: Improving answer extraction for Bangla Q/A system using anaphora-cataphora resolution. In: 2018 International Conference on Innovation in Engineering and Technology (ICIET), pp. 1–6. IEEE, December 2018
19. Urmi, T.T., Jammy, J.J., Ismail, S.: A corpus based unsupervised Bangla word stemming using N-gram language model. In: 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), pp. 824–828. IEEE, May 2016
20. Ahmad, A., Amin, M.R.: Bengali word embeddings and its application in solving document classification problem. In: 2016 19th International Conference on Computer and Information Technology (ICCIT), pp. 425–430. IEEE, December 2016
21. Rahaman, M.A., Jasim, M., Ali, M., Hasanuzzaman, M.: Bangla language modeling algorithm for automatic recognition of hand-sign-spelled Bangla sign language. Front. Comp. Sci. 14(3), 1–20 (2020)
22. Haque, M., Huda, M.N.: Relation between subject and verb in Bangla language: a semantic analysis. In: 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), pp. 41–44. IEEE, May 2016
23. Islam, M.S., Mousumi, S.S.S., Abujar, S., Hossain, S.A.: Sequence-to-sequence Bangla sentence generation with LSTM recurrent neural networks. Procedia Comput. Sci. 152, 51–58 (2019)
24. Abujar, S., Hasan, M., Shahin, M.S.I., Hossain, S.A.: A heuristic approach of text summarization for Bengali documentation. In: 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–8. IEEE, July 2017
25. Dhar, A., Dash, N.S., Roy, K.: Weighing word length and sentence length as parameters for subject area identification in Bangla text documents
26. Hamid, M.M., Alam, T., Ismail, S., Rabbi, M.: Bangla interrogative sentence identification from transliterated Bangla sentences. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–6. IEEE, September 2018
27. Razzaghi, F., Minaee, H., Ghorbani, A.A.: Context free frequently asked questions detection using machine learning techniques. In: 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 558–561. IEEE, October 2016
28. Wang, D., Su, J., Yu, H.: Feature extraction and analysis of natural language processing for deep learning English language. IEEE Access 8, 46335–46345 (2020)
29. Oliinyk, V.A., Vysotska, V., Burov, Y., Mykich, K., Fernandes, V.B.: Propaganda detection in text data based on NLP and machine learning. In: MoMLeT+DS, pp. 132–144 (2020)
30. Alian, M., Awajan, A.: Paraphrasing identification techniques in English and Arabic texts. In: 2020 11th International Conference on Information and Communication Systems (ICICS), pp. 155–160. IEEE, April 2020
31. Mohammad, A.S., Jaradat, Z., Mahmoud, A.A., Jararweh, Y.: Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features. Inf. Process. Manage. 53(3), 640–652 (2017)
32. Lamba, H., Govilkar, S.: A survey on plagiarism detection techniques for Indian regional languages. Int. J. Comput. Appl. 975, 8887 (2017)
33. Ngoc Phuoc An, V., Magnolini, S., Popescu, O.: Paraphrase identification and semantic similarity in Twitter with simple features (2015)
34. Anwar, M.M., Anwar, M.Z., Bhuiyan, M.A.A.: Syntax analysis and machine translation of Bangla sentences. Int. J. Comput. Sci. Network Secur. 9(8), 317–326 (2009)
35. Mehedy, L., Arifin, N., Kaykobad, M.: Bangla syntax analysis: a comprehensive approach. In: Proceedings of International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, pp. 287–293 (2003)
36. Das, B., Majumder, M., Phadikar, S.: A novel system for generating simple sentences from complex and compound sentences. Int. J. Modern Educ. Comput. Sci. 11(1), 57 (2018)
37. Purohit, P.P., Hoque, M.M., Hassan, M.K.: Feature based semantic analyzer for parsing Bangla complex and compound sentences. In: The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), pp. 1–7. IEEE, December 2014
38. Hasan, K.A., Hozaifa, M., Dutta, S.: Detection of semantic errors from simple Bangla sentences. In: 2014 17th International Conference on Computer and Information Technology (ICCIT), pp. 296–299. IEEE, December 2014
39. Khatun, A., Hoque, M.M.: Statistical parsing of Bangla sentences by CYK algorithm. In: 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 655–661. IEEE, February 2017
40. Purohit, P.P., Hoque, M.M., Hassan, M.K.: An empirical framework for semantic analysis of Bangla sentences. In: 2014 9th International Forum on Strategic Technology (IFOST), pp. 34–39. IEEE, October 2014
Towards a Novel Machine Learning and Hybrid Questionnaire Based Approach for Early Autism Detection

Sid Ahmed Hadri (1) and Abdelkrim Bouramoul (2)

(1) LICUS Laboratory, University of 20 August 1955, Skikda, Algeria, [email protected]
(2) MISC Laboratory, University of Abdelhamid Mehri, Constantine, Algeria, [email protected]
Abstract. Early detection of autism spectrum disorder favors better management of the autistic child and allows better outcomes in terms of independence and acquisition of social skills. It is therefore crucial to identify autism as early as possible. Screening for autism is usually done with traditional techniques such as questionnaires. These methods rely mainly on the expertise and empirical knowledge of psychiatrists and are known to exaggerate results, leading to a high false-positive rate. In this paper we address this problem by proposing a novel screening method for autism that combines machine learning with questionnaire hybridization in order to improve screening accuracy. After testing several machine learning models, including SVM and random forests, on a locally collected dataset, an accuracy of 97.5% was achieved. These promising results encourage us to deepen and concretize this study, as well as to generalize it to other neurodevelopmental disorders.

Keywords: Autism Screening · Machine Learning · Neurological Disorder · Child Psychiatry · Autism Detection

1 Introduction
Autism Spectrum Disorder (ASD), also known as autism, is one of a group of neurodevelopmental disorders characterised by difficulties in communication and social interaction [1]. According to the DSM-5 [2], autism manifests itself from the first months of a person's life and is characterized by a group of symptoms such as difficulty in creating bonds or social interactions, difficulties in communication, repetitive and stereotyped behaviours, very restricted interests, and deficiencies in sensory processing. According to the International Classification of Diseases (ICD-10) [3], published by the World Health Organization (WHO), autism belongs to the group of pervasive developmental disorders (PDDs). About 1 in 54 children have autism [4]. ASD does not discriminate between races, ethnicities, or socioeconomic levels. Boys are 4 times more likely to have autism compared to girls [4]. To date, there is no treatment for autism, but autistic children follow programs adapted to their needs in dedicated institutions. Children with autism need to be taken care of to help them acquire social skills and an adequate level of independence in order to achieve a better quality of life in general and to integrate into society. In order for these children to get the most out of these programs, it is important to screen them as early as possible. Autism screening is an important phase in making a diagnosis. Its aim is to get an idea of the child's condition and to decide whether the child is at risk, i.e. likely to be autistic and therefore requiring further investigation by a psychiatrist, or normal, in which case the child will be screened again at the next medical visit (Fig. 1 schematises this protocol).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 53–61, 2023. https://doi.org/10.1007/978-3-031-30396-8_5
Fig. 1. Autism Screening Protocol
Screening is usually done by means of questionnaires [5,6]. These questionnaires contain, in addition to a section covering the child's personal information, questions that cover different aspects such as communication, social orientation, sensory processing, emotional reactivity and involvement, etc. There are more than 45 scientifically validated questionnaires used in different parts of the world. The choice of questionnaire in a given region depends on several factors, such as its availability, its cost, its conditions of use, the financial income of the country or region, and its cultural and social appropriateness [7]. Some questionnaires, such as the Modified Checklist for Autism in Toddlers (M-CHAT-R/F) [8] and the Quantitative Checklist for Autism in Toddlers (Q-CHAT) [9], are used and cited much more frequently than others.

Our paper is structured as follows. In Sect. 2 we present a brief overview of recent studies that have applied machine learning techniques to autism screening. In Sect. 3, we propose a novel approach based on tool hybridisation and machine learning. In the next two sections, we implement our approach in the form of an experimental protocol and discuss the results obtained. Finally, Sect. 6 concludes the paper and outlines perspectives.
2 Related Works
In this section we review various studies that have explicitly or implicitly used and tested the performance of machine learning techniques in the screening and diagnosis of autistic disorder in children and infants. In the study of [10], machine learning was used to screen children for autistic disorder, using data collected via two questionnaires: the Autism Diagnostic Interview-Revised (ADI-R) and the Social Responsiveness Scale (SRS). The researchers' dataset consisted of 1264 children diagnosed with autism and 462 healthy children. This population was divided into two age groups (under 10 years and over 10 years). Using 5 behavioral categories, their learning model achieved an autism sensitivity of 89.2% for children under 10 years and 86.7% for children over 10 years, with a specificity of 59% and 53.4%, respectively. In [11], a machine learning approach was tested on the 4-module diagnostic tool Autism Diagnostic Observation Schedule-Generic (ADOS). The aim of this study was to shorten the tool to a single module consisting of 8 items and to prove that this module alone is sufficient to give a valid screening for autism disorder. The approach was validated on two groups of 446 individuals, and the results obtained had an accuracy close to 100%. In the study conducted by Büyükoflaz et al. [12], different machine learning techniques were tested and compared, including random forests, Naive Bayes, and Radial Basis Function Networks. These tests were carried out on the UCI 2017 public dataset containing records of autistic children. The aim of the study was to develop an efficient application for the general public that can be used to screen for autistic disorder in young children. The models were very efficient, notably random forests with an accuracy of 100% according to the authors.
In the study of Shahamiri [13], an innovative tool called AutismAI was developed to screen for ASD at different ages using the AQ-10 public dataset. The tool is a publicly available mobile application that achieves 97% accuracy in ASD detection.
3 Hybridisation Approach
In the related works section, we note that the machine learning models were implemented and tested on datasets collected from a single screening questionnaire. Further investigation based on the study of [7] shows that autism screening questionnaires are numerous and fundamentally different, and their use differs according to the socio-cultural context. Thus, there is no sufficiently generic questionnaire that would allow universal screening for autism. Adding a layer of machine learning can certainly improve the performance of a questionnaire, but it will not make it a universal screening tool. Our approach is therefore based on machine learning models built and trained on data derived from combined questionnaires. In this context, we built a reasoning model that allows us to select a set of instruments and extract a hybrid one according to criteria inspired by the exhaustive review of Marlow et al. [7] at the global and singular level. The selection model is shown in Fig. 2.
Fig. 2. Selection Criteria and Sub-criteria
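The extraction of a hybrid instrument from the selected questionnaires is performed manually in this work. As a toy illustration only, the sketch below shows one way the redundancy-deletion part of such a merge could be automated, flagging near-duplicate questions with a string-similarity ratio; the sample questions and the 0.8 threshold are assumptions for illustration, not the authors' procedure.

```python
# Toy sketch only: the paper merges the selected questionnaires manually.
# This flags near-duplicate questions across item lists with difflib;
# the sample items and the 0.8 threshold are illustrative assumptions.
from difflib import SequenceMatcher

def merge_questionnaires(questionnaires, threshold=0.8):
    """Concatenate item lists, skipping questions that are near-duplicates
    (similarity ratio >= threshold) of an already kept question."""
    merged = []
    for items in questionnaires:
        for question in items:
            if not any(SequenceMatcher(None, question.lower(), kept.lower())
                       .ratio() >= threshold for kept in merged):
                merged.append(question)
    return merged

# Hypothetical sample items, loosely inspired by toddler checklists:
list_a = ["Does your child look at you when you call his/her name?",
          "Does your child point to show you something interesting?"]
list_b = ["Does your child point to show you something interesting?",
          "Does your child play pretend games?"]

hybrid = merge_questionnaires([list_a, list_b])
print(len(hybrid))  # the exact duplicate is dropped, leaving 3 items
```

In practice a purely lexical similarity would miss paraphrases, which is why the authors also generalise related questions by hand; the sketch only covers the mechanical duplicate-removal step.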
4 Experimental Setup

4.1 Hybrid Questionnaire
By applying our selection model, we were able to extract 10 questionnaires that we considered to be the most interesting and that met the criteria previously stated (as shown in Table 1).

Table 1. Selected Questionnaires

Questionnaire | Year | Age Window | Number of Items
Autism Spectrum Quotient Child Version (ASQ) [14] | 2012 | 4–11 years | 50 items
The Childhood Asperger Syndrome Test (CAST) [15] | 2007 | 4–11 years | 37 items
Checklist of Early Signs of Developmental Disorders (CESDD) [16] | 2010 | 3–36 months | 12 items
Early Screening of Autistic Traits (ESAT) [17] | 2006 | 14–15 months | 14 items
Parent's Observation of Social Interaction (POSI) [18] | 2013 | 16–30 months | 7 items
Quantitative Checklist for Autism in Toddlers (Q-CHAT) [19] | 2008 | 18–24 months | 25 items
Screen for Social Interaction-Toddlers (SSI-T) [20] | 2011 | 24–42 months | 21 items
Pictorial Autism Assessment Schedule (PAAS) [21] | 2017 | 18–48 months | 21 items
First Year Inventory (FYI) [22] | 2013 | 12 months | 63 items
Modified Checklist for Autism (M-CHAT-R/F) [8] | 2014 | 16–30 months | 20 items
We then proceeded to merge these questionnaires, deleting redundancies, generalising related questions, and omitting specific cultural factors. This resulted in a hybrid questionnaire consisting of 51 items using the Likert scale [23], distributed over 8 categories as shown in Table 2.

4.2 Data Collection
A data collection process was launched locally in several dedicated institutions, including hospitals, specialized centers for autistic children, independent practitioners' offices, and associations. The aim of this data collection was to generate two types of records:
– Already diagnosed and confirmed children with autism.
– Confirmed healthy children.
We were able to collect 264 records (124 autistic patients, 140 healthy patients) from 5 different areas distributed over the north of Algeria, as shown in Fig. 3.
Table 2. Hybrid Questionnaire Overview

Category | Example Question
Social Orientation and Receptive Communication (9 items) | Does your child seem to be interested in other children of his/her age?
Expressive Communication (7 items) | Is your child trying to get your attention to show you something interesting?
Social and Affective Engagement (10 items) | Does your child reciprocate your gestures of affection?
Repetitive Behaviours (9 items) | Does your child rock his or her body over and over again?
Reactivity (2 items) | It is difficult to calm your child down when they are angry
Imitation (4 items) | Does your child imitate your facial gestures (facial expressions)?
Sensory Processing (6 items) | Does your child seem disturbed by loud sounds?
Others (4 items) | Does your child have a blank stare (a neutral face)?
Fig. 3. Collected data distribution
4.3 Models Testing and Results

We implemented, tested, and compared several machine learning algorithms, including Support Vector Machine (SVM), Random Forest, K-means, and logistic regression. The dataset was split into two parts (70% for training and 30% for testing). We proceeded with 10 executions per model on a machine with the following specifications: a Ryzen 5 3500U CPU, a GeForce GTX1060ti GPU, and 16 GB of RAM. The results obtained are presented in Table 3.
Table 3. Models Accuracy Per Execution

Execution | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
SVM | 85.2 | 89.7 | 97.5 | 97.5 | 92.0 | 86.1 | 85.2 | 83.6 | 90.8 | 90.8
Random Forest | 80.5 | 80.5 | 82.4 | 80.5 | 82.4 | 86.1 | 80.3 | 79.3 | 80.5 | 79.3
K-means | 78.6 | 89.1 | 89.1 | 78.2 | 88.5 | 88.5 | 78.2 | 89.1 | 86.7 | 78.4
Logistic Regression | 92.6 | 92.2 | 93.8 | 92.3 | 89.7 | 87.6 | 89.8 | 92.6 | 90.5 | 86.3
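The evaluation protocol (a fresh 70/30 split per run, 10 runs, four models) can be sketched as follows. The locally collected 264-record dataset is not public, so a synthetic stand-in of the same shape is used, and all hyperparameters are library defaults rather than the authors' settings; since K-means is unsupervised, its clusters are mapped to the majority training label before being scored as a classifier.

```python
# Sketch of the evaluation protocol: 10 runs, 70/30 split, accuracy per
# model. A synthetic 264-sample, 51-feature stand-in replaces the private
# questionnaire dataset; hyperparameters are defaults, not the authors'.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=264, n_features=51,
                           n_informative=20, random_state=0)

supervised = {
    "SVM": SVC,
    "Random Forest": RandomForestClassifier,
    "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
}

results = {name: [] for name in list(supervised) + ["K-means"]}
for run in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, random_state=run)
    for name, make_model in supervised.items():
        clf = make_model().fit(X_tr, y_tr)
        results[name].append(accuracy_score(y_te, clf.predict(X_te)))
    # K-means is unsupervised: assign each cluster the majority
    # training label, then score the induced classifier.
    km = KMeans(n_clusters=2, n_init=10, random_state=run).fit(X_tr)
    label_of = {c: np.bincount(y_tr[km.labels_ == c]).argmax() for c in (0, 1)}
    pred = np.array([label_of[c] for c in km.predict(X_te)])
    results["K-means"].append(accuracy_score(y_te, pred))

for name, accs in results.items():
    print(f"{name}: mean={np.mean(accs):.3f}, best={np.max(accs):.3f}")
```

Varying `random_state` per run reproduces the run-to-run accuracy spread visible in Table 3, since each execution sees a different 70/30 partition of a small dataset.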
5 Discussion and Evaluation

From Table 3, we observe very good accuracy scores for the 4 models, with a slight lead for the SVM model, which reaches 97.5% accuracy. These scores show that there is clearly a marked difference between the two groups of children and a distinctive pattern for each class. They also show that a combinatorial approach over existing tools enables us to obtain more information and to fill the gaps of each questionnaire. However, the scores obtained are quite high, which may indicate that the collected data are too deterministic, i.e. the children either have very marked autistic symptoms or nothing significant, which does not reflect the reality of the field. This conclusion pushes us to expand the dataset by integrating complex cases, such as children with disorders co-morbid with autism, children whose symptoms are not very apparent, and, on the other hand, healthy children who exhibit autistic-like behaviour that could be due to certain social factors.
6 Conclusion and Perspectives

This paper is a pilot study in which we investigate a novel approach to the detection of autism in young children. The approach applies machine learning techniques to data resulting from the fusion of existing screening questionnaires validated by the scientific literature. It has resulted in a hybrid and general tool that covers most aspects and observations of the original questionnaires, and the results obtained are very promising. Short-term goals include increasing the amount of data and reducing dimensionality. Long-term goals include the development of concrete and accessible tools based on our approach for prompt autism screening.
References

1. Lord, C., Risi, S., DiLavore, P.S., Shulman, C., Thurm, A., Pickles, A.: Autism from 2 to 9 years of age. Arch. Gen. Psychiatry 63(6), 694–701 (2006)
2. American Psychiatric Association: Diagnostic and statistical manual of mental disorders: DSM-5, 5th edn. Author, Washington, DC (2013)
3. World Health Organization: The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research (2003)
4. Maenner, M.J., Shaw, K.A., Baio, J., et al.: Prevalence of autism spectrum disorder among children aged 8 years - Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2016. MMWR Surveill. Summ. 69(4), 1 (2020)
5. Berument, S.K., Rutter, M., Lord, C., Pickles, A., Bailey, A.: Autism screening questionnaire: diagnostic validity. Br. J. Psychiatry 175(5), 444–451 (1999)
6. Carbone, P.S., et al.: Primary care autism screening and later autism diagnosis. Pediatrics 146(2) (2020)
7. Marlow, M., Servili, C., Tomlinson, M.: A review of screening tools for the identification of autism spectrum disorders and developmental delay in infants and young children: recommendations for use in low- and middle-income countries. Autism Res. 12(2), 176–199 (2019)
8. Robins, D.L., Casagrande, K., Barton, M., Chen, C.M.A., Dumont-Mathieu, T., Fein, D.: Validation of the Modified Checklist for Autism in Toddlers, Revised with Follow-up (M-CHAT-R/F). Pediatrics 133(1), 37–45 (2014)
9. Magiati, I., Goh, D.A., Lim, S.J., Gan, D.Z.Q., Leong, J., Allison, C., Baron-Cohen, S., Rifkin-Graboi, A., Broekman, B.P., Saw, S.M., et al.: The psychometric properties of the Quantitative Checklist for Autism in Toddlers (Q-CHAT) as a measure of autistic traits in a community sample of Singaporean infants and toddlers. Molecular Autism 6(1), 1–14 (2015)
10. Bone, D., Bishop, S.L., Black, M.P., Goodwin, M.S., Lord, C., Narayanan, S.S.: Use of machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and multi-instrument fusion. J. Child Psychol. Psychiatry 57(8), 927–937 (2016)
11. Wall, D.P., Kosmicki, J., Deluca, T., Harstad, E., Fusaro, V.A.: Use of machine learning to shorten observation-based screening and diagnosis of autism. Transl. Psychiatry 2(4), e100 (2012)
12. Büyükoflaz, F.N., Öztürk, A.: Early autism diagnosis of children with machine learning algorithms. In: 2018 26th Signal Processing and Communications Applications Conference (SIU), pp. 1–4. IEEE (2018)
13. Shahamiri, S.R., Thabtah, F.: AutismAI: a new autism screening system based on artificial intelligence. Cogn. Comput. 12(4), 766–777 (2020)
14. Allison, C., Auyeung, B., Baron-Cohen, S.: Toward brief "red flags" for autism screening: the Short Autism Spectrum Quotient and the Short Quantitative Checklist in 1,000 cases and 3,000 controls. J. Am. Acad. Child Adolescent Psychiatry 51(2), 202–212 (2012)
15. Allison, C., Williams, J., Scott, F., Stott, C., Bolton, P., Baron-Cohen, S., Brayne, C.: The Childhood Asperger Syndrome Test (CAST): test-retest reliability in a high scoring sample. Autism 11(2), 173–185 (2007)
16. Dereu, M., Warreyn, P., Raymaekers, R., Meirsschaut, M., Pattyn, G., Schietecatte, I., Roeyers, H.: Screening for autism spectrum disorders in Flemish day-care centres with the Checklist for Early Signs of Developmental Disorders. J. Autism Dev. Disord. 40(10), 1247–1258 (2010)
17. Swinkels, S.H., Dietz, C., van Daalen, E., Kerkhof, I.H., van Engeland, H., Buitelaar, J.K.: Screening for autistic spectrum in children aged 14 to 15 months. I: the development of the Early Screening of Autistic Traits questionnaire (ESAT). J. Autism Dev. Disord. 36(6), 723–732 (2006)
18. Smith, N.J., Sheldrick, R.C., Perrin, E.C.: An abbreviated screening instrument for autism spectrum disorders. Infant Ment. Health J. 34(2), 149–155 (2013)
19. Allison, C., Baron-Cohen, S., Wheelwright, S., Charman, T., Richler, J., Pasco, G., Brayne, C.: The Q-CHAT (Quantitative CHecklist for Autism in Toddlers): a normally distributed quantitative measure of autistic traits at 18–24 months of age: preliminary report. J. Autism Dev. Disord. 38(8), 1414–1425 (2008)
20. Ghuman, J.K., Leone, S.L., Lecavalier, L., Landa, R.J.: The Screen for Social Interaction (SSI): a screening measure for autism spectrum disorders in preschoolers. Res. Dev. Disabil. 32(6), 2519–2529 (2011)
21. Perera, H., Jeewandara, K.C., Seneviratne, S., Guruge, C.: Culturally adapted pictorial screening tool for autism spectrum disorder: a new approach. World J. Clin. Pediatrics 6(1), 45 (2017)
22. Turner-Brown, L.M., Baranek, G.T., Reznick, J.S., Watson, L.R., Crais, E.R.: The First Year Inventory: a longitudinal follow-up of 12-month-old to 3-year-old children. Autism 17(5), 527–540 (2013)
23. Joshi, A., Kale, S., Chandel, S., Pal, D.K.: Likert scale: explored and explained. British J. Appl. Sci. Technol. 7(4), 396 (2015)
Bi-RNN and Bi-LSTM Based Text Classification for Amazon Reviews

Shamal Kashid, Krishan Kumar, Parul Saini, Abhishek Dhiman, and Alok Negi

Computer Science and Engineering, National Institute of Technology, Uttarakhand, India
{kashid.shamalphd2021,parulsaini.phd2020,mt21cse004,aloknegi.phd2020}@nituk.ac.in, [email protected]
Abstract. Electronic Commerce (E-Commerce) enables the effective implementation of product-based online business transactions. In this paper, we categorize a dataset of Amazon reviews into positive and negative. Consumers can choose a product solely on this binary text categorization, neglecting the manual rating provided in the reviews. Recently, Deep Learning (DL) approaches have gained popularity in E-Commerce applications. DL techniques offer a powerful and effective approach to data analysis, filtering humongous E-Commerce data and finding hidden patterns and valuable details. Here, the research focuses on investigating DL techniques for categorizing text into positive and negative reviews. We implemented two different DL approaches and measured the accuracy of both: bidirectional recurrent neural networks (Bi-RNN) and bidirectional long short-term memory (Bi-LSTM). The accuracy obtained with Bi-RNN and Bi-LSTM is 89.60% and 92.80%, respectively. The results show that Bi-LSTM performs best on the Amazon review dataset.

Keywords: Bi-RNN · Bi-LSTM · Deep Learning · Text Classification

1 Introduction
The study of text classification (TC) is a recent area of research. Text classification is the process of grouping documents according to their content into classes and categories. As a result of the vast amount of online textual material, this procedure is becoming more crucial. Increasing classification accuracy is the main challenge in text classification [3]. Many applications, such as marketing, product management, academics, and governance, need to evaluate and extract information from textual data. Sentiment analysis, subject categorization, spam identification, and intent detection are different uses of TC. Users may quickly and effectively organize all relevant text, such as emails, legal documents, social network postings, chatbot messages, surveys, and more, with the help of text classification.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 62–72, 2023. https://doi.org/10.1007/978-3-031-30396-8_6
E-commerce (electronic commerce) is the purchasing, selling, or sending of products, services, money, or data over an electronic network, most often the Internet. People have incorporated social media applications into daily life, and the Internet is now a part of everything, including E-commerce websites such as Amazon, Flipkart, Shopify, Myntra, eBay, Quikr, and Olx [1]. Automatic text classification is one of the essential tasks in various problems. Examples include spam filtering, where the main objective is eliminating unwanted emails; email foldering, which aims to organize incoming messages into folders; and sentiment classification, where the purpose is to determine whether a document expresses a positive or negative opinion. Amazon is one of the most significant E-commerce platforms in the world. Customers regularly browse the products and customer reviews on Amazon before making a purchase, and they are relying on customer reviews more and more. People's experiences and opinions about things have an impact on their decisions to buy. Reviews have become more valuable due to people's ongoing need to learn from what others have to say about their experiences. TC for sentiment analysis is categorized into binary classification ("positive" and "negative") and ternary classification ("positive", "negative", and "neutral"), and sentiment analysis is done based on those classifications. During the past 20 years, text categorization and keyword extraction have drawn the attention of numerous researchers [2]. For TC tasks, deep learning has become a recent and more significant development when applied to massive volumes of data, due to its accuracy. DL algorithms can learn the many relationships between text fragments and how a given output is predicted for a specific input (text). In recent years, the BERT model [5] has become a well-known state-of-the-art model. Pre-trained without human supervision, it can perform natural language processing (NLP) tasks like supervised text classification. This approach is particularly prominent in research and business because of its adaptability to any corpus while producing excellent outcomes [4]. Numerous well-known classification algorithms are in use, each with different advantages, and understanding which algorithm will give the best results for a specific task and data collection can occasionally be challenging [8]. This study looks at the potential of a convolutional neural network (CNN) based deep learning model for binary classification (responsive or non-responsive). CNN models provide better accuracy even without further optimization when compared to the conventional approach, SVM (support vector machine), on larger datasets, with a steadier growth trend as training sample numbers gradually rise [10]. The remaining paper is organized as follows: Sect. 2 outlines the existing text classification techniques and machine learning algorithms for text mining. The binary text data classification with deep learning models is presented in Sect. 3. Section 4 introduces the dataset description and experimental results and discusses the key findings of the proposed work. The work is concluded with future directions in Sect. 5.
2 Related Work
The related existing state-of-the-art works based on binary text classification techniques are discussed in this section. Rusli et al. [11] demonstrated a binary text classification approach with supervised machine learning, using a Multilayer Perceptron to classify news articles in order to detect fake news and differentiate it from valid news. They achieved an F1 score of 0.82 by performing stemming and stop-word removal combined with a Bag of Words model. The authors also observed that skipping the stemming and stop-word removal steps during text pre-processing did not significantly affect model performance. Solovyeva et al. [12] proposed a separable CNN structure, comprising an embedding layer, separable convolutional layers, and global average pooling, for binary and multiclass text classification, using Sigmoid and Softmax activation functions respectively. On binary and multiclass classification problems, the separable CNN obtained higher accuracy than some Bi-RNNs and fully connected networks. Bharadwaj et al. [13] utilized semantic features and several machine learning techniques for fake news detection. They explored recurrent neural networks versus Naive Bayes and random forest classifiers using five groups of linguistic features. The proposed model achieved 95.66% accuracy with a random forest classifier using bigram features on a real dataset from Kaggle.com. Raj et al. [14] explored CNNs and Bi-RNNs for multi-modal online information credibility analysis, which showed rapid improvement in classification tasks without pre-processing.
Their multi-modal Coupled ConvNet architecture fuses both data modules and effectively classifies online news depending on its textual and visual content, with a comparative analysis of all the models on three datasets leading to higher accuracies. Kaliyar et al. [15] presented a BERT-based (Bidirectional Encoder Representations from Transformers) DL approach (FakeBERT), combining different parallel blocks of single-layer deep CNNs, with different kernel sizes and numbers of filters, with BERT for fake news detection. The FakeBERT model outperforms existing models, giving an accuracy of 98.90% on a dataset of thousands of fake and authentic news articles from the 2016 U.S. General Presidential Election. Kumar et al. [16] proposed a novel technique of movie genre classification utilizing a combination of problem transformation techniques, namely binary relevance (BR) and label powerset (LP), text vectorizers, and machine learning classifier models. Their best model, consisting of label powerset (LP) as the problem transformation technique, TF-IDF as the text vectorizer, and a support vector classifier (SVC) as the machine learning model, produces an overall accuracy of 0.95 and an F1-score of 0.86 on the IMDb dataset. Wang et al. [17] explored a binary word embedding model inspired by biological neuron coding mechanisms, transforming the spike timing of neurons during particular time intervals into binary codes, which reduced space and sped up computation. They created three models for post-processing the original dense word embeddings: a homogeneous Poisson process-based rate coding model, a leaky integrate-and-fire neuron-based model, and an Izhikevich neuron-based model. Wang et al. [18] developed a rule-based NLP algorithm to automatically generate labels for the training data and then utilized pre-trained word embeddings as deep representation features for training machine learning models. CNN achieved the best performance among all datasets: word embeddings significantly outperform TF-IDF and topic modeling features, and CNN captures different patterns from the weak supervision compared to the rule-based NLP algorithms. Mehta et al. [19] concentrated on classifying fake news for specific domain datasets using models based on an NLP framework, Bidirectional Encoder Representations from Transformers (BERT). They determined that the deep-contextualizing nature of BERT is practical for this task and obtained significant improvement on binary classification and minimal improvement on the six-label category compared with previously explored models. Li et al. [20] proposed a malware classification model with Bi-RNNs, mainly the LSTM and the gated recurrent unit (GRU), to classify variants of malware using long sequences of API calls. Numerical experimentation on a benchmark dataset with eight malware families showed that the proposed Bi-RNN model works well for malware classification.
3 Proposed Model

The Bi-RNN and Bi-LSTM DL models are given a set of features from the training data to use as classifiers. TensorFlow, with its built-in Keras DL modules, was selected as the framework; its sizable user base and the high number of commits on the TensorFlow GitHub repository make it well suited to building deep learning neural network models [1].

3.1 Bi-RNN Model for the Amazon Review Binary Classification
The recurrent neural network (RNN) is a well-known supervised deep learning approach, frequently used for sequences of data and time-series analysis. The network learns from what it has just observed through a short-term memory: its layers give the model a memory of recent inputs, which lets it forecast subsequent data more accurately. Sentiment analysis, label sequencing, and speech tagging are typical applications of RNNs. An advancement of the RNN is the bidirectional RNN (Bi-RNN). A Bi-RNN consists of two RNNs, one starting at the beginning of the data sequence and moving forward, and the other starting at the end and moving backward. Simple RNNs, GRUs, and LSTMs are all acceptable types of network blocks for Bi-RNNs.
S. Kashid et al.
Fig. 1. Framework of Text classification with Bi-RNN model
A Bi-RNN accommodates the backward training process by adding a hidden layer. Using the dataset [22], we construct a Bi-RNN network that categorizes a statement as either positive or negative; the outcome of the Bi-RNN model is summarized in Fig. 1. A bidirectional RNN combines two RNNs that train the network in opposing directions, one from the beginning to the end of a sequence and the other from end to start, allowing the model to learn from more than just the past and present, which aids in analyzing future events. A Bi-RNN can handle sequential data and accept inputs from both the present and the past.

– Encoder: transforms text into a sequence of token indices.
– Embedding layer: stores one vector per word and transforms the sequences of word indices into sequences of vectors. These vectors are trainable; after training, words with similar meanings generally have similar vectors.
– Bidirectional RNN: a combination of two RNNs training the network in opposite directions, one from the beginning to the end of a sequence and the other from the end to the beginning.
– Dense layer: performs the final processing, converting the previous vector representation into a single logit for classification.

Internal memory in Bi-RNNs enables them to recall previous inputs. After training with classification labels, the model can forecast the sentiment orientation of a sample review on new data. In bidirectional RNNs, the data before and after the current time step are used simultaneously to determine the hidden state for each time step. The main applications of bidirectional RNNs are sequence encoding and estimating observations in a bidirectional context. Figure 2 depicts the Bi-RNN network's parametric specification. The advantage of a Bi-RNN is that it can stream predictions well as words are added at the end: using bidirectionality, the inputs are processed in two directions, from the past to the future and from the future to the past.
Fig. 2. Parametric specification of the Bi-RNN model
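The four stages just listed (encoder, embedding layer, bidirectional RNN, dense layer) can be sketched in Keras. The vocabulary size, embedding dimension, RNN unit count, and the tiny adaptation corpus below are illustrative assumptions, not the authors' exact configuration:

```python
import tensorflow as tf

VOCAB = 1000  # assumed vocabulary size for this sketch

# Encoder: text -> sequence of token indices
encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB)
encoder.adapt(["great product", "terrible quality", "works well", "do not buy"])

# Embedding layer: one trainable 64-d vector per word index
embedding = tf.keras.layers.Embedding(input_dim=VOCAB, output_dim=64)
# Bidirectional RNN: two RNNs reading the sequence in opposite directions
bi_rnn = tf.keras.layers.Bidirectional(tf.keras.layers.SimpleRNN(32))
# Dense layer: collapse the concatenated states to a single logit
dense = tf.keras.layers.Dense(1)

tokens = encoder(tf.constant(["great product", "terrible quality"]))
logits = dense(bi_rnn(embedding(tokens)))
print(logits.shape)  # (2, 1)
```

Wrapping tf.keras.layers.SimpleRNN in tf.keras.layers.Bidirectional is what creates the forward and backward passes; swapping SimpleRNN for GRU or LSTM gives the other acceptable block types mentioned above.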
3.2 Bi-LSTM Model for the Amazon Review Binary Classification
Bi-LSTM implementation details: the Bi-LSTM neural network consists of LSTM units that work in both directions to consider context from the past and the future. Long-term dependencies are learned with Bi-LSTM without redundant context data [15]. It has proven to deliver outstanding performance on sequential modeling problems and is frequently used for text classification. In contrast to the LSTM network, the Bi-LSTM network comprises two parallel layers that propagate in two directions, with forward and reverse passes, to capture dependencies in both contexts [16]. We created our Bi-LSTM model with the Keras library; it consists of 4 layers:

– Embedding: minimizes the size of the inputs
– Spatial dropout: avoids overfitting
– LSTM: the Long Short-Term Memory layer, which is the RNN
– Dense: transforms the LSTM outputs to binary values
Bi-LSTM differs from a unidirectional LSTM in that the backward LSTM preserves information from the future; by combining the two hidden states, information from both the past and the future is preserved at any moment in time. The maximum number of features considered for model classification is 1,000, and the vocabulary size is the same as the maximum number of features. The embedding layer has a dimension of 64, which means that every converted vector lies in a 64-dimensional embedding space [8]; the embedding layer is added with tf.keras.layers.Embedding(). The dense layer uses the "ReLU" activation function. These regularizations help avoid overfitting, as in many classifiers. Figure 3 depicts the Bi-LSTM network's structure.
Fig. 3. Parametric specification of the Bi-LSTM model
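The four layers above map directly onto Keras. MAX_FEATURES and EMBED_DIM follow the values stated in the text, while the dropout rate, LSTM unit count, and the sigmoid output activation are assumptions for this sketch rather than the paper's exact settings:

```python
import tensorflow as tf

MAX_FEATURES = 1000  # maximum number of features / vocabulary size from the paper
EMBED_DIM = 64       # embedding dimension from the paper

# Four-layer Bi-LSTM classifier as listed above; dropout rate, LSTM units and
# the sigmoid output are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(MAX_FEATURES, EMBED_DIM),       # minimize input size
    tf.keras.layers.SpatialDropout1D(0.2),                    # avoid overfitting
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # past + future context
    tf.keras.layers.Dense(1, activation="sigmoid"),           # outputs -> binary
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A dummy batch of 2 sequences padded to length 20 (token ids < MAX_FEATURES):
probs = model(tf.zeros((2, 20), dtype=tf.int32))
print(probs.shape)  # (2, 1)

# Training would then run, as in the paper, for 10 epochs with batch size 16:
# model.fit(x_train, y_train, epochs=10, batch_size=16)
```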
The epoch count is set to 10. The epoch count is a neural network attribute that controls the fitting of the training and testing datasets and determines how well a model predicts. Too high an epoch count makes the prediction more complicated and can generate inconsistent outcomes over variable data points, while too low an epoch count leaves the model unstable. During every epoch iteration, batches of 16 samples are processed.
4 Result Analysis

4.1 Dataset
A large Kaggle dataset of 4 million reviews has been used for training [22]. It consists of text input from Amazon customer reviews and binary output labels (positive and negative) derived from each review's star rating: ratings of 1 or 2 stars are labeled negative, ratings of 4 or 5 stars are labeled positive, and reviews with 3-star ratings are excluded. The positive and negative classes are equally represented. The experiments were conducted on Google Colab, and model training runs for 10 epochs with a batch size of 16. Categorical cross-entropy is used as the loss, and accuracy as the metric.
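The labeling rule can be written down directly (a minimal sketch; the function name and the toy reviews are illustrative, not part of the Kaggle dataset):

```python
def star_to_label(stars: int):
    """Map an Amazon star rating to a binary sentiment label.

    1-2 stars -> "negative", 4-5 stars -> "positive",
    3 stars   -> None (excluded from the dataset).
    """
    if stars in (1, 2):
        return "negative"
    if stars in (4, 5):
        return "positive"
    return None  # 3-star reviews are dropped

reviews = [(5, "great"), (3, "okay"), (1, "awful"), (4, "good")]
labeled = [(text, star_to_label(s)) for s, text in reviews if star_to_label(s) is not None]
print(labeled)  # the 3-star review is excluded
```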
Bi-RNN and Bi-LSTM Based Text Classification for Amazon Reviews
4.2 Training/Testing Procedure
A randomly chosen portion of the training dataset, comprising 60,000 reviews, was extracted to keep the computational cost to a minimum. Of these, 40,000 reviews were used to train the classifiers, while the remaining 20,000 were used to evaluate their effectiveness. To recap: we chose 60,000 reviews at random from the 4 million, cleaned and tokenized them, developed various models, and chose the best one.

Table 1. Comparison of DL models based on accuracy
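The sampling procedure above can be sketched with the standard library (illustrative only; a smaller ID pool stands in for the full 4 million reviews):

```python
import random

random.seed(0)  # reproducibility for the sketch

# Stand-in for the review IDs; a smaller pool keeps the example fast.
all_reviews = list(range(400_000))

subset = random.sample(all_reviews, 60_000)     # randomly chosen portion
train, test = subset[:40_000], subset[40_000:]  # 40k train / 20k test

print(len(train), len(test))  # 40000 20000
```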
DL Model           Train Data  Test Data  Accuracy
LSTM [21]          94227       27981      89.00%
LSTM network [24]  40000       20000      90.00%
CNN [25]           25000       25000      88.74%
LSTM [25]          25000       25000      89.40%
Proposed Bi-RNN    40000       20000      89.60%
Proposed Bi-LSTM   40000       20000      92.80%

4.3 Results
Accuracy is the number of correct predictions as a ratio of all predictions:

Acc = (TP + TN) / (TP + TN + FP + FN)    (1)
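Equation (1) translates directly into code (the confusion-matrix counts below are made up for illustration):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Acc = (TP + TN) / (TP + TN + FP + FN), as in Eq. (1)
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts for illustration only.
print(accuracy(tp=90, tn=85, fp=15, fn=10))  # 0.875
```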
A plot of the loss and accuracy during text classification is made with matplotlib.pyplot.figure(). The number of epochs is kept on the x-axis, and the loss and accuracy values are held on the y-axis, as shown in Figs. 4 and 5. Figure 4 describes the results of the Bi-RNN model in terms of loss and accuracy; the graphs plot the "Number of Epochs" against the "Loss Value" and the "Accuracy Value", respectively. The proposed Bi-RNN recorded 90.91% testing accuracy with a loss of 0.21, and 89.60% training accuracy with a loss score of 0.23, as shown in Fig. 4. Figure 5 describes the results of the Bi-LSTM model in the same terms: the proposed Bi-LSTM recorded 94.02% testing accuracy with a loss of 0.15, and 92.80% training accuracy with a loss score of 0.17. Table 1 compares the proposed Bi-RNN and Bi-LSTM models with other deep-learning models for text classification on the performance measure of accuracy. Both proposed models use the same dataset and equal training and testing parameters. Compared with the existing DL models [21], both models perform well, and of the two, the Bi-LSTM model gives the best results.
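A minimal sketch of such a plot, using matplotlib with made-up history values rather than the reported ones:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical per-epoch history; real values come from model.fit(...).history
epochs = range(1, 11)
loss = [0.60, 0.45, 0.38, 0.33, 0.30, 0.28, 0.26, 0.25, 0.24, 0.23]
acc = [0.70, 0.78, 0.82, 0.85, 0.86, 0.87, 0.88, 0.89, 0.89, 0.90]

fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)   # left panel: loss curve
ax1.plot(epochs, loss)
ax1.set_xlabel("Number of Epochs")
ax1.set_ylabel("Loss Value")

ax2 = fig.add_subplot(1, 2, 2)   # right panel: accuracy curve
ax2.plot(epochs, acc)
ax2.set_xlabel("Number of Epochs")
ax2.set_ylabel("Accuracy Value")

fig.savefig("training_curves.png")
```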
Fig. 4. Correlations of Test and Train Datasets over the Epoch Iterations using Bi-RNN
Fig. 5. Correlations of Test and Train Datasets over the Epoch Iterations using Bi-LSTM
5 Conclusion
The Amazon review dataset is used in this research work to analyze user sentiment in reviews. The Bi-RNN and Bi-LSTM models are built with the Keras framework on the TensorFlow 2.1.0 platform. Both proposed models employ the "ReLU" activation function and help prevent over-fitting by limiting the feature size to 1,000 and the embedding layer's dimension to 16, which impacts the extraction method. Bi-LSTM achieved an overall accuracy of 92.80%. With every epoch iteration, the Bi-LSTM feedback connection mechanism gains the capacity to supply the word-alignment support count. The same Amazon review dataset, at equal volume, is classified incorrectly by classical models with prediction strength comparable to LSTM, even when the number of words is off by only a small margin; the main reason is the old models' built-in feedforward mechanism, which the Bi-LSTM model avoids and handles successfully. The primary goal of this article is to offer insight into the training of data and text classification using the Bi-RNN and Bi-LSTM models to analyze, extract, and generate the output. In subsequent research, the Bi-LSTM algorithm can be applied when more classes are included in the classification: the 3-star reviews that were excluded from this work could be incorporated as a "neutral" class. To further improve the models, future research can consider the photographs, videos, and image-embedded text of the reviews.

Acknowledgment. The authors would like to thank the DST, GoI for sponsoring the work under DST/ICPS/General/2018.
References

1. Kong, S.H., Tan, L.M., Gan, K.H., Samsudin, N.H.: Fake news detection using deep learning. In: 2020 IEEE 10th Symposium on Computer Applications and Industrial Electronics (ISCAIE), pp. 102–107. IEEE (2020)
2. Badawi, D., Altınçay, H.: A novel framework for termset selection and weighting in binary text classification. Eng. Appl. Artif. Intell. 35, 38–53 (2014)
3. Boukil, S., Biniz, M., El Adnani, F., Cherrat, L., El Moutaouakkil, A.E.: Arabic text classification using deep learning technics. Int. J. Grid Distributed Comput. 11(9), 103–114 (2018)
4. González-Carvajal, S., Garrido-Merchán, E.C.: Comparing BERT against traditional machine learning text classification. arXiv preprint arXiv:2005.13012 (2020)
5. Fan, J., Zhang, X., Zhang, S., Pan, Y., Guo, L.: Can depth-adaptive BERT perform better on binary classification tasks. arXiv preprint arXiv:2111.10951 (2021)
6. Bangyal, W.H., et al.: Detection of fake news text classification on COVID-19 using deep learning approaches. Comput. Math. Methods Med. 2021 (2021)
7. Kashid, S., Kumar, K., Saini, P., Negi, A., Saini, A.: Approach of a multilevel secret sharing scheme for extracted text data. In: 2022 IEEE Students Conference on Engineering and Systems (SCES), pp. 1–5 (2022). https://doi.org/10.1109/SCES55490.2022.9887697
8. Baluja, M.: Supervised Learning Comparison for Binary Text Classification (2021)
9. Lavanya, P.M., Sasikala, E.: Deep learning techniques on text classification using natural language processing (NLP) in social healthcare network: a comprehensive survey. In: 2021 3rd International Conference on Signal Processing and Communication (ICPSC), pp. 603–609. IEEE (2021)
10. Wei, F., Qin, H., Ye, S., Zhao, H.: Empirical study of deep learning for text classification in legal document review. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 3317–3320. IEEE (2018)
11. Rusli, A., Young, J.C., Iswari, N.M.S.: Identifying fake news in Indonesian via supervised binary text classification. In: 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), pp. 86–90. IEEE (2020)
12. Solovyeva, E., Abdullah, A.: Binary and multiclass text classification by means of separable convolutional neural network. Inventions 6(4), 70 (2021)
13. Bharadwaj, P., Shao, Z.: Fake news detection with semantic features and text mining. Int. J. Natural Lang. Comput. (IJNLC) 8 (2019)
14. Raj, C., Meel, P.: ConvNet frameworks for multi-modal fake news detection. Appl. Intell. 51(11), 8132–8148 (2021). https://doi.org/10.1007/s10489-021-02345-y
15. Kaliyar, R.K., Goswami, A., Narang, P.: FakeBERT: fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 80(8), 11765–11788 (2021). https://doi.org/10.1007/s11042-020-10183-2
16. Kumar, S., Kumar, N., Dev, A., Naorem, S.: Movie genre classification using binary relevance, label powerset, and machine learning classifiers. Multimed. Tools Appl., 1–24 (2022)
17. Wang, Y., Zeng, Y., Tang, J., Bo, X.: Biological neuron coding inspired binary word embeddings. Cogn. Comput. 11(5), 676–684 (2019)
18. Wang, Y., Sohn, S., Liu, S., Shen, F., Wang, L., Atkinson, E.J., Amin, S., Liu, H.: A clinical text classification paradigm using weak supervision and deep representation. BMC Med. Inform. Decis. Mak. 19(1), 1–13 (2019)
19. Mehta, D., Dwivedi, A., Patra, A., Anand Kumar, M.: A transformer-based architecture for fake news classification. Soc. Netw. Anal. Min. 11(1), 1–12 (2021). https://doi.org/10.1007/s13278-021-00738-y
20. Li, C., Zheng, J.: API call-based malware classification using recurrent neural networks. J. Cyber Secur. Mob., 617–640 (2021)
21. Thivaharan, S., Srivatsun, G.: Keras model for text classification in Amazon review dataset using LSTM
22. Amazon Reviews for Sentiment Analysis. https://www.kaggle.com/bittlingmayer/amazonreviews#train.ft.txt.bz2
23. Shrestha, N., Nasoz, F.: Deep learning sentiment analysis of Amazon.com reviews and ratings. arXiv preprint arXiv:1904.04096 (2019)
24. Güner, L., Coyne, E., Smit, J.: Sentiment analysis for Amazon.com reviews. Big Data in Media Technology (DM2583), KTH Royal Institute of Technology, 9 (2019)
25. Jang, B., Kim, M., Harerimana, G., Kang, S., Kim, J.W.: Bi-LSTM model to increase accuracy in text classification: combining Word2vec CNN and attention mechanism. Appl. Sci. 10(17), 5841 (2020)
Resource Utilization Tracking for Fine-Tuning Based Event Detection and Summarization Over Cloud

Alok Negi1(B), Krishan Kumar1, Prachi Chauhan2, Parul Saini1, and Shamal Kashid1

1 Computer Science and Engineering, National Institute of Technology Uttarakhand, NH 58, Srinagar (Pauri Garhwal) 246174, Uttarakhand, India
[email protected]
2 Department of Information Technology, G. B. Pant University of Agriculture and Technology, Pantnagar 263153, District Udham Singh Nagar, Uttarakhand, India
Abstract. The huge volume of data generated by video surveillance creates a variety of problems when utilized for different purposes such as video indexing, retrieval, analysis and summarization. The exponential growth of video content needs an effective video summarization that can help in efficient indexing and retrieval of data. Video summarization is a challenging task due to huge amounts of data, redundancy, inter-view correlation and lighting variation. To overcome these problems, the proposed work presents object of interest-based event detection and summarization using YOLOv5, which first detects the target object in the extracted frames and discards the frames with no objects. The frames with the target objects are then used to create the video summary. The proposed method can detect single and multiple objects in the frames and is capable of detecting objects directly in the videos. This study also compared the different versions of YOLOv5 (YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5n) on resource utilization during model training over the cloud. The experimental study shows the better performance of our approach, in terms of precision, recall and resource utilization, compared to the state-of-the-art approaches.

Keywords: GPU · Object of Interest (OoI) · OpenStack · Resource Utilization · Virtual Machine · YoloV5
1 Introduction

Multimedia content takes different forms, including text, images, audio and video. With the growth of digital multimedia, a lot of digital content such as movies, news, television shows and sports is widely available. Managing and accessing the huge volume of data generated daily is very difficult in real-time scenarios due to illumination variations, the presence of duplicate and unimportant frames, and inter-view dependencies. Video summarization produces concise summaries or abstract views of the video data and provides quick indexing, access, efficient storage and browsing. Retrieving meaningful and interesting information for video analysis is a time-consuming and complicated process due to the temporal nature of the video, high storage space and processing power, so the user is otherwise restricted to watching the whole video to fetch the meaningful information. Most existing attempts rely on features like color, motion and texture information.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 73–83, 2023. https://doi.org/10.1007/978-3-031-30396-8_7

In recent years, researchers have made several efforts to propose automatic video summarization using deep learning [2, 8, 14, 15]. Most summarization processes are based on keyframe selection, shot boundary detection, clustering, or trajectories; these approaches have limitations for the retrieval task. This study proposes an object of interest-based framework to overcome these video summarization issues. The proposed framework takes a video as input, extracts frames at 25 fps, detects objects using YoloV5 in the object localization phase, discards the frames without objects, and composes the summary from the target-object frames. Resource usage in the cloud is now a major challenge for data centers: on-demand provisioning of computing resources uses a lot of energy, power, and other resources. To monitor resource usage while training the model, the proposed approach additionally deploys virtual machines using OpenStack. The contributions of the proposed work are as follows:

– To deploy a virtual machine using OpenStack in order to monitor system resource utilization during model training.
– To design a YoloV5-based model with a fine-tuning approach that detects the target objects in the extracted frames to generate the video summary.
– To detect single and multiple objects in frames as well as in video.
– To track the resource utilization during the YoloVs, YoloVm, YoloVl and YoloVn model training in terms of GPU memory allocated, GPU power, GPU utilization, network traffic, system memory utilization and temperature, and to compare them.
– To compare the results with the existing approach in terms of summary length.

The rest of the paper is structured as follows: the literature survey is provided in Sect. 2 and includes related work on video summarization. Section 3 describes the proposed work, followed by results, discussion and performance comparison with the existing approaches in Sect. 4. Finally, the conclusion and future work are described in Sect. 5.
2 Related Work

Most summarization techniques produce the summary by selecting keyframes and representing the video through a skimming process [3, 20]. Srinivas et al. [19] targeted a dataset of random video sequences, computed a score for each frame, ranked the frames, and eliminated the duplicates. Uchihachi et al. [21] used shot importance and eliminated duplicate scenes on a staff-meeting video dataset. Ajmal et al. [1] detected persons using a support vector machine, histograms of oriented gradients and a Kalman filter. Muhammad et al. [12] proposed a deep feature-based shot segmentation method and generated summaries using entropy and memorability; keyframe selection uses the highest entropy and memorability scores on the Open Video (OV) and YouTube datasets. Hussain et al. [7] introduced a deep feature-based two-tier framework: the first tier performs target-based shot segmentation and maintains the shots in a table for further processing, while the second tier extracts features and passes them to a bidirectional long short-term memory network to obtain the probability of informativeness and the video summary. Muhammad et al. [13] captured data from a resource-constrained device, coarsely refined the obtained frames using ORB-based low-level features, and finally selected candidate frames using sequence learning for summary generation. Kumar et al. [11] proposed a local-alignment-based FASTA approach for multi-view videos, with deep learning used for feature extraction and object detection on three datasets: BL-7F, office and lobby. Rani et al. [18] used multi-visual features such as mutual information, color histogram, moments of inertia and a correlation-based approach on social media; Kohonen self-organizing map clustering is then used to extract the keyframes with maximum Euclidean distance. Hussain et al. [6] proposed a light-weight CNN- and IIoT-based multi-view summarization framework: the model is fine-tuned for three types of object detection in the offline step, the IIoT setup is done in the online step, and entropy and complexity scores are used for the final keyframe selection. Kumar et al. [10] introduced a deep learning-based event detection and summarization framework for monocular videos in which frames are represented by sparse matrices as graph vertices and highly connected subgraphs are designated as clusters.
The centroid of each cluster is considered a keyframe for the video summary. Khan et al. [9] found scene boundaries with motion features and passed them to a CNN architecture that provides the importance of each frame; a bidirectional LSTM is used to remove redundancy, and experiments are performed on the TVSUM50 dataset. Gygli et al. [5] proposed "superframe" segmentation, after which visual interestingness is estimated using low-, mid- and high-level features; based on this score, an optimal subset is created to generate the video summary. Thomas et al. [21] proposed perceptual video summarization to increase the speed of visualizing accident content from videos, using the salient features of moving objects for video summarization of road-traffic surveillance.
3 Proposed Approach

The proposed work deploys a virtual machine on the OpenStack cloud and then finds the targeted objects using YoloV5, as shown in Fig. 1. The frames containing the objects are used to create the video summary. The proposed work has the following steps:

– OpenStack cloud installation and VM scheduling on the cloud.
– Read the video and extract the frames.
– Train the YoloV5 model using fine-tuning to detect single and multiple objects in the frames.
– If an object of interest is found in a frame, save it; otherwise discard it.
– Generate the video summary by combining all the buffered frames with the object of interest.
– Monitor system resource utilization in terms of GPU memory allocated, GPU power, GPU utilization, network traffic, system memory utilization and temperature.
– Compare performance metrics with the existing approach in terms of Precision, Recall, F1 score and mean Average Precision (mAP).
Fig. 1. Virtual Machine Scheduling on OpenStack Cloud and Object Detection
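The keep-or-discard step listed above reduces to a simple filter over per-frame detection results. A plain-Python sketch, where the frame identifiers and detection lists are synthetic stand-ins for real frames and YoloV5 outputs:

```python
def compose_summary(frames, detections):
    """Keep only frames where an object of interest was detected.

    frames     : list of frame identifiers (stand-ins for image arrays)
    detections : list of per-frame detection lists from the detector
    """
    buffered = [f for f, dets in zip(frames, detections) if dets]
    return buffered  # concatenating these frames yields the video summary

# Hypothetical 6-frame clip: a person appears in frames 1, 2 and 4.
frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
detections = [[], ["person"], ["person", "person"], [], ["person"], []]
print(compose_summary(frames, detections))  # ['f1', 'f2', 'f4']
```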
3.1 Virtual Machine Scheduling Using OpenStack Cloud

After installation of the OpenStack cloud, cloud services can be accessed through the Horizon GUI. Each request goes to the controller node, which decides the server for the new VM. The controller checks whether the new VM can be created on an active server (initially there are no VMs on any server); if it cannot be accommodated on an active server, magic packets are sent using WoL (Wake-on-LAN) technology to schedule the new VM. OpenStack provides Infrastructure as a Service (IaaS), and all components are accessed through the dashboard for administrative control. The Nova service handles the scheduling of virtual machine instances, their life cycle, and hypervisor management. The Neutron service maintains network services and connectivity. The Cinder service provides a way to create and manage external storage, while the Swift service is responsible for object-based storage. The Keystone service provides identity authentication for users and services. The Glance service manages the uploaded images (it is not a storage service). Heat is responsible for managing the life cycle of the OpenStack infrastructure, while Horizon serves as the GUI for users. Murano, Ceilometer, Octavia, Barbican, Ironic, Trove and Designate are some other services used by the OpenStack cloud.

3.2 YOLOv5 Based Model Training and Fine Tuning

Object detection finds the presence of relevant objects in the extracted video frames. YOLOv5 is a single-stage detector with a model backbone, model neck and model head. The model backbone extracts useful features from the frames and uses cross-stage partial networks. Feature pyramids are generated by the model neck, which generalizes object scaling; they help recognize the same object at different scales and sizes, including in unseen data. YOLOv5 uses PANet as the neck to obtain the feature pyramids. The sigmoid activation function is used by YOLOv5 in the final layer, while leaky ReLU is used in the hidden layers. SGD is the default optimization function for YOLOv5. The proposed work used YOLOv5n (nano), YOLOv5s (small), YOLOv5m (medium) and YOLOv5l (large) to find the best model among them; these architectures are best suited for a 640 × 640 image size. The predictions are shown in Figs. 2 and 3. The proposed work used wandb (Weights & Biases) for resource utilization tracking, which automatically records the system metrics every two seconds. A total of eight graphs are plotted, as shown in Fig. 4, giving insights into training.
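The WoL magic packet mentioned in Sect. 3.1 has a fixed layout: 6 bytes of 0xFF followed by 16 repetitions of the target MAC address. A standard-library sketch (the MAC address below is a placeholder, not a real server's):

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF + 16 x MAC (102 bytes)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    assert len(mac_bytes) == 6, "MAC must be 6 bytes"
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    # Shown for completeness; the controller would target its own servers.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

pkt = magic_packet("00:11:22:33:44:55")  # placeholder MAC
print(len(pkt))  # 102
```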
Fig. 2. Prediction for Validation set
4 Results and Discussion

The proposed work used the office and lobby datasets for training, and the target object is a person. The office dataset has four views with a total duration of 3016 s, which are not synchronized. The lobby dataset contains three views with a total duration of 1482 s, which are not fixed and are crowded. After extracting the video frames at 25 fps, the extracted frames were annotated for the training and validation sets. The annotations are normalized to the image size and lie within the range of 0 to 1; each is represented by an object-class ID, X center, Y center, box width and box height. After annotation, the training set has 241 images and the validation set has 65 images. The model was trained with a batch size of 8 for 120 epochs for the different versions of YoloV5 (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l), and the comparison history is given in Table 1. The proposed work is validated on the lobby dataset for three views, using precision, recall, the PR curve, mean average precision (mAP), box loss, objectness loss and classification loss to evaluate the model, as shown in Fig. 5. The box loss denotes the bounding-box regression loss (Mean Squared Error), the objectness loss denotes the confidence of object presence (Binary Cross Entropy), and the class loss is the classification loss (Cross Entropy). The mathematical definitions are given by Eqs. 1, 2, 3 and 4.

Fig. 3. Label Prediction for Validation set

Precision = True Positive / (True Positive + False Positive)    (1)

Recall = True Positive / (True Positive + False Negative)    (2)

A single metric for imbalanced data, the F1 score is the harmonic mean of precision and recall. The harmonic mean encourages similar values for recall and precision.
Fig. 4. System Resource Utilization during Training: (a) GPU Utilization (percent), (b) GPU Power Usage (percent), (c) System Memory Utilization (percent), (d) GPU Memory Allocated (percent), (e) GPU Temp (°C), (f) Network Traffic (bytes), (g) Disk Utilization, (h) GPU Time Spent Accessing Memory (percent)
The harmonic mean is therefore poorer the more the precision and recall scores diverge from one another.

F1 Score = (2 × Precision × Recall) / (Precision + Recall)    (3)

mAP = (1/N) Σ(i=1 to N) AP_i    (4)

where AP denotes average precision and N denotes the number of classes.
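Equations (3) and (4) in code form (the precision/recall pairs and per-class AP values are made up for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    # Eq. (3): harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

def mean_average_precision(ap_per_class):
    # Eq. (4): mAP = (1/N) * sum of AP_i over N classes
    return sum(ap_per_class) / len(ap_per_class)

print(round(f1_score(0.8, 0.8), 4))          # 0.8 (equal scores -> same mean)
print(round(f1_score(0.9, 0.5), 4))          # 0.6429 (diverging scores pull it down)
print(round(mean_average_precision([0.98, 0.90]), 2))  # 0.94
```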
Table 1. Experiments Summary using YOLOv5 Versions.

Parameters            Yolo5s  Yolo5m  Yolo5l  Yolo5n
best/epoch            111     83      101     74
best/mAP 0.5          0.987   0.981   0.990   0.971
best/mAP 0.5:0.95     0.63    0.63    0.63    0.63
best/precision        0.93    0.92    0.97    0.97
best/recall           0.97    0.95    0.92    0.90
metrics/mAP 0.5       0.98    0.98    0.99    0.97
metrics/mAP 0.5:0.95  0.63    0.63    0.63    0.63
metrics/precision     0.93    0.92    0.97    0.97
metrics/recall        0.97    0.95    0.92    0.90
train/box loss        0.018   0.015   0.013   0.022
train/cls loss        0.0     0.0     0.0     0.0
train/obj loss        0.009   0.007   0.007   0.010
val/box loss          0.034   0.031   0.030   0.033
val/cls loss          0.0     0.0     0.0     0.0
val/obj loss          0.009   0.009   0.009   0.009
Fig. 5. Visualization of Performance Metrics
For the Lobby-0 view, the proposed model recorded Precision, Recall and F1 scores of 99.8%, 99.8% and 99.8%, respectively; as indicated in Table 2, the scores are 99.8%, 99.7% and 99.74% for the Lobby-1 view and 99.8%, 99.7% and 99.74% for the Lobby-2 view. Based on the comparison in Table 1, the proposed work found YoloV5s to be the best model, with 212 layers, 20852934 parameters, 0 gradients and 47.9 GFLOPs, and better recall and mAP scores. As shown in Fig. 4, the model also exhibits efficient resource utilization in terms of GPU memory allocated, GPU power, GPU utilization, network traffic, system memory utilization and temperature.
4.1 Comparison with Related Works

Most prior work is based on shot boundary detection, keyframe selection and scene elimination; in contrast, the proposed work considers only objects of interest for the video summary. Table 2 shows the comparative analysis of the proposed approach against the existing approaches, with the best precision and recall.

Table 2. Comparison with the Related Work

View No.  Method    Precision  Recall
Lobby-0   [10]      95.1       76.9
          [15]      365.3      90.2
          [16]      65.1       91.7
          Proposed  99.8       99.8
Lobby-1   [10]      93.8       81.9
          [15]      70.0       91.8
          [16]      79.4       88.3
          Proposed  99.8       99.7
Lobby-2   [10]      91.5       87.0
          [15]      76.0       93.2
          [16]      77.6       85.8
          Proposed  99.8       99.7
5 Conclusion

This paper proposed object of interest-based event detection and summarization over the OpenStack cloud, scheduling the VM and tracking resource utilization over the cloud. It provides a better way to access huge volumes of data in an efficient manner and generates the best summary length compared to the existing approach. The proposed approach can detect single and multiple objects in a frame as well as in the video. The experimental results on the lobby dataset using YoloV5s show efficient system resource utilization and the best precision and recall over the cloud. Long-duration multi-view surveillance videos can be used in the future for better analysis.

Acknowledgments. The authors would like to thank the DST, GoI for sponsoring the work under DST/ICPS/General/2018.
82
A. Negi et al.
References
1. Ajmal, M., Naseer, M., Ahmad, F., Saleem, A.: Human motion trajectory analysis based video summarization. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 550–555. IEEE (2017)
2. Alok, N., Krishan, K., Chauhan, P.: Deep learning-based image classifier for malaria cell detection. In: Machine Learning for Healthcare Applications, pp. 187–197 (2021)
3. Elkhattabi, Z., Tabii, Y., Benkaddour, A.: Video summarization: techniques and applications. Int. J. Comput. Inf. Eng. 9(4), 928–933 (2015)
4. Gygli, M., Grabner, H., Riemenschneider, H., Gool, L.V.: Creating summaries from user videos. In: European Conference on Computer Vision, pp. 505–520. Springer (2014). https://doi.org/10.1007/978-3-319-10584-0_33
5. Hussain, T., Muhammad, K., Del Ser, J., Baik, S.W., de Albuquerque, V.H.C.: Intelligent embedded vision for summarization of multiview videos in IIoT. IEEE Trans. Ind. Inform. 16(4), 2592–2602 (2019)
6. Hussain, T., Muhammad, K., Ullah, A., Cao, Z., Baik, S.W., de Albuquerque, V.H.C.: Cloud-assisted multiview video summarization using CNN and bidirectional LSTM. IEEE Trans. Ind. Inf. 16(1), 77–86 (2019)
7. Kashid, S., Kumar, K., Saini, P., Negi, A., Saini, A.: Approach of a multilevel secret sharing scheme for extracted text data. In: 2022 IEEE Students Conference on Engineering and Systems (SCES), pp. 1–5. IEEE (2022)
8. Khan, M.Z., Jabeen, S., ul Hassan, S., Hassan, M., Khan, M.U.G.: Video summarization using CNN and bidirectional LSTM by utilizing scene boundary detection. In: 2019 International Conference on Applied and Engineering Mathematics (ICAEM), pp. 197–202. IEEE (2019)
9. Kumar, K.: EVS-DK: event video skimming using deep keyframe. J. Vis. Commun. Image Represent. 58, 345–352 (2019)
10. Kumar, K., Shrimankar, D.D.: F-DES: fast and deep event summarization. IEEE Trans. Multimedia 20(2), 323–334 (2017)
11. Muhammad, K., Hussain, T., Baik, S.W.: Efficient CNN based summarization of surveillance videos for resource-constrained devices. Pattern Recogn. Lett. 130, 370–375 (2020)
12. Muhammad, K., Hussain, T., Del Ser, J., Palade, V., De Albuquerque, V.H.C.: DeepReS: a deep learning-based video summarization strategy for resource-constrained industrial surveillance scenarios. IEEE Trans. Ind. Inform. 16(9), 5938–5947 (2019)
13. Negi, A., Kumar, K., Chauhan, P.: Deep neural network-based multi-class image classification for plant diseases. In: Agricultural Informatics: Automation Using the IoT and Machine Learning, pp. 117–129 (2021)
14. Negi, A., Kumar, K., Saini, P., Kashid, S.: Object detection based approach for an efficient video summarization with system statistics over cloud. In: 2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), pp. 1–6. IEEE (2022)
15. Ou, S.H., Lee, C.H., Somayazulu, V.S., Chen, Y.K., Chien, S.Y.: On-line multi-view video summarization for wireless video sensor network. IEEE J. Sel. Top. Sig. Process. 9(1), 165–179 (2014)
16. Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: European Conference on Computer Vision, pp. 540–555. Springer (2014). https://doi.org/10.1007/978-3-319-10599-4_35
17. Rani, S., Kumar, M.: Social media video summarization using multi-visual features and Kohonen's self-organizing map. Inf. Process. Manage. 57(3), 102190 (2020)
18. Srinivas, M., Pai, M.M., Pai, R.M.: An improved algorithm for video summarization – a rank based approach. Procedia Comput. Sci. 89, 812–819 (2016)
19. Thomas, S.S., Gupta, S., Subramanian, V.K.: Perceptual video summarization – a new framework for video summarization. IEEE Trans. Circ. Syst. Video Technol. 27(8), 1790–1802 (2016)
20. Thomas, S.S., Gupta, S., Subramanian, V.K.: Event detection on roads using perceptual video summarization. IEEE Trans. Intell. Transp. Syst. 19(9), 2944–2954 (2017)
21. Uchihachi, S., Foote, J.T., Wilcox, L.: Automatic video summarization using a measure of shot importance and a frame-packing method. US Patent 6,535,639 (Mar 18, 2003)
Automatic Fake News Detection: A Review Article on State of the Art

Karim Hemina, Fatima Boumahdi(B), and Amina Madani

LRDSI Laboratory, Sciences Faculty, Saad Dahlab Blida University, PB 270 Soumaa Road, 09000 Blida, Algeria
[email protected], {f boumahdi,a madani}@esi.dz
Abstract. Fake news is a term used to describe incorrect information released to the public to harm others or to profit from them. People have used conventional media to distribute fake news and generate propaganda to sway public opinion since the dawn of mass media. However, new obstacles to detecting fake news have emerged with modern social media. Moreover, fake news dissemination is becoming more like a business, with individuals being paid to generate bogus information, making old methods of detecting fake news inefficient. As a result, and because of the threat fake news poses to societies, several academics are working to differentiate between false and authentic news by automating the detection process using machine learning. This paper addresses several known machine learning-based works for detecting bogus material based on context or content features. We explain different methods for detecting fake news and describe various publicly available datasets for the task. Finally, we conclude by discussing the ongoing difficulties in detecting fake news.

Keywords: Fake news · Deep learning · Machine learning · Natural language processing · Misinformation detection

1 Introduction
Using social media for news consumption is a double-edged sword. On the one hand, individuals seek and consume information through social media because it is cheap, easily accessible, and readily available. On the other hand, it encourages the spread of "fake news", as recently seen in numerous instances such as the 2016 US presidential election [1], rumors during Brexit, and the bogus reports about the COVID-19 epidemic [2]. Thus, fake news has become a major threat to democracy, free expression, and journalism today. The widespread dissemination of fake news can have devastating effects on people and society. Therefore, detecting fake news on social media has become a growing research area.

The remainder of this paper is composed of four more sections. Section 2 is dedicated to existing works, Sect. 3 presents the available datasets, Sect. 4 discusses the previously mentioned methods and future directions, and Sect. 5 concludes the paper.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 84–93, 2023. https://doi.org/10.1007/978-3-031-30396-8_8
2 Existing Works

In this section, we classify existing fake news detection works into two main categories, content-based and context-based methods, and we also review some works that combine both. Before discussing each category, we first define the most commonly used evaluation metrics.

2.1 Evaluation Metrics
For each of the works mentioned in this section, one or more of the following evaluation metrics are used.

– Precision: the proportion of predicted positives that are truly positive, calculated by dividing the true positives (TP) by the predicted positives (PP) [3].

    Precision = TP / PP    (1)

– Recall: captures the ability of the classifier to find all the positive samples; the true positives divided by the true positives plus the false negatives (FN) [4].

    Recall = TP / (TP + FN)    (2)

– Accuracy: the percentage of correct predictions; the true positives plus the true negatives (TN), divided by the predicted positives plus the predicted negatives (PN) [5].

    Accuracy = (TP + TN) / (PP + PN)    (3)

– F1-score: the harmonic mean of precision and recall [6].

    F1 = 2 · (Precision · Recall) / (Precision + Recall)    (4)

– AUC ROC: a performance measurement for classification problems; it tells how well the model can distinguish between classes [7].
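The four count-based metrics defined in Eqs. (1)–(4) can be computed directly from a confusion matrix; a minimal pure-Python sketch (with illustrative labels only):

```python
def confusion_counts(y_true, y_pred):
    """Return TP, FP, TN, FN for binary labels (1 = fake, 0 = real)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)           # Eq. (1): TP / PP
    recall = tp / (tp + fn)              # Eq. (2): TP / (TP + FN)
    accuracy = (tp + tn) / len(y_true)   # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return precision, recall, accuracy, f1

p, r, a, f1 = metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In practice these are provided by library functions (e.g., scikit-learn's classification metrics), but the hand-rolled version makes the definitions explicit.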
2.2 Content-Based Fake News Detection

Content-based features are extracted from the textual components of the news, such as headlines or the news body, and from visual components such as images, if they exist [8].
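As a concrete illustration of operating on textual content, a minimal content-based pipeline (TF-IDF features feeding a classical classifier, in the spirit of the works surveyed below) can be sketched with scikit-learn; the toy corpus and labels here are invented stand-ins, not any of the cited datasets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for news headlines/bodies; labels: 1 = fake, 0 = real
texts = [
    "SHOCKING miracle cure doctors don't want you to know",
    "you won't believe what this celebrity did, click now",
    "central bank raises interest rates by 25 basis points",
    "city council approves annual budget after public hearing",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each document into a sparse weighted term vector;
# logistic regression then learns a linear decision boundary over it.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

pred = model.predict(["miracle cure you won't believe"])
```

Real systems differ mainly in the feature extractor (word embeddings, N-grams, affective features) and the classifier (SVM, random forest, XGBoost, neural networks), but the fit/predict structure is the same.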
We can classify content-based methods into two classes [9]:
1. Style-based methods, which focus on the patterns shared between fake news items rather than on the news content itself.
2. Fact-based methods, which focus on the news itself by comparing it to an external source of facts to verify its truthfulness.

Style-Based Fake News Detection
Fake news writers often tend to use persuasive and emotional language to convince readers [9], which is why researchers try to extract common characteristics of fake news in order to detect it. Barbara et al. [10] tested five different machine learning techniques on the ISOT dataset1; the one with the highest accuracy rate of 0.9964 was Bagging, while Support Vector Machine (SVM) had the lowest accuracy rate of 0.9889. CART, random forest, and AdaBoost achieved accuracy rates of 0.9954, 0.9913, and 0.9950, respectively. Mathews and Preethi [11] achieved 0.967 accuracy, 0.962 precision, 0.975 recall, and 0.969 F1-score using a Support Vector Classifier on the ISOT dataset. Bergstra et al. [12] worked on detecting political fake news using the LIAR dataset [13]. They tested different machine learning classifiers: XGBoost got the highest accuracy of 75%, followed by SVM and random forest with approximately 73%, while Naive Bayes got the lowest accuracy with approximately 68%. Babu et al. [14] used the Kaggle fake and real news dataset2 to implement their model using five different techniques. Logistic regression gave 98.84% accuracy, Naive Bayes 95.48%, and random forest 99.09%. The highest result of 99.71% accuracy was achieved by the decision tree; the last tested technique was SVM with 99.49% accuracy.

A hybrid combination of CNN and RNN was proposed by Nasir et al. [15]: a CNN was used for extracting local features, which were used as input for an LSTM to extract long-term dependencies.
For training, they used the ISOT dataset, and they validated their model on two datasets, ISOT and FA-KES [16]; the second validation dataset tests how well the pre-trained model generalizes. They got 0.99 accuracy on the ISOT dataset, but could only achieve 0.60 accuracy on the FA-KES dataset. Saleh et al. [17] used an optimized CNN, trained and tested on the ISOT dataset based on the content of the news. For feature extraction, two methods were used: word embedding, and N-grams with TF-IDF. They reported 99.99% accuracy, precision, recall, and F1-score. Ghanem et al. [18] approached fake news detection from a different angle: since fake news writers usually tend to play with readers' emotions to drive them to believe their fake news, they proposed a model which uses both word embeddings and other affective features such as emotions and hyperbolic words. A Bi-GRU is used to model the flow of affective
1 uvic.ca/engineering/ece/isot/datasets/fake-news/index.php.
2 kaggle.com/clmentbisaillon/fake-and-real-news-dataset.
information after using a CNN to extract topic-based information. They got 0.96 accuracy, 0.93 precision, 0.97 recall, and 0.96 F1-score. In order to identify fake news using multiclass classification, Majumdar et al. [19] employed an LSTM. They obtained 98% accuracy using the CheckThat!2021 dataset [20], which comprises 4 label classes (False, Partially False, True, and Other). Khaled et al. [21] worked on detecting Arabic fake news using only textual features. Three datasets were used: the first collected by the authors from tweets, the second published in [22], and the third a merge of the first two. They were tested with different deep learning models: CNN, LSTM, BiLSTM, CNN+LSTM, and CNN+BiLSTM. Their results indicate that the BiLSTM model outperforms the others, achieving an accuracy of 0.848283 with the first dataset, 0.742747 with the second, and 0.773275 with the merged dataset. Mahlous and Al-Laith [23] collected more than seven million Arabic tweets related to the coronavirus pandemic. They used content features extracted from these tweets to detect Arabic COVID-19 fake news and achieved a 93.3% F1-score using logistic regression and word counts. Nassif et al. [24] constructed an Arabic fake news dataset. They achieved 98.5% accuracy, 99.1% precision, 98.2% recall, and 98.6% F1-score using the QaribBert-base model, which was trained on around 420 million tweets.

Fact-Based Fake News Detection
Fact-checking, also called knowledge-based fake news detection, is the process of detecting fake news by comparing it to an external set of previously known facts [25]. This approach can be separated into two steps:
1. Fact extraction: extracting knowledge from the web as raw facts and cleaning it of redundancy, invalidity, conflicts, unreliability, and incompleteness.
2. Fact checking: comparing the new article to the knowledge base constructed in step 1 and outputting whether it is fake or not.
Knowledge is a set of SPO triples (Subject, Predicate, Object) extracted from the given information, e.g., (Algeria, Qualified, World Cup 2022). Once this knowledge is verified as true, it becomes a fact, and a knowledge base is a set of facts. From a knowledge base, we can construct a knowledge graph: a graph structure where an edge is the predicate and the two nodes linked by the edge are the subject and the object [25]. DEAP-FAKED, a knowledge-graph-based framework for identifying fake news using GNNs, was proposed by Mayank et al. [26]. They used a Bi-LSTM neural network to encode news titles. A knowledge graph was constructed and encoded using a GNN. They used two datasets, Kaggle fake news3 and the CoAID dataset [27]. For the knowledge graph, Wikidata5M4 was used. They achieved 0.8955 accuracy and 0.8866 F1-score.
3 kaggle.com/c/fake-news/overview.
4 Subset of the Wikidata knowledge graph.
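The two-step fact-based scheme above (fact extraction, then fact checking) can be illustrated with a toy triple store. This is a deliberately naive sketch with made-up triples; real systems operate over large knowledge graphs such as Wikidata with learned graph encoders, not exact lookup:

```python
# Step 1 (fact extraction): a toy knowledge base as a set of verified SPO triples,
# reusing the (Algeria, Qualified, World Cup 2022) example from the text.
knowledge_base = {
    ("Algeria", "Qualified", "World Cup 2022"),
    ("Paris", "capital_of", "France"),
}

def check_claim(triple, kb):
    """Step 2 (fact checking): exact-match lookup against the knowledge base.
    Returns 'supported', 'contradicted' (same subject/predicate, other object),
    or 'unknown' (the KB is incomplete -- a key practical limitation)."""
    s, p, o = triple
    if triple in kb:
        return "supported"
    if any(ks == s and kp == p for ks, kp, _ in kb):
        return "contradicted"
    return "unknown"
```

The "unknown" branch is exactly why the Discussion section later argues for dynamically updated knowledge graphs: any static KB quickly falls out of date.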
Harrag and Djahli [7] used a CNN for Arabic fake news detection. The Arabic Fact-Checking and Stance Detection Corpus5 was used in their work since, according to them, it is the only Arabic dataset they could find for fake news detection. They achieved 91% accuracy and 89.9% F1-score. Sheng et al. [9] combined both fact-based and style-based methods into one framework they call Pref-FEND. They used two datasets, one in Chinese from the Weibo social network and the other in English from Twitter posts. For the Weibo dataset, they got 0.756 accuracy and 0.754 F1-score, while for the Twitter dataset they achieved 0.814 accuracy and 0.801 F1-score.

2.3 Context-Based Fake News Detection
Context-based fake news detection methods are language agnostic, meaning they may allow us to build one model for different languages, and they can serve either as an alternative or as a complement to content-based methods.

Monti et al. [28] used a database of tweets published between 2013 and 2018 to train their model. They worked on fake news detection based on propagation features, using a 4-layer graph CNN with 2 convolutional layers and 2 fully connected layers to predict fake and true class probabilities. The features they extracted from the database fall into four categories: user profile, user activity, network and spreading, and content. For evaluation, they use ROC AUC as a measure, achieving nearly 93%. Han et al. [29] tested how accurate Graph Neural Networks (GNN) can be without relying on any text information. They chose to extract the following features from the FakeNewsNet dataset [30]: whether the user is verified, the timestamp when the user account was created, the number of followers, the number of friends, and the timestamp of the tweet. They achieved 0.853 accuracy, 0.834 precision, 0.852 recall, and 0.841 F1-score. A Graph Transformer Network (GTN) was used by Matsumoto et al. [31]. A labeled graph was constructed from the FakeNewsNet dataset, where each node represents either source news, a tweet, or a retweet. They used textual features and user-related features such as the number of words in the self-introduction, the number of words in the screen name, the number of followers, and whether location information exists. They achieved 0.9379 accuracy and 0.9132 F1-score. Davoudi et al. [8] proposed a new method to analyze the propagation tree, which was constructed to investigate the news spreading process using Python's NetworkX library6. They used cumulative features, such as the total number of tweets, and global features, such as the number of tweets without engagement.
An LSTM was used on the FakeNewsNet dataset, and they achieved 98.4% accuracy.
5 groups.csail.mit.edu/sls/downloads/factchecking/index.cgi.
6 networkx.org.
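The propagation-tree features these works rely on (cascade size, depth, posts without engagement) can be computed with a simple traversal; a pure-Python sketch over a made-up cascade (the cited works use NetworkX and real tweet metadata):

```python
from collections import deque

# Hypothetical cascade: source news -> tweets -> retweets (parent -> children)
edges = {
    "news": ["t1", "t2"],
    "t1": ["r1", "r2"],
    "t2": [],
    "r1": ["r3"],
    "r2": [],
    "r3": [],
}

def propagation_features(tree, root="news"):
    """BFS over the cascade to get simple structural features."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in tree.get(node, []):
            depth[child] = depth[node] + 1
            queue.append(child)
    n_posts = len(depth) - 1                            # tweets + retweets
    max_depth = max(depth.values())                     # longest retweet chain
    no_engagement = sum(1 for n in depth if not tree.get(n))  # posts nobody reshared
    return {"n_posts": n_posts, "max_depth": max_depth, "no_engagement": no_engagement}

feats = propagation_features(edges)
```

Such structural features are language agnostic, which is precisely the appeal of context-based detection noted above.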
2.4 Hybrid Fake News Detection
Because fake news is purposefully crafted to deceive readers, style-based methods are not very effective on their own. The same is true for fact-based solutions, because bogus news is time-sensitive and knowledge bases are often outdated. Nor are context-based methods sufficient for early detection. As a result, researchers are currently investigating the prospect of detecting false news by combining content and context methods.

Kaliyar et al. [32] combined both context and content features to detect fake news, using the real-world BuzzFeed dataset. An ensemble machine learning classifier (XGBoost) and a deep neural network model (DeepFakE) were used for classification. They achieved 0.8333 precision, 0.8696 recall, 0.8511 F1-score, and 0.8649 accuracy. Rohit et al. [33] employed multiple machine learning models to multiclassify news from the FNC-based fake news dataset [34] using both content and contextual features. The models employed were a decision tree with 0.74 accuracy, gradient boosting with 0.86 accuracy (the highest), random forest with 0.70, logistic regression with 0.68, linear SVM with 0.73, and multinomial naive Bayes (MNB) with 0.725 accuracy. Sharma and Kalra [35] used user and text characteristics to detect fake news. For user characteristics, they used the XGBoost algorithm, while for the text they used a sequential neural network and a BERT transformer. This work was done on the FakeNewsNet dataset, achieving 98.53% accuracy, 98.20% precision, 98.51% recall, and 98.57% F1-score.
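Hybrid models of this kind typically fuse a content representation with a context (user/propagation) representation before classification; a schematic NumPy sketch with random stand-in vectors (the dimensions and feature meanings are illustrative, not taken from any cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in content representation (e.g., a 300-d averaged word-embedding vector)
content_vec = rng.normal(size=300)

# Stand-in context features: verified flag, followers, friends, account age (days)
context_vec = np.array([1.0, 5000.0, 120.0, 365.0])

# Scale the context block so large raw counts do not dominate the fused vector
context_scaled = (context_vec - context_vec.mean()) / (context_vec.std() + 1e-9)

# Early fusion by concatenation; the result would feed a classifier (XGBoost, MLP, ...)
fused = np.concatenate([content_vec, context_scaled])
```

Concatenation is the simplest fusion strategy; attention-based or tensor-decomposition fusion (as in DeepFakE) learns interactions between the two blocks instead of treating them independently.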
3 Datasets
In this section, we describe the publicly available English and Arabic datasets for fake news detection.

The ISOT dataset [36] consists of more than 21,400 real news articles and more than 23,400 fake news articles, separated into two CSV files. The dataset has four columns: article title, text, type, and the date the article was published. The LIAR dataset [13] contains 12,856 articles: 10,269 for training, 1,283 for testing, and 1,284 for validation. It is labeled with multiple classes (True | False | Barely-True | Half-True | Mostly-True | Pants-fire). The MultiSourceFake dataset was created by Ghanem et al. [18]; it contains 11,397 English articles in a single CSV file with content, label (0 for fake and 1 for true), and type (training or testing) as information. The Covid-19 heAlthcare misinformation Dataset (CoAID), created by Limeng and Dongwon [27], contains 301,177 confirmed fake and true news items about COVID-19: 281,080 true and 20,097 fake. The FakeNewsNet dataset [30] was collected from fact-checking websites such as PolitiFact7; it contains 5,755 fake articles and 17,441 real articles. It has content
7 https://www.politifact.com/.
information (linguistic and visual), social context information (user, post, response, and network), and spatiotemporal information (spatial, temporal) as features. The Arabic fact-checking and stance detection corpus [37] is an Arabic corpus with 422 assertions concerning the Syrian war and associated Middle Eastern political problems. The Arabic fake news dataset created by Ashwaq et al. [38] for classifying article credibility has a total of 606,912 articles, of which 207,310 are true, 167,233 are false, and 232,369 are undecided.
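Datasets such as ISOT ship as separate fake/real CSV files; assembling them into one labeled frame is straightforward with pandas. A sketch using in-memory stand-ins (in practice one would load the two CSVs with pd.read_csv; the rows here are invented):

```python
import pandas as pd

# In-memory stand-ins for the two ISOT-style CSV files
fake = pd.DataFrame({"title": ["Fake headline A"], "text": ["body A"]})
real = pd.DataFrame({"title": ["Real headline B"], "text": ["body B"]})

# Attach labels (1 = fake, 0 = real), combine, and shuffle for training
fake["label"] = 1
real["label"] = 0
data = pd.concat([fake, real], ignore_index=True).sample(frac=1, random_state=42)

X, y = data["text"], data["label"]
```

The fixed random_state makes the shuffle reproducible, which matters when comparing classifiers on the same split.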
4 Discussion
In Sect. 2, we categorized fake news detection techniques into three types: fact-based (knowledge-based) detection, style-based detection, and context-based detection. To overcome the remaining obstacles in fake news detection, such as early identification, several of the previously mentioned strategies can be combined.

Partially true news and outdated news: with the exception of [19] and [33], all the studies mentioned in Sect. 2 concentrated on binary classification of news as fake or not. The drawback of binary classification is that it cannot identify partially fake news (e.g., news with text and image where the text is false while the image is true, or the contrary) or obsolete news (e.g., "Algeria is a French colony"). To get around these problems, we can employ dynamic knowledge graphs, constantly updated in accordance with the most recent real news, together with multiclass classification to identify partially true and out-of-date news.

Early detection of fake news: the major goal of fake news identification is to prevent its dissemination among the public as early as possible. Early fake news detection is a crucial field that has not yet been fully studied. Context-based strategies are ineffective for early detection since they are most useful once the news has already spread, while content-based methods are effective only if we can maintain a knowledge base that is regularly updated.
5 Conclusion
In this paper, we reviewed current fake news detection research by discussing the various approaches used and the results obtained by researchers. The content of a news article is a common instinct for determining whether the news is fake or real. Content-based methodologies concentrate on extracting various features from fake news content, such as information-based and style-based features. Style-based methodologies try to detect fake news by capturing manipulation in the writing style. However, because the goal of creating fake news is precisely to mislead consumers, detecting fake news accurately from news content alone is a very difficult task. For better detection, we must consider the social context in addition to the content features.
In the future, we will conduct more research on the early detection of fake news. More research is also required on multiclass classification of news, not just as fake or accurate, but also as outdated, misleading, or half-true. Finally, cross-domain fake news detection is a research area that needs to be better explored; it is very important for detecting fake news across all the fields that impact individuals.
References
1. Karwa, R.R., Gupta, S.R.: Artificial intelligence based approach to validate the authenticity of news. In: 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pp. 1–6 (2021)
2. Pavlov, T., Mirceva, G.: COVID-19 fake news detection by using BERT and RoBERTa models. In: 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO), pp. 312–316 (2022)
3. Ramkissoon, A.N., Mohammed, S.: An experimental evaluation of data classification models for credibility based fake news detection. In: IEEE International Conference on Data Mining Workshops (ICDMW), pp. 93–100 (2020)
4. Patil, D.R.: Fake news detection using majority voting technique. arXiv:2203.09936 (2022)
5. Aphiwongsophon, S., Chongstitvatana, P.: Detecting fake news with machine learning method. In: 2018 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 528–531. IEEE (2018)
6. Umer, M., Imtiaz, Z., Ullah, S., Mehmood, A., Choi, G.S., On, B.-W.: Fake news stance detection using deep learning architecture (CNN-LSTM). IEEE Access (2020)
7. Harrag, F., Djahli, M.K.: Arabic fake news detection: a fact checking based deep learning approach. ACM Trans. Asian Low-Resource Lang. Inf. Process. 21, 7 (2022)
8. Davoudi, M., Moosavi, M.R., Sadreddini, M.H.: DSS: a hybrid deep model for fake news detection using propagation tree and stance network. Expert Syst. Appl. 198, 116635 (2022)
9. Sheng, Q., Zhang, X., Cao, J., Zhong, L.: Integrating pattern- and fact-based fake news detection via model preference learning, p. 11 (2021)
10. Probierz, B., Stefanski, P., Kozak, J.: Rapid detection of fake news based on machine learning methods. Procedia Comput. Sci. 192, 2893–2902 (2021)
11. Mathews, E.Z., Preethi, N.: Fake news detection: an effective content-based approach using machine learning techniques. In: 2022 International Conference on Computer Communication and Informatics (ICCCI) (2022)
12. Bergstra, J., Komer, B., Khanam, Z., Alwasel, B.N., Sirafi, H., Rashid, M.: Fake news detection using machine learning approaches. In: IOP Conference Series: Materials Science and Engineering, vol. 1099, p. 012040 (2021)
13. Wang, W.Y.: "Liar, liar pants on fire": a new benchmark dataset for fake news detection, vol. 2 (2017)
14. Babu, D.J., Sushmitha, G., Lasya, D., Gopi Krishna, D., Rajesh, V.: Identifying fake news using machine learning. In: 2022 International Conference on Electronics and Renewable Systems (ICEARS), pp. 1–6 (2022)
15. Nasir, J.A., Khan, O.S., Varlamis, I.: Fake news detection: a hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 1, 100007 (2021)
16. Salem, F.K.A., Al Feel, R., Elbassuoni, S., Jaber, M., Farah, M.: FA-KES: a fake news dataset around the Syrian war. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 13, pp. 573–582 (2019)
17. Saleh, H., Alharbi, A., Alsamhi, S.H.: OPCNN-FAKE: optimized convolutional neural network for fake news detection. IEEE Access 9, 129471–129489 (2021)
18. Ghanem, B., Ponzetto, S.P., Rosso, P., Rangel, F.: FakeFlow: fake news detection by modeling the flow of affective information. In: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, pp. 679–689 (2021)
19. Majumdar, B., RafiuzzamanBhuiyan, Md., Hasan, Md.A., Islam, Md.S., Noori, S.R.H.: Multi class fake news detection using LSTM approach. In: 2021 10th International Conference on System Modeling and Advancement in Research Trends (SMART), pp. 75–79 (2021)
20. Nakov, P., et al.: Overview of the CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. CoRR, abs/2109.12987 (2021)
21. Khaled, W.M., Fouad, M., Sabbeh, S.F.: Arabic fake news detection using deep learning. Comput. Mater. Continua 71(2), 3647–3665 (2022)
22. Pardo, F.M.R., Rosso, P., Charfi, A., Zaghouani, W., Ghanem, B., Sánchez-Junquera, J.: Overview of the track on author profiling and deception detection in Arabic. In: FIRE (2019)
23. Mahlous, A.R., Al-Laith, A.: Fake news detection in Arabic tweets during the COVID-19 pandemic. Int. J. Adv. Comput. Sci. Appl. 12(6) (2021)
24. Nassif, A.B., Elnagar, A., Elgendy, O., Afadar, Y.: Arabic fake news detection based on deep contextualized embedding models. Neural Comput. Appl. 34, 16019–16032 (2022)
25. Zhou, X., Zafarani, R.: A survey of fake news. ACM Comput. Surv. (CSUR) 53, 9 (2020)
26. Mayank, M., Sharma, S., Sharma, R.: DEAP-FAKED: knowledge graph based approach for fake news detection. CoRR, abs/2107.10648 (2021)
27. Cui, L., Lee, D.: CoAID: COVID-19 healthcare misinformation dataset. CoRR, abs/2006.00885 (2020)
28. Monti, F., Frasca, F., Eynard, D., Mannion, D., Bronstein, M.M.: Fake news detection on social media using geometric deep learning (2019)
29. Han, Y., Karunasekera, S., Leckie, C.: Graph neural networks with continual learning for fake news detection from social media. arXiv, abs/2007.03316 (2020)
30. Wang, S., Liu, H., Shu, K., Mahudeswaran, D., Lee, D.: FakeNewsNet: a data repository with news content, social context and dynamic information for studying fake news on social media (2018)
31. Matsumoto, H., Yoshida, S., Muneyasu, M.: Propagation-based fake news detection using graph neural networks with transformer. In: 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), pp. 19–20 (2021)
32. Kaliyar, R.K., Goswami, A., Narang, P.: DeepFakE: improving fake news detection using tensor decomposition-based deep neural network. J. Supercomput. 77, 1015–1037 (2021)
33. Kaliyar, R.K., Goswami, A., Narang, P.: Multiclass fake news detection using ensemble machine learning. In: 2019 IEEE 9th International Conference on Advanced Computing (IACC), pp. 103–107 (2019)
34. Shang, J., et al.: Investigating rumor news using agreement-aware search. CoRR, abs/1802.07398 (2018)
35. Sharma, S., Kalra, V.: A deep learning based approach for fake news detection. Int. J. Sci. Res. Sci. Eng. Technol. 8, 388–394 (2021)
36. Ahmed, H., Traoré, I., Saad, S.: Detecting opinion spams and fake news using text classification. Secur. Privacy 1 (2018)
37. Baly, R., Mohtarami, M., Glass, J., Màrquez, L., Moschitti, A., Nakov, P.: Integrating stance detection and fact checking in a unified corpus. In: Proceedings of NAACL-HLT 2018, New Orleans, LA, USA (2018)
38. Khalil, A., Jarrah, M., Aldwairi, M., Jaradat, M.: AFND: Arabic fake news dataset for the detection and classification of articles credibility. Data Brief 42, 108141 (2022)
Cascaded 3D V-Net for Fully Automatic Segmentation and Classification of Brain Tumor Using Multi-channel MRI Brain Images

Maahi Khemchandani(B), Shivajirao Jadhav, and Vinod Kadam

Dr. Babasaheb Ambedkar Technological University, Vidyavihar, Lonere, Maharashtra, India
{mahi,smjadhav,vjkadam}@dbatu.ac.in
Abstract. Recently, the incidence of brain tumor (BT) has become highly common worldwide. BT is a life-threatening ailment, and its early identification is imperative for saving human life. BT segmentation and categorization are crucial tasks in BT recognition. Many existing approaches have been developed for detecting BTs, but they do not focus on segmenting and categorizing diverse categories of BTs, and they are incapable of processing 3D images. Hence, to address these requirements, this work proposes a cascaded 3D V-Net framework that aims at segmenting and categorizing three distinct BT categories, namely non-enhancing and necrotic tumor core (NET/NCT), peritumoral edema (ED), and enhancing tumor (ET), from 3D magnetic resonance imaging (MRI) brain input images. This work adopts the BRATS 2020 MRI image database for experimentation. The developed 3D V-Net framework's performance is assessed against an existing framework in terms of accuracy, sensitivity, precision, and IoU metrics, demonstrating the proposed V-Net model's superiority in BT segmentation and classification. The presented 3D V-Net framework surpasses the existing framework in BT categorization, exhibiting 99.58% accuracy.

Keywords: 3D V-Net · Brain tumor · Segmentation · Classification · Multichannel MRI
1 Introduction In the machine vision realm, recognition and categorization of clinical infections have acquired much attention owing to their growing applications in clinical imaging [1–3]. In the present world, with advancements in the field of information technology, diverse machine learning (ML) [4–6], deep learning (DL) [7–10] and image processing techniques [11–13] have been introduced chiefly for precise identification and classification of distinct symptoms of diseases. These techniques recognize and segment disease regions precisely and further categorize them into the corresponding category. This procedure can aid physicians in rapid and precise diagnosis and classification of a lesion or tumor region. In recent times, brain tumor (BT) detection has become one of the active research hotspots in the disease recognition domain [14–16]. BT tissues affect © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 94–110, 2023. https://doi.org/10.1007/978-3-031-30396-8_9
regular development of the brain. BT is chiefly caused by the formation of abnormal sets of cells owing to uncontrollable division of cells inside or around the human brain. This sort of cell cluster can affect the healthy brain cells and usual brain functions. Several determinants exist for BT threat, which include location, tumor texture, tumor size and shape. Generally, BT is mainly of two types: a) benign and b) malignant. The benign BT does not spread rapidly and the adjoining healthy brain tissues are not affected, whereas the malignant BT spreads rapidly, and failure to detect this sort of BT in a timely and precise manner can lead to death [17]. Moreover, malignant BT can affect the adjoining healthy brain tissues. Presently, malignant BT cases have considerably increased. Therefore, early BT identification has become indispensable for saving human lives. Consequently, the MRI scanning method has been employed for recognizing BT in humans at an initial level so as to avoid fatalities owing to this disease [18–22]. MRI technology is highly useful in determining BT and is deemed a more effective screening procedure than computed tomography. However, analysis of MRI images using conventional imaging methods is time consuming and ineffective for precise and rapid BT segmentation and categorization. Adoption of fully automated systems for BT segmentation and categorization can greatly aid in addressing diverse concerns experienced in existing manual and semi-automated disease detection systems. Advanced technologies like DL and ML seem to be highly useful and promising candidates in this context. Thus, this research work develops a DL-oriented, completely automated framework for effectively and precisely segmenting and classifying BTs.

1.1 Contributions of This Work

The prominent contributions of this work include:

• Segmenting and categorizing BT using multi-channel brain MRI images.
• Development of a DL framework for effective BT segmentation and categorization.
• Utilizing a V-Net framework to leverage the data samples for BT image categorization.
• An image data generator (IDG) is employed to offer an easy and quick way of augmenting the images; it also helps in saving memory.
• Comparison of the developed 3D V-Net framework's BT classification performance against existing models to show the superiority of the proposed framework.

The rest of this paper is structured as follows: Sect. 2 reviews the different studies and approaches on BT segmentation and categorization. Sect. 3 describes the implementation of the developed research methodology. Sect. 4 presents the simulation outputs and discusses the achieved results. Sect. 5 provides the cardinal findings of this research article.
2 Related Work A newfangled approach for BT segmentation and its classification was presented in [23] depending on grade level fusion. The authors exploited distinct spatial domain schemes
for augmenting and precisely segmenting input images. This work adopted Google and Alex networks for classification, wherein two grade vectors were obtained after the softmax layer. Then, these grade vectors were combined and input to diverse classifiers together with a softmax layer. The presented architecture accurately segmented and categorized the malignant and benign tumor cases. However, BT detection in sagittal and coronal views was not discussed in this work. In [24], automatic BT segmentation and categorization via a machine learning scheme was discussed. This approach involved four phases, namely preprocessing for noise elimination, segmentation for identifying the region of interest, feature extraction for mining distinct features, and finally categorization of BT as malignant or benign. Experimental outputs clarified that the employed scheme offered 94.44% precision, 93.33% accuracy, 93.33% sensitivity, and 96.6% specificity. But identification of distinct classes within the malignant and benign sorts of tumors was not explored in this work. In [25], a convolutional neural network (CNN) model was developed chiefly for BT segmentation and categorization via MRI images. Here, the features from MRI images were extracted and fed to a CNN model for categorization. However, performance evaluation and superiority of this model over existing frameworks were not presented. In [26], a hybrid weights alignment with a multi-dilated attention network (Hybrid-DA Net) for automatic BT segmentation was established. It deployed multiple modules integrated into the baseline encoder-decoder architecture. Initially, a hybrid weight alignment with a multi-dilated attention module (HWADA) was deployed between the skip connections. It could acquire diverse sets of aligned weights by utilizing diverse dilation schemes. After that, the model integrated a multi-channel multi-scale module (MCS) on the baseline module. The outcomes exhibited the model's superior performance, albeit at lower accuracy.
In [27], a DL-oriented BT segmentation and categorization model was presented. The authors executed BT detection through pre-processing schemes followed by skull stripping and then BT segmentation. This model effectively segmented BT from MRI and categorized tumorous images into benign and malignant using a CNN-based AlexNet approach. It offered a 0.9677 f-measure, 0.9375 precision and a recall value of 1. The malignant image was further categorized into Meningioma and Glioma using a CNN-based GoogLeNet approach. Experiments clarified that this classification offered a 0.9743 f-measure, 0.9750 accuracy, 0.95 precision and a recall of 1. Performance evaluation tests declared that this model superseded the existing schemes. However, further amelioration in BT classification performance is desired. In [28], a multiscale CNN model for BT segmentation and categorization was provided. Data augmentation via elastic transformation was conducted for enhancing the training database and avoiding overfitting. Comparison of the developed model with seven other BT classification techniques revealed that the multiscale CNN model provided 97.3% accuracy and outperformed the remaining techniques. But this model suffered from false positives in certain images. In [29], BT segmentation and categorization were accomplished via an ML scheme. The authors aimed at precisely categorizing BT from MRI images by exploiting multiple segmentation schemes like threshold, K-means and watershed segmentation for tackling misclassification error possibilities. Adoption of multiple segmentation schemes helped in generating more precise and accurate outputs compared to an individual segmentation method. The BT classification was executed using a support vector
machine (SVM) classifier, which offered 90% accuracy. However, this work was not explored much with regard to multiclass BT classification. In [30], a hybrid architecture was presented for BT segmentation and categorization. Initially, preprocessing was conducted via a mean filter for noise suppression. Further, BT segmentation was done using a Bayesian with fuzzy clustering technique. Then feature extraction schemes like information-theoretic measures, Tsallis entropy and the scattering transform were employed for extracting robust attributes after segmentation. Finally, a deep autoencoder with regression method was employed for categorizing the tumor section in the BT categorization process. The authors used the BRATS 2015 dataset for analysis. Simulations ascertained that this approach achieved 98.5% categorization accuracy and outperformed several competing classification methods. However, the efficacy of this approach relative to advanced DL approaches was not inspected. In [31], an enhanced DL scheme for BT segmentation was introduced. In this work, a kernel-based CNN with multiclass SVM was adopted. The approach involved diverse phases like preprocessing, attribute extraction, BT image categorization and BT segmentation. After preprocessing, the desired attributes depending on shape, surface features and BT position were extracted, and the MRI image was categorized into an abnormal or normal brain image using multiclass SVM based on chosen tumor attributes of the brain. Further, segmentation was executed on the abnormal MRI brain image for segmenting the tumor via kernel-oriented CNN. In the discussed segmentation, CNN was integrated with the multiclass-SVM method for BT segmentation and categorization. Results manifested that this approach offered 84% segmentation accuracy. But the proposed DL scheme's performance was evaluated using only a few metrics. In [32], automated BT categorization from MRI was described using SVM and CNN.
The authors in this work exploited Figshare open database comprising MRI images of three sorts of BTs. Here, CNN scheme was implemented chiefly for extracting attributes from MRI brain images. For achieving improved performance, a multicategory SVM was utilized with CNN attributes. Evaluation and testing of this hybrid method involved a five-fold cross-verification process. It was identified from experiments that the employed scheme exhibited 95.82% total categorization accuracy and superseded the previously-proposed schemes in BT classification performance. However, it required additional training time. A sophisticated CNN framework was proposed in [33] for BT classification. This work aimed at type of BT recognition in a collective patient MRI images. The authors developed a complex CNN structure, trained via cross validation. The sagittal, coronal and axial MRI scans were utilized for the framework testing and training. This framework was experimented on MRI brain images acquired from imperative open databases. The presented CNN framework successfully predicted five sorts of BTs namely astrocytoma, healthy tissue, oligodendroglioma, unidentified tumor and glioblastoma multiforme and exhibited F1-scores of 99.48%, 99.99%, 99.23%, 99.08% and 99.50% for these BT types. However, identification of exact BT location was not described in this work. In [34], an automated segmentation and categorization of BT system via MRI images was described. The entire procedure involved preprocessing of brain image for noise suppression, attribute extraction, segmentation and categorization. The preprocessing was achieved through diffusion filters. Here, tumor region was extracted using morphological and Otsu’s thresholding operations. For attribute extraction, wavelet transform
dependent attributes were extracted. These extracted attributes were then provided to the segmentation phase. This work adopted SVM for BT segmentation and categorization. The employed SVM method displayed 86% accuracy. However, the superiority of this approach over previously-introduced schemes was not illustrated. In [35], a newfangled scheme for BT segmentation and categorization was described, depending on top-feature selection and ameliorated saliency segmentation. The authors initially performed preprocessing for extracting regions of interest via manual skull stripping. Further, a Gaussian filter was utilized for eliminating noise effects, and tumor segmentation was performed using an improved threshold technique. After segmentation, feature mining was done for extracting texture and geometric attributes. The mined features were fused using a serial-based technique, and appropriate features were selected using a genetic algorithm (GA). The classification was performed using a linear-kernel SVM. Results substantiated that this approach offered 90% accuracy for private and Harvard datasets. However, this scheme experienced oversegmentation issues when the tumor appeared in border areas, and this greatly affected the categorization accuracy.
3 Proposed Methodology

In this research work, an improved DL-oriented framework is constructed for performing BT segmentation and classification. The implementation procedure of this research methodology is portrayed in Fig. 1. This DL-oriented framework involves four prime stages, namely data collection and preprocessing, segmentation and feature extraction, classification using V-Net, and finally performance evaluation. Initially, data are collected from publicly available resources. Then, the collected data are preprocessed. After that, the preprocessed image is segmented into distinct parts so that only the useful parts are utilized, and the desired image attributes are extracted. Categorization is then performed to categorize BT using the proposed V-Net algorithm. Finally, a performance analysis is carried out to show the efficiency of the proposed model.
Fig. 1. Implementation flow of proposed methodology: data collection and preprocessing → segmentation and feature extraction → V-Net construction → performance evaluation
3.1 Data Collection and Preprocessing Data collection and preprocessing are highly imperative preliminary processes involved in BT classification. In this work, brain MRI images are utilized for gathering the data. After data collection, the acquired MRI images are preprocessed. Preprocessing is executed for image quality improvement in order to attain better outputs in the succeeding phases, and it is helpful in delivering improved image features.
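The paper does not enumerate the individual preprocessing steps beyond quality improvement. As one plausible sketch (an assumption on our part, not the authors' documented pipeline), per-volume z-score normalization over the nonzero brain voxels is a common choice for BraTS-style MRI volumes:

```python
import numpy as np

def normalize_volume(vol: np.ndarray) -> np.ndarray:
    """Z-score normalize an MRI volume over its nonzero (brain) voxels.

    Hypothetical preprocessing step: background voxels (value 0) are
    excluded from the statistics and stay at 0 after normalization.
    """
    brain = vol[vol > 0]                           # ignore background voxels
    mean, std = brain.mean(), brain.std()
    out = np.where(vol > 0, (vol - mean) / (std + 1e-8), 0.0)
    return out.astype(np.float32)

# Toy 4x4x4 "volume": background zeros plus a bright inner cube
vol = np.zeros((4, 4, 4))
vol[1:3, 1:3, 1:3] = [[[100, 110], [120, 130]], [[140, 150], [160, 170]]]
norm = normalize_volume(vol)
```

After this step each volume's brain region has approximately zero mean and unit variance, which keeps intensity scales comparable across scanners and protocols.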
In this work, all multimodal scans (images) available as NIfTI files (a commonly employed clinical imaging format for storing brain image data acquired using MRI and describing distinct MRI settings) are employed. NIfTI is typically a raster format, which includes files involving three-dimensional (3D) information like voxels or pixels with height, width and depth. The gathered data involve the following parameters: a) T1: T1-weighted, native image, axial or sagittal 2D acquisitions involving 1–6 mm slice thickness, b) T1c: T1-weighted, contrast-enhanced (Gadolinium) image with 3D acquisition and involving 1 mm isotropic voxel size for most patients, c) T2: T2-weighted image, axial 2D acquisition involving 2–6 mm slice thickness, and d) FLAIR: T2-weighted FLAIR image, coronal, axial or sagittal 2D acquisitions involving 2–6 mm slice thickness. This work acquired information using distinct medical protocols and diverse scanners, specifically from 19 distinct institutions. Moreover, all imaging databases are manually segmented here, using 1 to 4 raters. The considered annotations include the GD-enhancing tumor (ET, label 4), the peritumoral edema (ED, label 2) and the non-enhancing and necrotic tumor core (NET/NCR, label 1). The provided information is typically distributed after its pre-processing. In this work, the brain tumor image segmentation (BRATS) 2020 MRI image dataset derived from the Kaggle database is employed for analysis. 3.2 Segmentation and Feature Extraction Segmentation typically plays an imperative part in segregating an image into distinct parts so that only the useful part is utilized. Segmentation is mainly performed for decreasing the amount of work to be executed in further phases; for instance, extraction of features becomes simpler when there exists only a single tumor region whose attributes need to be mined rather than the whole image. Feature extraction, basically, is an indispensable process as it is helpful in reducing the colossal database representation.
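The annotation convention described above (NET/NCR = 1, ED = 2, ET = 4) is typically expanded into per-class binary channels before being fed to a multi-class segmentation network. A minimal sketch of that mapping (the function name and toy label map are illustrative, not taken from the paper):

```python
import numpy as np

# BraTS-style annotation labels, as described in Sect. 3.1:
#   1 = NET/NCR, 2 = ED (peritumoral edema), 4 = ET (GD-enhancing tumor)
LABELS = (1, 2, 4)

def labels_to_channels(seg: np.ndarray) -> np.ndarray:
    """Turn an integer label map into a (..., 3) stack of binary masks,
    one channel per tumor class; label 0 (background) maps to all zeros."""
    return np.stack([(seg == lab).astype(np.uint8) for lab in LABELS], axis=-1)

seg = np.array([[[0, 1], [2, 4]]])   # toy 1x2x2 label map
chans = labels_to_channels(seg)      # shape (1, 2, 2, 3)
```

Exactly one channel fires per tumor voxel, so Dice or IoU can then be evaluated per class.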
This process chiefly involves transformation of image data into desired features for categorization. In the current work, initially, the multiclass labels and volume sizes are split. Further, the MRI-format images are checked, visualized and converted into 2D images. The acquired brain MRI images are segmented in this stage and the desired image attributes are extracted. From the transformed 2D images, measurements like the Dice coefficient and its loss function are estimated for gaining fundamental insights regarding the 2D images. The Dice coefficient is a widely employed metric for estimating the similarity between two images. The Dice loss metric is typically adopted in clinical image segmentation operations for tackling data imbalance issues. However, Dice loss only tackles the imbalance between background and foreground, yet overlooks another imbalance, between hard and easy examples, that severely affects a learning framework's training process. The Dice coefficient is typically an estimate of the overlap between two samples. Its value ranges from 0 to 1, where a Dice coefficient of 1 indicates complete and perfect overlap and a Dice coefficient of 0 indicates no spatial overlap between two groups of segmentation results. This coefficient is similar to the intersection over union (IoU) metric. The Dice coefficient D between two binary volumes, ranging between 0 and 1, can be estimated using Eq. (1) as

D = \frac{2 \sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i^{2} + \sum_{i}^{N} g_i^{2}}    (1)
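Eq. (1), and the Dice loss derived from it, translate directly into code; the toy arrays below are illustrative only:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient of Eq. (1): D = 2*sum(p*g) / (sum(p^2) + sum(g^2))."""
    p = pred.ravel().astype(float)
    g = gt.ravel().astype(float)
    return float(2.0 * (p * g).sum() / ((p ** 2).sum() + (g ** 2).sum() + eps))

def dice_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice loss used for training: 1 - D."""
    return 1.0 - dice_coefficient(pred, gt)

pred = np.array([1, 1, 0, 0])   # toy binary prediction
gt   = np.array([1, 0, 0, 0])   # toy ground truth
# overlap = 1, |p|^2 = 2, |g|^2 = 1  ->  D = 2/3
```

A perfect prediction yields D = 1 (loss 0), and disjoint masks yield D = 0 (loss 1), matching the range described above.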
where N represents the number of voxels, p_i indicates predicted voxels and g_i indicates ground-truth voxels. 3.3 Proposed Model Creation Most clinical data employed in medical practice are contained in 3D form, for instance MRI volumes illustrating the prostate, but many approaches can process only 2D images and are not effective for 3D volumes. Thus, to efficiently overcome this shortcoming, in this work a 3D image segmentation approach based on a fully convolutional structure is presented. Here, a V-Net framework, a form of CNN structure, is developed for leveraging the data samples for BT image categorization. The developed cascaded 3D V-Net structure is portrayed in Fig. 2. The developed V-Net structure involves an input size of 128×128×3, a kernel size of (5, 5, 5) and a learning rate of 0.001. The presented V-Net architecture employs the softmax activation function and the Adam optimizer for learning.
Fig. 2. Proposed V-Net model
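The exact block configuration of the cascaded V-Net is not fully enumerated in the text. The following minimal sketch only tracks the shape bookkeeping of an assumed four-level V-Net-style encoder-decoder (the base width, level count and cubic input resolution are illustrative assumptions, not taken from the paper): each down block halves every spatial dimension via a stride-2 convolution and doubles the channel count, and the up path mirrors this with transpose convolutions.

```python
def vnet_shapes(spatial=(128, 128, 128), base_channels=16, levels=4):
    """Return (shape, channels) pairs along the assumed encoder and decoder.

    Purely a shape walk: each down level halves the spatial dims and
    doubles the channels; the up path is its mirror image.
    """
    down, shape, ch = [], spatial, base_channels
    for _ in range(levels):
        down.append((shape, ch))
        shape = tuple(s // 2 for s in shape)   # stride-2 downsampling
        ch *= 2                                # channel doubling
    up = list(reversed(down))                  # transpose convs mirror it
    return down, up

down, up = vnet_shapes()
# down: [((128,128,128),16), ((64,64,64),32), ((32,32,32),64), ((16,16,16),128)]
```

Skip connections join each down level to the up level of matching shape, which is why the two lists mirror each other.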
For constructing the proposed V-Net framework, initially, a convolutional block is created. Then this block is connected to 3D blocks. Further, an image data generator (IDG) is employed. The IDG offers an easy and quick way of augmenting the images. It offers a host of distinct augmentation methods like standardization, flips, shifts, rotation, brightness variation, etc. The chief benefit of employing the Keras IDG class is that it is devised to offer realistic data augmentation. Moreover, the IDG class assures that the framework receives new variants of the images at every epoch. Another significant advantage
of IDG involves its lower memory usage: exploiting the IDG aids in loading images in batches and thereby helps in saving memory. Further, a callback function is executed in this work. A callback is basically an object that performs activities at diverse phases of training (e.g. starting an epoch, terminating an epoch, etc.). Callbacks can be employed for writing TensorBoard logs after each batch of training for monitoring the metrics. Further, weights are added to the pretrained model, and diverse metrics like accuracy, Dice coefficient, IoU and Dice loss are tested for training and validation.

3.4 Performance Assessment and Comparison

Performance of the presented cascaded 3D V-Net is then evaluated using diverse performance assessment measures like precision, sensitivity, accuracy and mean IoU. These metrics are computed as follows:

\text{Precision} = \frac{TP}{TP + FP}    (2)

\text{Sensitivity} = \frac{TP}{TP + FN}    (3)

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (4)
where TN, TP, FP, and FN represent true negative, true positive, false positive and false negative, respectively. TN and TP indicate the numbers of negative and positive samples that are correctly classified, while FN and FP indicate the numbers of misclassified negative and positive samples, respectively. Further, the proposed cascaded 3D V-Net approach's performance is compared with previously-reported models such as U-Net through considering precision, sensitivity, accuracy and mean IoU metrics.
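Eqs. (2)–(4) translate directly into code; the confusion-matrix counts below are illustrative only:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, sensitivity and accuracy per Eqs. (2)-(4)."""
    return {
        "precision":   tp / (tp + fp),                  # Eq. (2)
        "sensitivity": tp / (tp + fn),                  # Eq. (3)
        "accuracy":    (tp + tn) / (tp + tn + fp + fn), # Eq. (4)
    }

# Toy counts: 90 true positives, 85 true negatives, 5 false positives,
# 10 false negatives
m = classification_metrics(tp=90, tn=85, fp=5, fn=10)
# precision = 90/95, sensitivity = 90/100, accuracy = 175/190
```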
4 Results and Discussion In this research study, Python 3.9.7 and Jupyter Notebook 6.4.5 are used for simulation, and BT categorization is achieved using a cascaded 3D V-Net framework. The brain tumor image segmentation (BRATS) 2020 MRI image dataset derived from the Kaggle database is employed for analysis. This database is further split into training and validation sets. Figure 3 displays the parameters deemed for the investigation. The brain MRI images are further transformed into 2D images as depicted in Fig. 4. Figure 5 portrays the MRI brain image. The validation of output using train and test sets is portrayed in Fig. 6. As observed in Fig. 6, initially an original FLAIR image is considered and its ground truth is evaluated. Further, three distinct BT classes, namely NET/NCR, ED and ET, are determined and predicted for the training and validation sets. The achieved BT classification output is presented in Fig. 7, which displays the ground truth considered for evaluation and the achieved prediction output for the ED category.
Fig. 3. Parameters considered for analysis
Fig. 4. Checking and visualization of MRI brain image format into 2D image
Fig. 5. MRI brain image
Fig. 6. Validation of output with train and test sets
From Fig. 8, it could be observed that the attained accuracy, dice loss, dice coefficient and IoU values are approximately similar for both training and validation sets. Hence, it is confirmed that the developed framework exhibits better performance. Performances of
Fig. 7. Classification output
Fig. 8. Adding weights of pre-trained model and validation of metrics
developed and existing DL frameworks are compared with regard to distinct performance evaluation metrics, and the output values achieved are enumerated in Table 1.

Table 1. Performance results of the developed V-Net and existing models

Techniques       Sensitivity(%)  Specificity(%)  Accuracy(%)  Precision(%)  F1-score(%)  Mean IoU(%)
Proposed V-Net   99.4762         99.7377         99.5803      99.6262       98.4262      85.2956
V-Net            98.8167         98.0428         99.076       99.2567       88.2567      81.8354
U-Net            98.3581         97.8814         98.668       99.0914       87.2881      38.8866
Resnet           97.5835         97.6392         95.953       98.3535       85.8354      35.3511
Segnet           95.7143         97.619          95.1837      95.7143       85.7143      32.5857
Fig. 9. Comparison of accuracy achieved using existing models and proposed V-Net
Accuracy typically describes a model's performance across all categories. It is estimated as the rate of correct predictions to the total predictions. The accuracy achieved using both the proposed V-Net and the existing models is depicted in Fig. 9. From this figure, it could be observed that the prevailing models offered an accuracy of (i) 98.07% (V-Net), (ii) 96.66% (U-Net), (iii) 95.95% (Resnet), and (iv) 94.18% (Segnet), whilst the created V-Net offered 99.58% accuracy. The accuracy outcomes clearly manifested that the V-Net framework surpassed the prevailing models in performance. The IoU metric aids in evaluating the bounding box similarity between the ground truth and the prediction output. In case of IoU, the greater the overlap, the higher the score and the better the output. The IoU metric, also called the Jaccard index, is an extensively employed evaluation metric for segmentation tasks. It is employed for quantifying the percentage overlap between the prediction result and the targeted mask. The mean IoU performance of the developed V-Net and the existing models is demonstrated in Fig. 10. From this figure, it could be noticed that the existing models attain mean IoU scores of 81.83% (V-Net), 38.88% (U-Net), 35.35% (Resnet), and 32.58% (Segnet), whilst the developed V-Net offered an 85.28% mean IoU score. The accomplished mean IoU scores explicitly illustrated that the V-Net framework exhibited ameliorated performance over the prevailing frameworks. Precision is another cardinal assessment metric utilized for performance assessment. This metric indicates the proportion of positive instances that are correctly categorized. The precision performance of the developed V-Net and the existing models is portrayed in Fig. 11. From this figure, it could be identified that the existing models acquire a precision of 99.25% (V-Net), 99.09% (U-Net), 98.35% (Resnet), and 95.71% (Segnet), whilst the developed V-Net offered 99.60% precision.
The precision outputs clearly ascertained that the V-Net framework superseded the existing framework regarding precision performance.
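The IoU (Jaccard) and mean IoU scores discussed in this section can be sketched as follows for binary masks; the `mean_iou` helper and the toy label maps are hypothetical, not taken from the paper's evaluation code:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / (union + eps))

def mean_iou(pred_labels: np.ndarray, gt_labels: np.ndarray, classes) -> float:
    """Average per-class IoU over the given class labels."""
    return sum(iou(pred_labels == c, gt_labels == c) for c in classes) / len(classes)

pred = np.array([1, 1, 2, 0])   # toy predicted label map
gt   = np.array([1, 2, 2, 0])   # toy ground-truth label map
# class 1: inter 1 / union 2 = 0.5; class 2: inter 1 / union 2 = 0.5
```

Averaging per-class IoU, rather than pooling all classes into one mask, is what makes the mean IoU sensitive to poor performance on rare tumor classes.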
Fig. 10. Comparison of mean_IoU achieved using existing models and proposed V-Net
Fig. 11. Comparison of precision achieved using existing models and proposed V-Net
Sensitivity is one among the well-known metrics employed for model’s performance evaluation. Sensitivity measure indicates the proportion of real positive instances that are correctly recognized. The sensitivity values achieved using the developed V-Net and existing models are compared and displayed in Fig. 12. From this Figure, it could be viewed that the prevailing models attain a sensitivity of 97.81% (V-Net), 96.35% (U-Net), 95.58% (Resnet), and 94.71% (Segnet), whilst the developed V-Net exhibited
Fig. 12. Comparison of sensitivity achieved using existing models and proposed V-Net
99.47% sensitivity. The accomplished sensitivity outputs explicitly corroborated that the V-Net framework performed superior to the existing frameworks. Figure 13 shows the performance analysis of the V-Net model and the prevailing models regarding specificity and F1-score. Higher specificity and F1-score values show a model's superior performance. In (a), the proposed V-Net model attains a specificity of 99.73%, which is higher than the existing models, which attain specificities of 98.04% (V-Net), 97.08% (U-Net), 96.63% (Resnet), and 94.61% (Segnet). Similarly, in (b), the proposed model attains an F1-score of 98.42%, which is greater than the existing models. From these outcomes, it is observed that the V-Net model is superior to the existing models. The performance investigation of the V-Net model and the prevailing models regarding Dice coefficient loss is exhibited in Fig. 14. Dice loss has often been used for medical image segmentation tasks. The loss obtained by the proposed model is 0.3818, whereas the losses obtained by the existing models are 0.4217 (V-Net), 0.4742 (U-Net), 0.6125 (Resnet), and 0.6482 (Segnet). From these results, it is concluded that the V-Net model is more efficient than the prevailing models. Table 2 portrays the comparative investigation of the V-Net model along with the existing models regarding accuracy. The V-Net model attains an accuracy of 99.58%, which is greater than the prevailing models, whereas the existing models, such as NN, GoogLeNet, and deep autoencoder, attain accuracies of 93.33%, 97.5%, and 98.5%, respectively.
Fig. 13. Performance analysis of the proposed model and existing models (a) Specificity and (b) F1-score
Fig. 14. Performance analysis of the proposed model and the existing models
Table 2. Comparative analysis of the proposed model and existing models in terms of accuracy

Techniques          Accuracy (%)
Proposed V-Net      99.58
NN                  93.33
GoogLeNet           97.5
Deep autoencoder    98.5
5 Conclusion This research paper presented an effective framework for BT segmentation and categorization from brain MRI images based on a cascaded 3D V-Net approach. The presented framework efficiently performed brain image preprocessing, segmentation, brain feature extraction and classification. This work employed an improved database for disease analysis. Performance evaluation of the proposed V-Net framework was conducted using distinct measures, and previously-introduced frameworks were employed chiefly for performance comparison. Experimental outputs manifested that the developed 3D V-Net framework outperformed the existing models by achieving 99.58% accuracy, 85.29% mean IoU, 99.62% precision and 99.47% sensitivity. Data Availability Statement. The datasets are provided by the BraTS Challenge and are allowed for personal academic research. The specific link to the dataset is https://ipp.cbica.upenn.edu/.
Computing Physical Stress During Working Shift with Deep Neural Networks

Vincenzo Benedetto1,2(B), Francesco Gissi1,2, and Marialuisa Menanno1
1 Dept of Engineering, University of Sannio, Viale Traiano 1, 82100 Benevento, Italy
{vincenzo.benedetto,frangissi,menanno}@unisannio.it
2 Kebula srl, Via Giovanni Paolo II, 132, 84084 Fisciano, Italy
Abstract. Our work provides experimental support and evidence for the advantages of using artificial intelligence techniques, more specifically computer vision and pose estimation algorithms, in industrial contexts to optimise production tasks and to build more ergonomic workstations. The approach is based on a state of the art model for pose estimation; the output of the model is then used as an abstraction of people's activities during work time. This abstraction allows us to perform mathematical operations on an operator model to assess the level of physical stress and safety during a work shift.

Keywords: Physical stress · Pose Estimation · Deep Learning

1 Introduction
During the last 20 years, a huge interest in human body observation and abstraction has driven research efforts in Artificial Intelligence [14–17] to develop more efficient and more accurate models that generate a faithful reproduction of the human body, head and hands [4]. The range of applications empowered by these techniques spans from augmented reality (AR) and virtual reality (VR) to industrial work optimisation and safety assessment. Computer Vision (CV) has enabled several applications of pose estimation of people and objects, like the tracking of people [9] and construction equipment on a construction site to avoid accidental collisions and personal injury. Other improvements have been achieved in the intrusion detection and security fields: think of using pose estimation, powered by CV, to monitor the entrance of critical buildings and count the people joining the site, or to identify the actions of specific individuals in a specific room and flag unauthorized behaviours. The state of the art of open source software tools is represented by OpenPose [3], which computes joint and limb detection from images and videos in real time. OpenPose is mainly composed of convolutional neural networks, whose aim is to identify local features characterizing different parts of the human body, even for multi-body detection. Several improvements have been achieved with the introduction of 3D pose estimation algorithms like VideoPose3D [11].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 111–119, 2023. https://doi.org/10.1007/978-3-031-30396-8_10

VideoPose3D can compute a very accurate representation of the human body in 3D, enabling its application in many domains, like industrial safety and workstation design. The latter is the application field of pose estimation that drives the interest of this work. There are several works about the usage of pose estimation for tracing and monitoring people during a work shift [8]. Our work aims to prove, via experimental evidence, the applicability of computer vision and pose estimation models to the design and optimisation of workplaces, in order to reduce the physical stress of workers and to assure their safety during working time [1]. Ergonomics and physical stress analysis are conceptually related to warehouse and workplace design and management [2]. The work consists of the application of a 3D pose estimation model to assess the physical stress on the upper body of working people via a standard risk assessment method called RULA [13]. Our algorithm analyses the estimated pose model and automatically computes the RULA risk index according to its upper body joint angles [6,7,10]. Our experimental case is represented by an operator performing manual activities in a fixed workplace. The activity of the operator has been divided into 14 elementary operations (tasks). Our workflow computes the physical stress index for each of the identified elementary operations. The work is structured as follows:

– Section 2 - Methodology
– Section 3 - Experimental setup
– Section 4 - Experimental evidence
– Section 5 - Conclusion and future directions
Section 2 describes the project workflow, the models used, the application context, the data acquisition process and the output evaluation. Section 3 describes the experimental setup in terms of number of monitored tasks and work shift duration. Section 4 reports the experimental results with brief comments. Section 5 draws conclusions and outlines possible improvements and future directions.
2 Methodology
The following section illustrates the applied research methodology by providing an in-depth look at the workflow, techniques and models used. Our work involves, as already mentioned, the application of computer vision techniques for the estimation of human body posture from video recordings, for the purpose of assessing physical stress levels and safety at work. The implemented execution flow involves the analysis of individual frames obtained from a video capture; in detail:

– The model for 3D human pose estimation (HPE) is applied to the single frame
– The information obtained from the model is used to calculate the worker's posture metrics
– The RULA risk index is then calculated
The process is repeated for all available frames in order to perform an analysis over time of how posture varies during the performance of job functions and how the passage of time affects the physical stress and safety of the worker.

2.1 3D Human Pose Estimation
In the field of computer vision through artificial intelligence, there are numerous approaches to the problem of estimating the pose of a person, from generic convolutional networks (CNNs) to complex models involving multiple convolutional networks combined with embedding techniques specific to this application. For this work, a state-of-the-art model for Human Pose Estimation was chosen, which performs 3D estimation using a model based on time-dilated convolutions from 2D key points [12]. In detail, two models are used: the first estimates the key points of the human pose in 2D; the second, starting from the 2D key points, performs a back projection into 3D. The decision to use a model for 3D projection stems from the need to better contextualise the worker's pose in space while obtaining the highest possible level of accuracy. The 3D projection model exploits both 2D and 3D data for training by applying a semi-supervised process: unlabelled videos are analysed by means of a pre-trained 2D estimation model [18] and, thanks to a projection layer, their transposition into 3-dimensional space is performed. Then an autoencoder reprojects it in 2D, training itself to replicate the original input as best it can. In this way, the lack of availability of labelled 3D data is effectively overcome. During the training of the final model, the supervised and unsupervised components are trained in parallel, splitting the data batch into equal parts. The 3D HPE task in question involves the use of a 17-point reference skeleton, taken from the well-known H3.6M dataset [5]. The output of the 3D HPE model provides, for each of the 17 identifiable points, an array containing the coordinates of its position within the input image and the confidence of the estimate expressed as a percentage. An example of the output of the 3D Human Pose Estimation model is presented in Fig. 1.
2.2 Worker Posture Metrics Computation
The second phase of our workflow provides a framework to automatically compute joint angles from the output of the previous phase and to provide them to the RULA module, responsible for the risk index calculation. The worker posture metrics (WPM) consist of a set of joint angles computed on the 3D model produced by HPE. Due to the RULA indexes and the nature of the working tasks, our analysis is focused on the HPE of the upper body. In the specific case we compute six angles:
Fig. 1. 3D Human pose estimation output
– Right elbow
– Left elbow
– Right shoulder
– Left shoulder
– Neck frontal bending
– Spine frontal bending
Each angle in the list is computed for each HPE frame and for each task of the working shift. Having multiple angle values for each task, we select the worst-case value for each joint angle. The angles are then used to feed the RULA computing module.

2.3 RULA Risk Index Computing
The last phase of the workflow is responsible for converting joint angles into standard risk values in order to assess the safety and physical stress level of a working shift. Given the values of the upper body joint angles and applying the RULA table conversion, we are able to return a qualitative index of physical stress during the work shift. The returned values are three classes of risk:

– Low
– Medium
– High

A risk class value is then computed for each elementary task of the working operation.
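This last step can be sketched as a thresholding of the worst-case angles into the three classes. Note that the thresholds below are illustrative placeholders, not the official RULA scoring tables [13], which are what should be used in practice:

```python
def risk_class(angles, limits):
    """Map worst-case joint angles (degrees) to a qualitative class.
    `limits` gives (medium, high) thresholds per joint -- illustrative
    values, NOT the official RULA tables."""
    worst = "Low"
    for joint, value in angles.items():
        medium, high = limits[joint]
        if value >= high:
            return "High"        # any joint past the high limit dominates
        if value >= medium:
            worst = "Medium"
    return worst

# Hypothetical limits for two of the monitored angles
limits = {"neck_bend": (10, 20), "spine_bend": (20, 60)}
print(risk_class({"neck_bend": 12, "spine_bend": 25}, limits))  # → Medium
```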
3 Experimental Setup
The experimental phase of our work consists of the implementation of the defined research methodology. We developed this part of the work thanks to an international corporation which manufactures electromechanical components. On their shop floor we recorded an entire work shift of a human operator performing the same main task (assembling a specific component). The main task has been split into 14 subtasks, named Elementary Operations (EO); each EO defines a specific action, or group of actions, like a manual for the worker (e.g. "Take the group anchor with your right hand", "Take the cover with your left hand", "Insert the group anchor into the cover"). Video clips of each EO have then been used to feed our workflow through the three steps defined in the methodology section. At the end of the workflow execution we get a map of physical stress values for each joint angle and each EO. In the next section we present our results in terms of the RULA risk index on the upper limb of our human operator, computed over the joint angles given by the human pose estimation model output.
4 Experimental Evidence
In this section we report the results obtained via the proposed framework to assess the physical stress induced by a work shift in an industrial context on the upper limb of the human body. A qualitative risk index is assigned to each EO; this could drive further analysis and optimisation of industrial workplaces. Table 1 contains the sexagesimal degree values of each joint angle computed for each elementary operation. Right and left elbow angles represent the angle between arm and forearm: values near 90° are standard, while values near 0° or 180° are critical. Right and left shoulder angles represent the opening angle of the shoulder: high values are critical. Bend neck and spine represent, respectively, the slope of the head and of the back: 0° is the best value and corresponds to the vertical position of head and spine. The last step of our workflow uses these angle values to compute the RULA risk index. The output values are reported in Table 2. The output of the framework highlights the Medium risk class for the majority of EOs, and selects the High risk class for two elementary operations. In fact, these two EOs are the most stressful for the worker's upper limb. As presented in Fig. 2, the worker uses a mechanical press to secure the assembled component, and in Fig. 3 he moves his right hand away from the desk to put the component in a basket. The other elementary operations, consisting of manual activities performed on the work table, result less stressful for the worker's upper limb.
Table 1. Upper body joint angles for each Elementary Operation. Values are in sexagesimal degrees

       Right elbow  Left elbow  Right shoulder  Left shoulder  Bend neck  Spine
E01    102.7        97.2        68.5            69.9           26.2       8.2
E02    107.4        90.9        68.3            69.3           29.5       7.9
E03    112.4        106.2       68.6            69.1           30.3       6.9
E04    110.3        102.8       64.1            58.5           27.4       3.1
E05    111.1        97.4        63.0            60.9           24.3       2.6
E06    111.6        90.5        63.2            59.9           25.1       4.7
E07    104.6        96.8        63.6            61.2           28.0       5.0
E08    104.1        99.8        63.7            61.8           27.4       6.4
E09    109.3        113.2       64.2            60.8           22.3       2.9
E010   84.0         95.5        67.9            59.8           15.2       5.2
E011   90.0         100.1       69.2            14.5           14.5       4.8
E012   125.1        92.0        83.9            64.1           15.3       9.6
E013   108.6        92.5        68.7            61.7           13.2       6.6
E014   77.6         63.0        69.2            21.8           26.7       38.8
Table 2. RULA stress index values for each Elementary Operation of the main task

Elementary Operation   Criticality class
E01                    Medium
E02                    Medium
E03                    Medium
E04                    Medium
E05                    Medium
E06                    Medium
E07                    Medium
E08                    Medium
E09                    Medium
E010                   Low
E011                   Medium
E012                   High
E013                   Medium
E014                   High
Fig. 2. Elementary operation number 12

Fig. 3. Elementary operation number 14

5 Conclusion and Future Directions
As a result of this work we developed a semi-automatic physical stress computing framework capable of performing an on-site risk assessment (a camera or a video recording is needed). This framework is based on Deep Learning models, so we explored the application of AI technologies in the ergonomics and workplace optimisation research areas. Via the developed framework we performed the assessment of an operating industrial site, identifying some critical points in a specific workflow in the factory that could be optimised, making the workplace more friendly and comfortable for workers. This paper suggests several directions worthy of further investigation:

– Optimisation of the most critical operations via other technologies, such as collaborative robots
– Combination of AI and VR technologies in Human Pose Estimation to build an interactive real-time risk assessment framework
– Integration and comparison of other standard physical stress indexes
References

1. Andrews, D.M., Fiedler, K.M., Weir, P.L., Callaghan, J.P.: The effect of posture category salience on decision times and errors when using observation-based posture assessment methods. Ergonomics 55(12), 1548–1558 (2012)
2. Battini, D., Persona, A., Sgarbossa, F.: Innovative real-time system to integrate ergonomic evaluations into warehouse design and management. Comput. Ind. Eng. 77, 1–10 (2014)
3. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017)
4. Fang, W., Love, P.E., Luo, H., Ding, L.: Computer vision for behaviour-based safety in construction: a review and future directions. Adv. Eng. Inf. 43 (2020)
5. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2014)
6. Kee, D.: An empirical comparison of OWAS, RULA and REBA based on self-reported discomfort. Int. J. Occup. Saf. Ergonomics 26(2), 285–295 (2020)
7. Kong, Y.K., Lee, S.Y., Lee, K.S., Kim, D.M.: Comparisons of ergonomic evaluation tools (ALLA, RULA, REBA and OWAS) for farm work. Int. J. Occup. Saf. Ergonomics 24(2), 218–223 (2018)
8. Li, L., Martin, T., Xu, X.: A novel vision-based real-time method for evaluating postural risk factors associated with musculoskeletal disorders. Appl. Ergonomics 87, 218–223 (2020)
9. Liu, M., Han, S., Lee, S.: Tracking-based 3D human skeleton extraction from stereo video camera toward an on-site safety and ergonomic analysis. Construction Innovation 16, 348–367 (2016)
10. Nayak, G.K., Kim, E.: Development of a fully automated RULA assessment system based on computer vision. Int. J. Ind. Ergonomics 86, 103218 (2021)
11. Pavllo, D., Feichtenhofer, C., Grangier, D., Auli, M.: 3D human pose estimation in video with temporal convolutions and semi-supervised training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7753–7762 (2019)
12. Pavllo, D., Feichtenhofer, C., Grangier, D., Auli, M.: 3D human pose estimation in video with temporal convolutions and semi-supervised training, pp. 7745–7754 (2018)
13. Stanton, N., Hedge, A., Brookhuis, K., Salas, E., Hendrick, H.: Handbook of Human Factors and Ergonomics Methods (2004)
14. Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21
15. Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015). https://doi.org/10.1016/j.fss.2015.06.004
16. Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014). https://doi.org/10.1016/j.ins.2013.09.041
17. Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012). https://doi.org/10.1080/00207160.2011.613460
18. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/Detectron2
Trend Prediction in Finance Based on Deep Learning Feature Reduction

Vincenzo Benedetto1,2(B), Francesco Gissi1,2, Elena Mejuto Villa1,2,3, and Luigi Troiano3
1 Dept of Engineering, University of Sannio, Viale Traiano 1, 82100 Benevento, Italy
2 Kebula srl, Via della Biblioteca 2, 84084 Fisciano, Italy
{vincenzo.benedetto,francesco.gissi}@kebula.it
3 DISA-MIS, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, Italy
[email protected]
Abstract. One of the main features of Deep Learning is to encode the information content of a complex phenomenon in a latent representation space. This is of definite interest in Finance, as it allows time series data to be compressed into a smaller feature space. Among the different models used to accomplish this task are Restricted Boltzmann Machines (RBM) and Auto-Encoders (AE). In this paper we present a preliminary comparative study on the use of these techniques in predicting the trend of financial time series. We attempt to outline the impact that architectural and input space characteristics have on the quality of prediction.
Keywords: Restricted Boltzmann Machines · Auto-Encoders · time series · Finance · trend prediction

1 Introduction
Deep Learning (DL) is gaining increasing interest in the financial industry as it is able to encode the observation of a complex phenomenon into a latent representation space on which to operate. Unlike other approaches [13–16], Deep Learning allows a wider range of information to be taken into account to support the decisions behind investments. This is related to the ability of DL models to correlate a large number of observation points, through effective dimensionality reduction, to a set of independent/unrelated variables that can fully express the phenomenon under consideration. This task falls under the more general class of feature reduction achieved through machine learning. In general terms, feature reduction is based on a mapping between data points in a specific observation space X that has a high number of dimensions, into a space Y with reduced dimensionality. Such a mapping is accomplished by an unknown function ρ : X → Y that can be approximated through a machine learning process guided by a performance measure that intends to reduce information loss. Under most circumstances, X ⊂ R^n and Y ⊂ R^m, with n ≫ m. Traditional techniques, such as PCA and the like, determine a linear ρ function [8]. However, to achieve greater expressive power of the latent representation, the ρ function must be nonlinear [10]. Deep Learning, unlike other conventional approaches, allows the identification of nonlinear models specifically designed for feature reduction.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 120–133, 2023. https://doi.org/10.1007/978-3-031-30396-8_11

Fig. 1. Scheme used for trend prediction

In the financial sector, particular attention has been paid to the use of Restricted Boltzmann Machines (RBM) (as in [18]) and Auto-Encoders (AE). The focus of the papers in the literature is strongly on the performance that can be achieved in trend prediction. Limited or no attention has been paid to issues related to the feature reduction stage, in spite of the central role it plays. In this paper we intend to investigate the different aspects that may affect the effectiveness, and thus the quality, of reduction in the context of trend detection and prediction. In order to schematize the reduction process, we consider the scheme shown in Fig. 1. It takes as input a large number of indicators. These values are first scaled to make them homogeneous and compatible with the model used, then they are passed to the reduction module. The data thus reduced are given as input to a classifier to perform trend prediction. Since our interest does not lie in evaluating the predictor, we will use only a standard SVM [7] to fulfill the purpose. In fact, our goal in this paper is to study the impact of feature reduction on prediction quality. In particular, we will compare RBM and AE with the aim of identifying issues related to data source selection and preprocessing. The remainder of this paper is organized as follows: Sect. 2 provides some preliminaries regarding RBM and AE; Sect. 3 discusses results of experimentation; Sect. 4 outlines conclusions and future directions.
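As a point of comparison for the nonlinear models discussed next, a linear reduction ρ such as PCA can be written in a few lines of numpy. This is an illustrative baseline, not part of the authors' pipeline:

```python
import numpy as np

def pca_reduce(X, m):
    """Linear feature reduction: project n-dimensional observations
    (rows of X) onto the top-m principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes,
    # ordered by decreasing singular value (i.e. explained variance)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

X = np.random.default_rng(0).normal(size=(100, 8))
Y = pca_reduce(X, 2)
print(Y.shape)  # → (100, 2)
```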
2 Preliminaries
In this section we provide some preliminary notions regarding Restricted Boltzmann Machines (RBM) and Auto-Encoders (AE), as they are of interest for this paper.
2.1 Restricted Boltzmann Machine
As depicted by Fig. 2, a Restricted Boltzmann Machine (RBM) is a two layer neural network.
Fig. 2. Restricted Boltzmann Machine
RBMs are designed to work with binary values in {0, 1}. The n visible units V = (V_1, ..., V_n) make up the input layer and are used to represent the observable data. On the other hand, the m hidden units H = (H_1, ..., H_m) are used to capture dependencies between observed variables. The relationship between V and H is quantified by means of a weighting matrix W defined on the relationship V × H. RBMs are bidirectional: the values determined at hidden and visible units are given by

h_j = \sigma\left(b_j + \sum_{i=1}^{n} w_{i,j} v_i\right), \quad j = 1..m \quad (1)

and

v_i = \sigma\left(a_i + \sum_{j=1}^{m} w_{i,j} h_j\right), \quad i = 1..n \quad (2)

where σ is the logistic sigmoid, and a_i and b_j are the biases. RBMs belong to the class of energy based models (EBM). Indeed, an RBM can be regarded as a Markov random field with an associated bipartite undirected graph. The values given at visible and hidden units can be interpreted in terms of conditional probabilities, that is

P(H_j = 1 \mid V) = h_j, \quad j = 1..m \quad (3)

and

P(V_i = 1 \mid H) = v_i, \quad i = 1..n \quad (4)

The RBM being based on a bipartite graph, the hidden variables are mutually independent given the visible variables and vice versa. Therefore, the conditional probabilities are given as

P(V \mid H) = \prod_{i=1}^{n} P(V_i \mid H) \quad (5)

P(H \mid V) = \prod_{j=1}^{m} P(H_j \mid V) \quad (6)

They can both be expressed in terms of the joint probability P(V, H) and its marginal probabilities P(V) = \sum_{H} P(V, H) and P(H) = \sum_{V} P(V, H). Since the RBM makes use of the logistic sigmoid, the joint probability distribution is given by the Gibbs distribution

P(V, H) = \frac{1}{Z} e^{-E(V, H)} \quad (7)

where E(V, H) is named the energy function and defined as

E(V, H) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{i,j} h_j \quad (8)

and Z is called the partition function, a normalizing constant used to assure that the probability sums up to 1. An RBM can be adapted to process real-valued visible variables by scaling the input data to the unit interval, so that input values are interpreted as a-priori probabilities p_i ∈ [0, 1] that V_i = 1. An RBM can be trained to replicate an input v. Given the matrix W,

\frac{\partial \log P(v)}{\partial w_{i,j}} = v_i h_j - v'_i h'_j \quad (9)

where h_j is obtained by Eq. (1) and v'_i is obtained by Eq. (2). At each step, the procedure makes use of Gibbs sampling in order to get the vector h', while v = v_0. Thus, assuming a gradient descent rule, the update of the weights is given as

\Delta w_{i,j} = \epsilon (v_i h_j - v'_i h'_j) \quad (10)

where ε is the learning rate. In addition, biases are updated using the rules Δa = ε(v − v') and Δb = ε(h − h'). At the end of the training, the hidden units h offer a compression of the visible inputs v.
2.2 Auto-Encoders
An Auto-Encoder (AE) is a neural network that is trained to reconstruct or approximate its input by itself. For this reason, AEs also make use of unsupervised training. The AE structure consists of an input layer, an output layer and one or more hidden layers connecting them. For the purpose of reconstructing the input, the output layer has the same dimension as the input layer, forming a bottleneck structure as depicted in Fig. 3.

Fig. 3. Autoencoder

An AE consists of two parts: one maps the input to a lower dimensional representation (encoding); the other maps the latent representation back into a reconstruction of the same shape as the input (decoding). In the simplest structure there is just one single hidden layer. The AE takes the input x ∈ R^d = X and maps it onto y ∈ R^p:

y = \sigma_1(W x + b) \quad (11)

where σ_1 is an element-wise activation function such as a sigmoid function or a rectified linear unit. After that, the latent representation y, usually referred to as the code, is mapped back onto the reconstruction z ≈ x of the same shape as x:

z = \sigma_2(W' y + b') \quad (12)

Since we are trying to fit the model to replicate the input, the parameters (W, W', b and b') are optimized so that the average reconstruction error is minimized. This error can be measured in different ways, among them the squared error:

L(x, z) = \|x - z\|^2 = \|x - \sigma_2(W'(\sigma_1(W x + b)) + b')\|^2 \quad (13)

If instead the input is interpreted as either bit vectors, i.e., x_i ∈ {0, 1}, or vectors of bit probabilities, i.e., x_i ∈ [0, 1], the cross-entropy of the reconstruction is a suitable solution:

L(x, z) = -\sum_{k=1}^{d} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right] \quad (14)
In order to force the hidden layer to extract more robust features, we train the AE to reconstruct the input from a corrupted version of it, obtained by discarding some of the values. This is done by randomly setting some of the inputs to zero [17]. This version of AE is called a Denoising Auto-Encoder.
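A minimal denoising AE with one hidden layer, squared-error loss (Eq. 13) and manually derived gradients can be sketched in numpy as follows (illustrative only; the implementation used in the experiments is based on Theano):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAE:
    """Single-hidden-layer denoising auto-encoder trained with the
    squared error of Eq. (13); inputs are corrupted by randomly
    zeroing entries, as in the denoising variant described above."""

    def __init__(self, d, p, lr=0.5, corrupt=0.3):
        self.W1 = rng.normal(scale=0.1, size=(p, d))  # encoder weights W
        self.b1 = np.zeros(p)
        self.W2 = rng.normal(scale=0.1, size=(d, p))  # decoder weights W'
        self.b2 = np.zeros(d)
        self.lr, self.corrupt = lr, corrupt

    def reconstruct(self, x):
        y = sigmoid(self.W1 @ x + self.b1)       # encoder, Eq. (11)
        return sigmoid(self.W2 @ y + self.b2)    # decoder, Eq. (12)

    def train_step(self, x):
        xc = x * (rng.random(x.shape) > self.corrupt)  # corrupt the input
        y = sigmoid(self.W1 @ xc + self.b1)
        z = sigmoid(self.W2 @ y + self.b2)
        d2 = (z - x) * z * (1 - z)               # gradient through the decoder
        d1 = (self.W2.T @ d2) * y * (1 - y)      # gradient through the encoder
        self.W2 -= self.lr * np.outer(d2, y)
        self.b2 -= self.lr * d2
        self.W1 -= self.lr * np.outer(d1, xc)
        self.b1 -= self.lr * d1
        return np.mean((x - z) ** 2)

# Toy training run on 4 random binary patterns of dimension 8
data = (rng.random((4, 8)) > 0.5).astype(float)
ae = DenoisingAE(d=8, p=3)
err0 = np.mean([np.mean((x - ae.reconstruct(x)) ** 2) for x in data])
for _ in range(500):
    for x in data:
        ae.train_step(x)
err1 = np.mean([np.mean((x - ae.reconstruct(x)) ** 2) for x in data])
```

After training, `sigmoid(W1 @ x + b1)` provides the p-dimensional code used as the reduced feature vector.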
3 Experimental Results

3.1 Input Features and Data Labeling
As the experimental setting we considered historical data given by the price series of the S&P 500 from 01 Jan 2007 to 01 Jan 2017. The input is made of multiple technical indicators computed over the price series. Table 1 provides the list of indicators used in our experiments (a detailed description can be found in, e.g., [1,2,6]). In order to label the trend, each time t is given a value y(t) ∈ {+1, −1} for uptrend and downtrend respectively, according to the following rule: given the centered moving average (cMA) of the index at time t over a rolling window [t − 3, t + 3], we assume:

y(t) = +1, if cMA(t) > close(t) and cMA(t + 3) > cMA(t + 1)
y(t) = −1, if cMA(t) < close(t) and cMA(t + 3) < cMA(t + 1)
y(t) = y(t − 1), otherwise

Figure 4 outlines the closing price series together with the trend labels. In order to improve the quality of the training set and the predictability of the model, we focus on periods large enough to outline a trend (at least 10 days) and consistent with value movements, meaning that an uptrend should entail a positive value increment within the period, while we should have a decrement for downtrends. Periods that do not meet these characteristics are discarded from the training set. Since we are not interested in validating the reduction model, but in testing the impact of compression on the following classification model, we used all data to train the reduction model. Instead, the SVM has been trained using 90% of the available data and tested over the remaining 10%. Comparisons have been performed using the most recent 10% of data for testing. In order to avoid a bias of results due to the selection of the most recent period, the final comparison between RBM and AE has been performed using a 10-fold cross validation.
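The labeling rule can be sketched with pandas as follows (a hypothetical helper; the authors' exact handling of the series boundaries may differ):

```python
import pandas as pd

def label_trend(close, window=7):
    """Label each time step +1 (uptrend) or -1 (downtrend) following
    the rule above; `window` is the centered moving-average span
    ([t-3, t+3] for the default of 7)."""
    cma = close.rolling(window=window, center=True).mean()
    up = (cma > close) & (cma.shift(-3) > cma.shift(-1))
    down = (cma < close) & (cma.shift(-3) < cma.shift(-1))
    y = pd.Series(0, index=close.index)
    y[up] = 1
    y[down] = -1
    # y(t) = y(t-1) otherwise: forward-fill the undecided steps,
    # leaving 0 where no label has been assigned yet
    return y.mask(~up & ~down).ffill().fillna(0).astype(int)
```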
V. Benedetto et al.

Table 1. List of indicators

Indicator name | Type of indicator
Absolute Price Oscillator (APO) | Momentum
Aroon | Momentum
Aroon Oscillator | Momentum
MESA Adaptive Moving Average (MAMA) | Overlap studies
Average Directional Movement Index (ADX) | Momentum
Average Directional Movement Index Rating | Momentum
Average True Range (ATR) | Volatility
Balance of Power (BOP) | Momentum
Bollinger Bands (BBANDS) | Overlap studies
Bollinger Bandwidth | Overlap studies
%B Indicator | Overlap studies
Chaikin A/D Oscillator | Volume
Chande Momentum Oscillator (CMO) | Momentum
Commodity Channel Index (CCI) | Momentum
Directional Movement Index | Momentum
Double Exponential Moving Average (DEMA) | Overlap studies
Exponential Moving Average (EMA) | Overlap studies
Kaufman's Adaptive Moving Average (KAMA) | Overlap studies
Minimum and Maximum value over period | –
Moving Average (MA) | Momentum
Moving Average Convergence/Divergence (MACD) | Momentum
Momentum | Momentum
Money Flow Index (MFI) | Momentum
On Balance Volume | Volume
Percentage Price Oscillator (PPO) | Momentum
Plus Directional Indicator | Momentum
Plus Directional Movement | Momentum
Relative Strength Index (RSI) | Momentum
Relative Vigor Index (RVI) | Momentum
Rate of change ratio (ROC) | Momentum
Parabolic SAR | Overlap studies
Stochastic Oscillator | Momentum
Triple Exponential Moving Average (TEMA) | Overlap studies
Triangular Moving Average (TRIMA) | Overlap studies
1-day ROC of a Triple Smooth EMA (TRIX) | Momentum
Ultimate Oscillator | Momentum
Weighted Moving Average (WMA) | Overlap studies
Williams' Percent Range (%W) | Momentum

3.2 Experiment Setting
All the experiments were carried out on a workstation equipped with an Intel Xeon Processor E5 v3 Family CPU (3.5 GHz x8), 16 GB RAM and a GeForce GTX 980 Ti GPU with 6 GB of on-board RAM. The framework has been developed in Python. The implementations of AE and RBM are based on Theano [5], while the SVM is based
Trend Prediction in Finance Based on Deep Learning Feature Reduction
Fig. 4. S&P 500 index with trend labeling (normalized values; gray bands for uptrend and white bands for downtrend).
on the scikit-learn library for machine learning [3]. All indicators are calculated using TA-Lib, a technical analysis library [4].

3.3 Model Fitting
In order to accomplish the feature reduction we make use of AE and RBM as a preliminary step before an SVM classifier used for prediction. A comparison between AE and RBM has been made in terms of prediction accuracy and time required to train the network. The training of the AE is based on the Backpropagation algorithm with stochastic gradient descent for updating the weights and Cross-Entropy as loss function. The weight matrix W is initialized with values uniformly sampled in the interval [−4·√(6/(n_visible + n_hidden)), +4·√(6/(n_visible + n_hidden))] and the biases are initialized to 0, as suggested by [9] for sigmoid activation functions. The algorithm used to train the RBM has been the gradient-based persistent contrastive divergence learning procedure [12]. In this case the initial values of the weights are drawn from a zero-mean Gaussian with a standard deviation of 0.01, as suggested by Hinton in [11]. Hidden and visible biases are initialized to 0. In both cases we divide the training set into small mini-batches to accelerate the computation. We have also introduced an adaptive learning rate that decreases exponentially by means of a constant decay rate, in the expectation of improving both accuracy and efficiency of the training procedures. We use early stopping as a form of regularization to avoid over-fitting.
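The AE weight initialization described above can be sketched as follows; the function name is ours, and the sketch assumes a single weight matrix shared by encoder and decoder as in a tied-weights AE.

```python
import numpy as np

def glorot_sigmoid_init(n_visible, n_hidden, seed=0):
    """Initialization described in the text for sigmoid AEs:
    W uniform in [-4*sqrt(6/(n_vis+n_hid)), +4*sqrt(6/(n_vis+n_hid))],
    hidden and visible biases at zero."""
    rng = np.random.default_rng(seed)
    bound = 4.0 * np.sqrt(6.0 / (n_visible + n_hidden))
    W = rng.uniform(-bound, bound, size=(n_visible, n_hidden))
    return W, np.zeros(n_hidden), np.zeros(n_visible)
```

The factor 4 is the correction suggested by Glorot and Bengio [9] for sigmoid (rather than tanh) units.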
3.4 Performance Results
Alternatives are compared by means of prediction accuracy, that is, the number of correct labels over the total number of labels. First we consider two important aspects regarding input data: diversity and scaling.
Diversity. The first question we consider is how important it is to use diverse sources of information, where by "diverse" we mostly mean independent or uncorrelated. To investigate this aspect we first consider only cross-over indicators obtained by crossing different slower and faster moving averages (MA). In particular, we consider 11 faster MAs (with all periods within the range [5, 15]) and 11 slower MAs (with periods uniformly distributed within the range [20, 30]). This leads to 121 indicators obtained by the different combinations of MAs. In Table 2 and Fig. 5 we report the accuracy obtained for both AE and RBM with a varying number of hidden neurons. The performances of AE and RBM are comparable, and both tend to improve as the number of hidden neurons increases.

Table 2. Prediction accuracy with linearized crossovers as input features

Number of hidden neurons | AE | RBM
1 | 65.27% | 65.04%
3 | 68.23% | 67.06%
5 | 68.57% | 67.23%
10 | 69.3% | 69.29%
15 | 69.79% | 69.74%
25 | 70.36% | 70.53%
40 | 70.02% | 70.86%
50 | 70.41% | 71.19%
60 | 70.41% | 71.36%
70 | 70.53% | 72.15%
80 | 70.41% | 72.26%
90 | 70.08% | 72.54%
100 | 70.69% | 72.26%
110 | 70.08% | 72.36%
All (a) | 69.69%

(a) In this case the indicators are provided directly to the SVM classifier.
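The 121 crossover inputs can be sketched as follows. This is an illustrative construction under our own assumptions: the paper does not specify the exact crossover encoding, so we use the fast-minus-slow spread as the signal, and the function and column names are ours.

```python
import numpy as np
import pandas as pd

def crossover_features(close):
    """Build the 121 fast/slow MA pairings described in the text:
    11 fast MAs (periods 5..15) crossed with 11 slow MAs (periods 20..30)."""
    fast = {p: close.rolling(p).mean() for p in range(5, 16)}
    slow = {p: close.rolling(p).mean() for p in range(20, 31)}
    feats = {f"ma_{f}_{s}": fast[f] - slow[s] for f in fast for s in slow}
    return pd.DataFrame(feats)
```

For a linearly increasing price series the spread between a fast and a slow MA settles at a constant, which illustrates why such features are highly correlated with each other.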
The low accuracy values obtained, and the slowness of their improvement as the number of neurons grows, suggest that diversity can play a relevant role. Indeed, even a high number of features may convey little information when the sources are highly correlated. In Table 3 we report the accuracy obtained by using the whole set of available indicators. Some of them are parametric with respect to the look-back period. For those we assumed different periods, namely n = 3, 14, 30, with the aim of enriching the available information. We also included the adjusted closing price
Fig. 5. Prediction accuracy with linearized crossovers as input features.
and volume of the index. This leads to a total of 93 sources, each providing a specific day-by-day feature. Looking at the results, we can observe a substantial improvement in accuracy and an initial differentiation between AE and RBM.

Table 3. Prediction accuracy with linearized indicators as input features

Number of hidden neurons | AE | RBM
1 | 62.22% | 61.98%
3 | 71.43% | 69.87%
5 | 72.5% | 71.54%
10 | 73.62% | 75.4%
15 | 74.02% | 76.61%
25 | 73.33% | 79.49%
30 | 73.61% | 82.03%
40 | 73.56% | 80.99%
50 | 73.73% | 81.74%
60 | 73.56% | 81.74%
70 | 73.62% | 81.92%
80 | 73.21% | 81.74%
90 | 73.44% | 81.62%
All (a) | 72.40%

(a) In this case the indicators are provided directly to the SVM classifier.
Scaling. AE and RBM require input values to be scaled. This is generally done by a standard max/min normalization. However, other possibilities are available. Here we consider a rescaling over the unit interval [0, 1] obtained by means of the empirical cumulative distribution function (ECDF). The procedure consists in calculating the ECDF of each individual feature and then assigning to
Fig. 6. Prediction accuracy with linearized indicators as input features.
each instant of time t its corresponding ECDF value. The observation behind this is that max/min normalization keeps the density of data points unchanged, so that information is not uniformly distributed over the unit interval. Instead, the scaling offered by the ECDF is able to better distribute the data points, and this may contribute to improved performance. Table 4 and Fig. 7 show the results of this experiment. Accuracy shows an actual improvement, supporting the initial hypothesis that the ECDF offers a better scaling than standard normalization.

Table 4. Prediction accuracy with indicators scaled by means of the ECDF of their own distribution

Number of hidden neurons | AE | RBM
1 | 62.56% | 61.98%
3 | 72.11% | 70.04%
5 | 72.75% | 73.16%
10 | 75.51% | 78.11%
15 | 74.82% | 78.4%
25 | 77.02% | 83.64%
30 | 76.32% | 85.54%
40 | 76.49% | 86.18%
50 | 75.92% | 86.75%
60 | 76.15% | 84.85%
70 | 75.98% | 85.88%
80 | 76.79% | 84.97%
90 | 76.05% | 85.42%
All (a) | 73.44%

(a) In this case the indicators are provided directly to the SVM classifier.
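The ECDF rescaling can be sketched in a few lines (an illustrative implementation; the function name is ours):

```python
import numpy as np

def ecdf_scale(x):
    """Rescale a 1-D feature to [0, 1] through its empirical CDF, as
    described in the text: each value is mapped to the fraction of
    samples less than or equal to it."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x <= v) for v in x])
```

Unlike max/min normalization, this mapping spreads the values uniformly over (0, 1] regardless of how the raw feature is distributed.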
Fig. 7. Prediction accuracy with indicators scaled by means of the ECDF of their own distribution.
3.5 AE vs. RBM
From all the experiments above we can observe that there exists an optimal number of hidden neurons, beyond which performance does not improve or slightly decreases. That is the optimal dimensionality of the embedding performed by AE and RBM. In general, according to our experience, AE is able to reach higher dimensions. This might be the reason for the better performance offered by the AE-based feature reduction. In order to validate this finding, we compare both networks in their best configuration (50 hidden neurons for AE and 25 hidden neurons for RBM, all indicators used as input, rescaled by means of the ECDF) using a 10-fold cross validation procedure. The results, reported in Table 5, outline a consistent out-performance of AE over RBM, with AE being both more accurate and faster to train.

Table 5. Prediction accuracy and training time obtained with k-fold

k | AE accuracy | RBM accuracy | AE training time | RBM training time
1 | 86.75% | 77.02% | 16.59 sec | 187.16 sec
2 | 85.83% | 76.73% | 16.63 sec | 189.09 sec
3 | 85.6% | 76.09% | 15.69 sec | 188.38 sec
4 | 85.25% | 75.57% | 16.27 sec | 186.62 sec
5 | 85.77% | 75.63% | 15.98 sec | 185.5 sec
6 | 85.02% | 75.34% | 17.16 sec | 184.14 sec
7 | 86.06% | 75.51% | 17.29 sec | 186.75 sec
8 | 86.17% | 76.09% | 16.16 sec | 184.81 sec
9 | 86.23% | 76.04% | 17.69 sec | 183.36 sec
10 | 85.66% | 76.4% | 16.25 sec | 183.94 sec
4 Conclusions and Future Works
Financial prediction problems often involve large data sets with complex data interactions. DL can detect and exploit these interactions, which are inherently non-linear and, at least currently, cannot be modeled by any existing financial economic theory. A way to do that is by identifying the underlying geometric manifold in a dimensionally reduced feature space by means of machine learning. In this paper we investigated the application of Auto-Encoders and Restricted Boltzmann Machines, which are able to accomplish this task better than linear methods such as PCA. The two methods have been compared in terms of trend prediction accuracy. Experiments have shown that a preliminary pre-processing of input data plays an important role. In particular, values should be remapped over the unit interval [0, 1] taking into account their frequency distribution. This improves the accuracy with respect to a simple max/min normalization. In addition, diversity of input sources is crucial as well. With respect to architectures, AE performs generally better than RBM, and its training takes a shorter time. Both show an optimal number of neurons: below it the feature reduction underperforms because of model underfitting, above it because of overfitting. The optimal cardinality of the embedding is larger in the case of AE, and this could explain why its performance is better, as AE is able to learn a higher-dimensional structure in the input data. In both architectures an adaptive learning rate is highly beneficial. Experimental results obtained so far are preliminary and many questions are left open. Among them, whether using a stacked AE, i.e. one made of multiple hidden layers, may lead to an improvement.
References

1. Exploration tool for trading. https://www.amibroker.com/index.html
2. Platform for forex and exchange markets. https://www.metatrader5.com/en
3. scikit-learn: machine learning in Python. http://scikit-learn.org/stable/
4. TA-Lib: technical analysis library. http://ta-lib.org/
5. Theano. http://deeplearning.net/software/theano/
6. Web for financial charts. http://stockcharts.com/
7. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
8. Cunningham, J.P., Ghahramani, Z.: Linear dimensionality reduction: survey, insights, and generalizations. J. Mach. Learn. Res. 16(1), 2859–2900 (2015)
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)
10. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
11. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 599–619. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_32
12. Tieleman, T.: Training restricted Boltzmann machines using approximations to the likelihood gradient. In: Proceedings of the 25th International Conference on Machine Learning (ICML), pp. 1064–1071 (2008)
13. Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21
14. Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015)
15. Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014)
16. Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012)
17. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders, pp. 1096–1103 (2008)
18. Cai, X., Hu, S., Lin, X.: Feature extraction using restricted Boltzmann machine for stock price prediction. In: IEEE International Conference on Computer Science and Automation Engineering (CSAE), vol. 3, pp. 80–83 (2012)
A Preliminary Study on AI for Telemetry Data Compression

Gioele Ciaparrone², Vincenzo Benedetto¹,²(B), and Francesco Gissi¹,²

¹ Department of Engineering, University of Sannio, Viale Traiano, 1, 82100 Benevento, Italy
² Kebula srl, Via della Biblioteca 2, 84084 Fisciano, Italy
{gioele.ciaparrone,vincenzo.benedetto,francesco.gissi}@kebula.it
Abstract. Compression of telemetry streams is fundamental for both their storage and transmission. Recently, machine learning has been employed to enhance traditional data compression algorithms specifically for telemetry compression, both lossless and lossy. However, state-of-the-art telemetry compression algorithms are usually tailored to work with very specific datasets and can hardly generalize to different datasets. Moreover, much simpler traditional algorithms can often obtain better compression ratios with less computational complexity. In this work, we attempt a preliminary experiment aiming to verify the effectiveness of one of the most representative AI-based lossless telemetry compression algorithms against three different NASA datasets. Experimental results show that the model still struggles to perform better than traditional approaches, highlighting the necessity to design and study more sophisticated machine learning models for telemetry compression with broader applicability.
Keywords: telemetry · data compression · machine learning · LSTM

1 Introduction
Telemetry compression is important for storage or transmission of data, especially in cases where transmission bandwidth is limited. Compression techniques can be divided into lossless and lossy. Lossless compression allows the original data to be perfectly reconstructed. Lossy compression can obtain better compression ratios while sacrificing reconstruction accuracy, and can be useful in cases where some information loss is acceptable. Machine learning (ML) and deep learning (DL) have succeeded in a wide variety of tasks in recent years [3–5,11,14,15,23–27]. Following these successes, ML techniques have been developed for telemetry compression, both for lossless and lossy strategies. However, most of the works in the literature present compression algorithms that are designed to work on a specific dataset and do not appear to generalize to other datasets. Moreover, datasets and code are usually not available to the

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 134–143, 2023. https://doi.org/10.1007/978-3-031-30396-8_12
public, making reproducibility an issue. The goal of this work is to evaluate the generalization capabilities of one of the state-of-the-art ML-based lossless compression algorithms on various telemetry datasets. We show that traditional, simpler techniques, such as delta coding, work better than currently existing ML-based compression algorithms on datasets different from the ones they were originally designed for. The work is organized as follows. Section 2 briefly describes some of the related works in the literature. Section 3 describes the model and data used in the experiments. Section 4 describes the experiments performed and the results obtained, which are discussed in Sect. 5. Section 6 concludes by summarizing the work and presenting some final remarks.
2 Related Work
ML techniques have been applied to both lossless and lossy telemetry compression. Regarding lossless compression, Shi et al. [20] proposed a differential clustering algorithm in order to compress aerospace telemetry data. Consecutive telemetry vectors are considered to be part of the same cluster if the redundancy distribution characteristic (RDC) of the difference vector is below a certain threshold. The RDC indicates the amount of compression gained by using run-length encoding (RLE). Cluster heads are encoded using the LZW compression algorithm, while cluster members are encoded by using RLE on the difference vectors between them and the cluster heads. Levenets et al. [10] proposed the use of a multi-layer perceptron to select one of 3 possible traditional, non-ML compression algorithms. The network seemed to perform better than other numerical-based selection criteria on the selected dataset. Shehab et al. [19] used an LSTM network in order to decorrelate the telemetry data, which could then be compressed using a traditional algorithm, such as arithmetic coding, Rice coding or Huffman coding. The LSTM predicts the next telemetry frame and the error between the prediction and the actual data is transmitted, after compression with the aforementioned algorithms. If the LSTM has good prediction accuracy, the distribution of the errors will have lower entropy than the original data, leading to a better compression ratio when combined with entropy coding algorithms. Regarding lossy compression, various algorithms have employed autoencoders in order to compress data into a lower-dimensionality space [6,17,21,22,29]. Other approaches include the use of polynomial fitting and Markov chain models, like the work presented by Baig et al. [2]. In general, however, many researchers have focused on lossless compression, since most applications need unaltered telemetry data, e.g. for troubleshooting issues.
3 Materials and Methods
For this work, we focused on lossless compression algorithms. In particular, we chose to evaluate the effectiveness of the LSTM-based algorithm presented in [19], since it seems to be the one that can be most easily generalized to different data configurations. We selected 3 telemetry-like datasets. The first two datasets are part of the NASA Prognostics Center of Excellence Data Set Repository [13], specifically the Battery dataset [18] and the Turbofan Engine Degradation Simulation-2 dataset [1]. The third dataset is extracted from the Mars Science Laboratory (MSL) Rover Environmental Monitoring Station (REMS) Experiment Data Record (EDR) [12]. For the Battery dataset, we managed¹ to download charge/discharge data from 12 different Li-ion batteries. We used 4 of those batteries for LSTM training/validation, while the remaining 8 were used for the data compression tests. For each battery, we used the "Current measured" time series. In particular, we rounded the value to 3 decimal figures, multiplied it by 1000 and stored it as a 16-bit integer. The reason is that this transformation can be seen as a kind of quantization, which makes data generally more easily compressible by reducing the possible values that each variable can take. For the LSTM-based algorithm we are testing, this results in a higher probability that small error values assume the exact same value (thus reducing entropy), as opposed to using real values represented by 32-bit or 64-bit floating point numbers. The Turbofan Engine Degradation dataset contains 1416 multivariate time series (709 for training and 707 for testing) obtained using the Commercial Modular Aero-Propulsion System Simulation dynamical model. All data is in text format and has a limited number of decimal places. So, similarly to the Battery dataset, we converted the data into integers by multiplying each column by the smallest power of 10 that ensured the output values would be integers.
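The Battery quantization step above can be sketched as follows (the function name is ours; rounding to 3 decimals and scaling by 1000 is equivalent to rounding x·1000 to the nearest integer):

```python
import numpy as np

def quantize_current(x):
    """Quantization used in the text for the 'Current measured' series:
    round to 3 decimals, scale by 1000, store as 16-bit integers."""
    return np.round(np.asarray(x, dtype=float) * 1000).astype(np.int16)
```

This keeps the full 3-decimal precision of the source data while restricting each sample to one of at most 2^16 integer values.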
Since each data sample at each time step contained multiple values with different numerical ranges, the data was normalized in order to ease the model training process and to balance the weight of each variable in the loss computation. The MSL dataset contains sensor telemetry from the Curiosity rover on Mars. We selected the temperature reading as the time series to compress. The raw data consists of 16-bit integers. We used data from Sol 90 to Sol 269. We ensured that all the used time series contained data points collected at the exact same time intervals between each other. We extracted in total 928 training sequences, 464 validation sequences and 464 test sequences. Every sequence is a 308-d vector, encompassing the temperature readings for an entire Sol. Following [19], we defined a simple recurrent neural network to learn the time series patterns. We evaluated various configurations of the model. The first
¹ The original dataset download page was taken down and NASA is currently in the process of putting the data back online (https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository). We were only able to retrieve data from 12 batteries.
section of the model contains 1 or 2 consecutive LSTM layers, followed by 1 or 2 fully-connected (FC) layers. We evaluated different sizes of the hidden state of the LSTM and of the first FC layer (when 2 FC layers are present). We also tested the effect of adding or removing a ReLU activation after the LSTM and the first FC layer, and an optional residual connection from the input layer to the final layer. Different learning rate schedules were also tested. Similarly to [19], the overall compression algorithm works by encoding the RNN errors instead of the original data, with the goal of reducing data entropy and increasing the efficiency of entropy-based compression algorithms. Specifically, the LSTM is first trained to forecast the value of the telemetry stream at the next step. The weights of the network are then frozen and are assumed to be known by both the compressor and the decompressor. At compression time, a preliminary re-encoding of the original telemetry data is performed, in which the difference between the actual telemetry value and the predicted value is computed (i.e. the prediction error). These values are then fed to a traditional compression algorithm, which performs the actual compression step. Decompression is performed symmetrically: the same traditional compression algorithm is used to decompress the data, which now represent the prediction errors of the RNN. The RNN is then used step-by-step to forecast the next data sample, and the decoded prediction errors are used to reconstruct the original telemetry data. We evaluated numerous traditional compression algorithms in order to verify the effect of the RNN-encoded data on the final compression ratio (CR). Note that in this work we define the CR as size_uncompressed / size_compressed.
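The predictive-coding scheme above can be sketched with a generic predictor standing in for the frozen LSTM (an illustrative sketch; names are ours, and a real implementation would batch the forecasts rather than re-predict from the prefix each step):

```python
def predictive_encode(data, predict):
    """Transmit prediction errors instead of raw samples. `predict` maps
    the already-known prefix of the stream to a forecast of the next
    sample; with a good predictor the errors have lower entropy."""
    return [x - predict(data[:t]) for t, x in enumerate(data)]

def predictive_decode(errors, predict):
    """Symmetric reconstruction: run the same predictor step-by-step on
    the decoded prefix and add back each error."""
    out = []
    for e in errors:
        out.append(predict(out) + e)
    return out
```

With `predict` returning the previous sample, this scheme degenerates exactly to delta coding, which is why a predictor that merely copies its input brings no gain over the delta baseline.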
4 Experimental Results
For each dataset, we compared the different compressed data sizes in 3 conditions: 1) compression of the original data, 2) compression of the delta-encoded data, 3) compression of the RNN-encoded data. Delta encoding consists in compressing the differences between consecutive data samples instead of the original data samples. It is a simple and commonly used way of reducing the entropy of a data stream. We compared various traditional compression algorithms, including Arithmetic coding [16], DEFLATE [7], Golomb coding [8], Huffman coding [9] (2 different configurations), LZ77 [30] (3 different implementations), LZW [28], Run-Length Encoding (RLE), and bit-wise RLE combined with Golomb coding. Every compression algorithm was applied at a byte-wise or bit-wise level, depending on the type of algorithm, except where specified in the results tables (e.g. symbol-wise Huffman coding used each original symbol, a 16-bit integer, as a unique tree element, as opposed to using 8-bit values as symbols in the byte-wise version). Note that the implementations used for the compression algorithms have very different levels of optimization, thus a fair running time comparison is not currently possible.
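Delta encoding, the baseline used throughout the tables, can be sketched in a few lines (an illustrative implementation; function names are ours):

```python
import numpy as np

def delta_encode(x):
    """Delta encoding as defined in the text: keep the first sample,
    then store the differences between consecutive samples."""
    x = np.asarray(x)
    return np.concatenate(([x[0]], np.diff(x)))

def delta_decode(d):
    """Invert delta encoding via a running (cumulative) sum."""
    return np.cumsum(d)
```

On smooth telemetry the differences cluster around zero, so the delta stream has much lower entropy than the raw samples and compresses better with the entropy coders above.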
We present here the results obtained on the three described datasets. The last column in each table shows the results obtained by using the best performing RNN trained on that dataset, that is, the RNN that obtained the highest compression ratio with any compression algorithm. All networks were trained using the MSE loss and the Adam optimizer. Table 1 summarizes the results for the Battery dataset. The best RNN has 2 FC layers, hidden size 8 (for both LSTM and FC layers) and 1 LSTM layer. It also has ReLU activations before and after the first fully-connected layer, and the residual connection mentioned earlier. The training batch size was set to 512.

Table 1. Results on the batteries dataset.

Compression algorithm | Original data | Delta encoded | RNN encoded
No compression | 1.000 | 1.000 | 1.000
Arithmetic | 1.456 | 3.145 | 1.715
DEFLATE | 2.681 | 3.745 | 2.786
Golomb | 0.606 | 0.426 | 0.564
Byte-wise Huffman | 1.451 | 3.096 | 1.706
Symbol-wise Huffman | 1.848 | 4.785 | 2.257
LZ77 | 1.376 | 1.235 | 1.289
FASTLZ (LZ77) | 1.742 | 2.110 | 1.776
LZW | 1.919 | 3.584 | 2.096
RLE | 0.514 | 0.702 | 0.585
Bit-wise RLE + Golomb | 1.089 | 0.886 | 1.012
Table 2 summarizes the results for the Engine Degradation dataset. We obtained the best results using a network with 1 FC layer, hidden size 64 and 2 LSTM layers. Other versions of the network obtained similar results, and in general did not improve on the performance of delta coding. The training batch size was set to 256 for the model referred to in the table. For the Engine Degradation dataset, we also trained and tested an RNN to encode the output of delta encoding. While it obtained better results than the RNN without delta encoding, delta encoding alone is still superior, as shown in Table 3. Table 4 summarizes the results for the MSL dataset, obtained using a network with ReLU and residual connections. Different structures of the network obtained very similar results (due to the issue discussed next); the one shown in the table has 1 FC layer, a hidden layer size of 4 and 2 LSTM layers. The networks trained on the MSL dataset used a Cosine Annealing schedule with warm restarts. The training batch size was set to 512. Using a batch size of 32 produced analogous results.
Table 2. Results on the engine degradation dataset.

Compression algorithm | Original data | Delta encoded | RNN encoded
No compression | 1.000 | 1.000 | 1.000
Arithmetic | 1.152 | 1.464 | 1.208
DEFLATE | 1.650 | 1.642 | 1.205
Golomb | 0.507 | 0.451 | 0.378
Byte-wise Huffman | 1.148 | 1.456 | 1.205
Symbol-wise Huffman | 1.339 | 1.546 | 1.227
LZ77 | 0.890 | 0.959 | 0.890
FASTLZ (LZ77) | 1.242 | 1.208 | 1.009
LZW | 1.274 | 1.274 | 0.964
RLE | 0.506 | 0.621 | 0.508
Bit-wise RLE + Golomb | 0.989 | 0.947 | 0.847

5 Discussion
While RNN encoding works slightly better than compressing the original data in some cases, we can see that, in general, it does not produce better results than the much simpler delta encoding. This is due to the fact that, despite our best efforts, the RNN was unable to predict interesting patterns in the data. We can see some example predictions in Fig. 1. In the first case (Fig. 1a), the RNN predicted a constant value until the final part of the time series. In the second example (Fig. 1b) there is larger variance in the prediction, which also converges towards the target average value only at the end of the time series. The last example (Fig. 1c) shows an issue that happens in some cases when the residual link is enabled: the RNN learns to predict the input, with negligible difference. As the difference approaches zero, the error value that is encoded becomes equivalent to the difference between the current time step and the previous one, thus obtaining results analogous to delta coding. In summary, the RNN was in general unable to learn the patterns of the time series in the three datasets we selected. This might be due to the lack of sufficient information for the network to use in order to correctly predict the value of a variable at the next time step. In addition to that, the necessity to compute statistics on the training set of the engine degradation dataset for data scaling might have contributed to predictions being offset from the actual values, since mean and variance might differ at test time. A further observation can be made regarding some of the traditional algorithms: DEFLATE is usually the best performing algorithm, with LZW and Huffman also performing relatively well. This is not surprising, since DEFLATE uses Huffman coding after a modified LZ77 "pre-processing" step. At the same time, RLE and Golomb coding did not perform well in any situation, but this is also expected, since those algorithms are designed to work for very specific data
Fig. 1. Examples of RNN predictions.
Table 3. Comparison between delta encoding and RNN + delta encoding on the engine degradation dataset.

Compression algorithm | Delta encoded | Delta + RNN
No compression | 1.000 | 1.000
Arithmetic | 1.464 | 1.247
DEFLATE | 1.642 | 1.279
Golomb | 0.451 | 0.378
Byte-wise Huffman | 1.456 | 1.241
Symbol-wise Huffman | 1.546 | 1.274
LZ77 | 0.959 | 0.914
FASTLZ (LZ77) | 1.208 | 1.047
LZW | 1.274 | 0.989
RLE | 0.621 | 0.556
Bit-wise RLE + Golomb | 0.947 | 0.857
distributions (RLE needs many consecutive bits or bytes, depending on the algorithm version, having the same value, while Golomb needs data following a geometric distribution) and perform poorly on "generic" data. This highlights a problem with the current state-of-the-art AI-based telemetry data compression algorithms. Depending on the dataset, different encoding strategies are needed in order to compress data efficiently. Encoding blocks of different lengths, or data of different types or distributions, may produce completely different results under the same compression algorithm. For this reason, current AI-based approaches for telemetry compression seem to generalize poorly to other datasets; it remains fundamental to properly understand the data at hand and carefully choose an appropriate compression strategy.

Table 4. Results on the MSL dataset (compression ratio by encoding method).

Compression algorithm    Original data   Delta encoded   RNN encoded
No compression           1.000           1.000           1.000
Arithmetic               1.126           2.611           2.584
DEFLATE                  1.534           4.525           4.444
Golomb                   0.421           0.411           0.406
Byte-wise Huffman        1.125           2.558           2.532
Symbol-wise Huffman      1.252           3.861           3.817
LZ77                     1.018           1.511           1.506
FASTLZ (LZ77)            1.144           2.273           2.262
LZW                      1.014           3.968           3.906
RLE                      0.502           0.652           0.653
Bit-wise RLE + Golomb    0.878           0.862           0.857
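The advantage of pairing delta coding with a dictionary/entropy coder such as DEFLATE, visible in Tables 3 and 4, can be reproduced in miniature with Python's standard zlib module. This is an illustrative sketch assuming 8-bit samples with wrap-around (modulo-256) deltas, not the evaluation code used above:

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Ratio of original size to DEFLATE-compressed size (>1 is a gain)."""
    return len(data) / len(zlib.compress(data, 9))


def delta_bytes(data: bytes) -> bytes:
    """Byte-wise delta encoding modulo 256 (lossless and invertible)."""
    return bytes((b - a) % 256 for a, b in zip(b"\x00" + data, data))


# A slowly drifting "telemetry" ramp: its deltas are tiny and repetitive,
# so the entropy coder sees a far more regular byte stream.
signal = bytes((i // 8) % 256 for i in range(4096))
raw_cr = compression_ratio(signal)
delta_cr = compression_ratio(delta_bytes(signal))
```

On such slowly varying signals the delta-encoded stream typically compresses at least as well as the raw one, mirroring the "Delta encoded" columns of the tables.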
6 Conclusions
Telemetry compression is fundamental to reduce the transmission and storage footprints of large telemetry data streams. Recent works have tested the use of machine-learning-based algorithms for data compression. However, by picking one of those algorithms as an example, we showed that it is often hard to reproduce good results on a different dataset. Moreover, traditional algorithms, combined with simple techniques such as delta coding, may often obtain better compression ratios, without the need for complex models, which also have the downside of being time- and resource-intensive. Nevertheless, we believe that more research is needed in this direction and that AI-based compression algorithms have potential as part of efficient compression pipelines for certain instances of telemetry streams. In addition, we only evaluated the generalization capabilities of a lossless compression algorithm. Studying the range of applicability of AI-based lossy compression algorithms, currently dominated by the use of autoencoders, constitutes an interesting future direction of research.
References

1. Arias Chao, M., Kulkarni, C., Goebel, K., Fink, O.: Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics. Data 6(1), 5 (2021)
2. Baig, S.R., Iqbal, W., Berral, J.L., Erradi, A., Carrera, D., et al.: Real-time data center's telemetry reduction and reconstruction using Markov chain models. IEEE Systems J. 13(4), 4039–4050 (2019)
3. Brown, T., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)
4. Carvalho, T.P., Soares, F.A., Vita, R., Francisco, R.P., Basto, J.P., Alcalá, S.G.: A systematic literature review of machine learning methods applied to predictive maintenance. Comput. Indus. Eng. 137, 106024 (2019)
5. Ciaparrone, G., Sánchez, F.L., Tabik, S., Troiano, L., Tagliaferri, R., Herrera, F.: Deep learning in video multi-object tracking: a survey. Neurocomputing 381, 61–88 (2020)
6. Del Testa, D., Rossi, M.: Lightweight lossy compression of biometric patterns via denoising autoencoders. IEEE Signal Process. Lett. 22(12), 2304–2308 (2015)
7. Deutsch, P.: Deflate compressed data format specification version 1.3. Tech. Rep., RFC Editor (1996)
8. Golomb, S.: Run-length encodings (corresp.). IEEE Trans. Inf. Theory 12(3), 399–401 (1966)
9. Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)
10. Levenets, A., Chye, E.U., Bogachev, I.: Application of machine learning methods for classification of telemetric frames by compression algorithms. In: 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA), pp. 505–509. IEEE (2019)
11. Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 3059968 (2021)
12. NASA: PDS atmospheres node data set catalog. https://pds-atmospheres.nmsu.edu/cgi-bin/getdir.pl?&volume=mslrem_0001. Accessed 30 Sept 2022
13. NASA: Prognostics center of excellence data set repository. https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository. Accessed 30 Sept 2022
14. Ni, J., Young, T., Pandelea, V., Xue, F., Cambria, E.: Recent advances in deep learning based dialogue systems: a systematic survey. Artif. Intell. Rev. 56, 3055–3155 (2022)
15. Rawat, W., Wang, Z.: Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput. 29(9), 2352–2449 (2017)
16. Rissanen, J., Langdon, G.G.: Arithmetic coding. IBM J. Res. Dev. 23(2), 149–162 (1979)
17. Russell, M., Wang, P.: Physics-informed deep learning for signal compression and reconstruction of big data in industrial condition monitoring. Mech. Syst. Signal Process. 168, 108709 (2022)
18. Saha, B., Goebel, K.: Battery data set. NASA AMES Prognostics Data Repository (2007)
19. Shehab, A.F., Elshafey, M.A., Mahmoud, T.A.: Recurrent neural network based prediction to enhance satellite telemetry compression. In: 2020 IEEE Aerospace Conference, pp. 1–11. IEEE (2020)
20. Shi, X., Shen, Y., Wang, Y., Bai, L.: Differential-clustering compression algorithm for real-time aerospace telemetry data. IEEE Access 6, 57425–57433 (2018)
21. Sun, B., Feng, H.: Efficient compressed sensing for wireless neural recording: a deep learning approach. IEEE Signal Process. Lett. 24(6), 863–867 (2017)
22. Sunil Kumar, K., Shivashankar, D., Keshavamurthy, K.: Bio-signals compression using auto encoder. J. Electr. Comput. Eng. Q 2, 424–433 (2021)
23. Tang, B., Pan, Z., Yin, K., Khateeb, A.: Recent advances of deep learning in bioinformatics and computational biology. Front. Genet. 10, 214 (2019)
24. Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21
25. Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015)
26. Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014)
27. Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012)
28. Welch, T.A.: A technique for high-performance data compression. Computer 17(06), 8–19 (1984)
29. Yildirim, O., San Tan, R., Acharya, U.R.: An efficient compression of ECG signals using deep convolutional autoencoders. Cogn. Syst. Res. 52, 198–211 (2018)
30. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)
On the Use of Multivariate Medians for Nearest Neighbour Imputation

Francesco Gissi1,2, Vincenzo Benedetto1,2(B), Parth Bhandari1,2,3, and Raúl Pérez-Fernández3

1 Department of Engineering, University of Sannio, Viale Traiano, 1, 82100 Benevento, Italy
2 Kebula srl, Via della Biblioteca 2, 84084 Fisciano, Italy
{francesco.gissi,vincenzo.benedetto}@kebula.it
3 Department of Statistics and O.R. and Mathematics Didactics, University of Oviedo, Oviedo, Spain
[email protected]
Abstract. Most data analysis techniques, e.g. those addressed in [15–18], usually require datasets without missing data; however, in the current age of large datasets it is becoming more and more common to deal with huge datasets for which some values are missing. For this reason, several techniques for the imputation of missing values have been developed. In the present work, we present one such technique based on the use of multivariate medians within the context of nearest neighbour imputation. Several experiments are carried out, showing better performance than some state-of-the-art methods in the presence of noise.
Keywords: Multivariate imputation · nearest neighbour imputation · robustness · multivariate median

1 Introduction
In the current age of large datasets, it is becoming increasingly common to deal with missing data [3]. In fact, most data analysis techniques are developed exclusively for complete datasets and, unfortunately, most of the time it is not straightforward to decide how to proceed when part of the dataset is missing. The easiest way out is to simply delete the culprits of the missing information, dropping from the dataset those individuals/variables that are not complete. A more elaborate solution is to impute the missing values, thus avoiding the large loss of information typically associated with the deletion procedure. Imputation techniques are usually divided into two families: univariate imputation techniques and multivariate imputation techniques. The former simply consider the imputation within each variable as a single separate task and, thus, do not exploit the correlations between the different variables of the dataset.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 144–153, 2023. https://doi.org/10.1007/978-3-031-30396-8_13
Rudimentary examples within this family are random guessing (where the values are imputed randomly according to the distribution of the non-missing values for the considered variable) and mean/median imputation (where the values are replaced by the mean/median of the considered variable). However, there is broad agreement in the scientific community that multivariate imputation techniques are more suitable than univariate imputation techniques as long as there is no clear evidence that all variables are independent of each other. Prominent examples of multivariate imputation techniques are: k-nearest neighbours (KNN) [19], fuzzy k-means (FKM) [9], Bayesian principal component analysis (BPCA) [10] and multiple imputation by chained equations (MICE) [21]. All these methods have been compared extensively in the literature (see, e.g., [12]). In this work, we follow the direction of imputation by k-nearest neighbours (KNN), which is known to have a big drawback: it is not robust in the presence of outliers. We propose to solve this problem by substituting the (componentwise) arithmetic mean typically used for the imputation with an appropriate multivariate median [14]. In particular, three multivariate medians are explored: the componentwise median [11], the spatial median [22] and the halfspace median [20]. Several experiments are carried out on artificially generated data, paying attention to the influence of the size of the neighbourhood, the number of variables to be imputed, the correlation between the variables and the robustness in the presence of outliers. The remainder of the paper is structured as follows. In Sect. 2, some preliminaries on multivariate medians are introduced. Section 3 introduces a new methodology for robust multivariate imputation. Several experiments on artificially generated data are presented in Sect. 4. We end with some conclusions and future lines of research in Sect. 5.
2 Preliminaries on Multivariate Medians

We are given a dataset X formed by n rows (individuals) and m columns (variables), where the value Xij at the i-th row and j-th column represents the value of the i-th individual for the j-th variable. We additionally consider the notation Xi. for referring to the i-th row and the notation X.j for referring to the j-th column. For summarizing the individuals in the dataset, it is typical to simply consider the arithmetic mean of each of the columns, hereinafter referred to as the componentwise arithmetic mean (CAM) when arranged as an m-dimensional vector. Formally, the componentwise arithmetic mean of the dataset X is defined as:

\mathrm{CAM}(X) = \left( \frac{1}{n}\sum_{i=1}^{n} X_{i1}, \ldots, \frac{1}{n}\sum_{i=1}^{n} X_{im} \right).
Many intuitive properties are known to be fulfilled by the componentwise arithmetic mean (e.g., componentwise monotonicity, affine equivariance, idempotence and symmetry [4]). Unfortunately, the componentwise arithmetic mean lacks the key property of robustness in the presence of outliers, since its finite-sample breakdown point equals 1/n. A naive solution for obtaining a similar summary of the individuals in the dataset is to substitute the arithmetic mean by the median, thus giving rise to the notion of componentwise median (CME). Formally, the componentwise median of the dataset X is defined as:

\mathrm{CME}(X) = (\mathrm{Me}(X_{.1}), \ldots, \mathrm{Me}(X_{.m})),

where Me denotes the classical univariate median. Unfortunately, the componentwise median gains robustness in the presence of outliers (finite-sample breakdown point of 1/2) at the cost of losing some other appealing properties such as affine equivariance [11]. This is certainly inconvenient since it means that all variables are treated separately and, thus, the dataset is regarded as a list of m separate variables rather than as a unique multivariate dataset. It is for this very reason that a large literature on the extension of the notion of univariate median to the multivariate framework has been developed [14]. An early example of a multivariate median that indeed embraces the multivariate nature of the dataset is the spatial median (SPA) – also known by many other names such as the geometric median, the Torricelli point and the L1-estimator of location – which is the point that minimizes the sum of Euclidean distances. As brought to attention in [8], the first discussions on this multivariate median are due to a problem posed by Fermat and solved by Torricelli, yet the problem gained more attention with Weber's consideration of it in the context of facility location [23]. Formally, the spatial median is defined as follows:

\mathrm{SPA}(X) = \arg\min_{x \in \mathbb{R}^m} \sum_{i=1}^{n} d(x, X_{i.}),

where d denotes the Euclidean distance. Note that the minimizer is assured to be unique as long as the points X1., . . . , Xn. are not collinear, yet no closed formula is known and for its computation it is necessary to resort to techniques such as Weiszfeld's algorithm [24] and its modification assuring convergence for all initial points [22]. Unfortunately, as happened with the componentwise median, the spatial median is also not affine equivariant [11]. Still, the spatial median has a finite-sample breakdown point of 0.5 and behaves properly under transformations that preserve Euclidean distances (therefore, it fulfills weaker but interesting properties such as translation equivariance and orthogonal equivariance). Another prominent multivariate median is the halfspace median (HSM), which is usually attributed to Tukey [20] but can actually be traced back to Hotelling [5]. This multivariate median is based on the notion of statistical depth [25] (and, in particular, on Tukey's halfspace statistical depth [20]) that formalizes the idea of centrality of a point within a dataset. Formally, the halfspace median is defined as follows:
\mathrm{HSM}(X) = \arg\max_{x \in \mathbb{R}^m} \mathrm{HSD}(x; X),

where \mathrm{HSD}(x; X) = \inf\{P_X(H) \mid H \text{ is a closed halfspace such that } x \in H\}, and P_X(H) represents the proportion of individuals Xi. such that Xi. ∈ H. Note that it is not assured that there exists a unique maximizer of HSD(x; X); however, the set of maximizers is known to be closed, bounded and convex [13]. A common approach simply considers the average of all maximizers [2], yet this implies that for small values of n the halfspace median often returns the componentwise arithmetic mean. In terms of fulfilled properties, the halfspace median is affine equivariant (and, as discussed in [13], equivariant under a much larger class of transformations) and its finite-sample breakdown point is at least 1/(d + 1) [2].

In the following, we introduce a running example that will be used later on within a multivariate imputation problem.

Example 1. Consider the dataset in Table 1. Note that the value X1,5 is a gross outlier since it greatly differs from all other values.

Table 1. Illustrative dataset.

 i    V1     V2     V3     V4      V5
 1   0.29  −1.79   0.19   1.15  100.00
 2   0.63   1.43  −2.37   2.04   −0.04
 3  −0.40   0.39  −0.89   0.14   −0.73
 4   0.83  −0.51   2.39   2.20    0.68
 5   0.44  −0.80  −2.23  −0.42   −2.25
 6  −0.34   0.62   0.99  −2.23    1.85
 7   0.69   0.64   0.29   2.51   −0.76

We consider the four techniques for the summarization of a dataset discussed in this section. It follows that:

CAM(X) = (0.305714286, −0.002857143, −0.232857143, 0.77, 14.107142857),
CME(X) = (0.44, 0.39, 0.19, 1.15, −0.04),
SPA(X) = (0.21732987, 0.37076972, −0.48580539, 0.82150193, 0.01292327),
HSM(X) = (0.5140342, 0.5185158, −0.1774926, 1.6417899, 0.1680030).

We can note that only the componentwise arithmetic mean is affected by the outlier; the three other methods provide reasonable summaries of the central tendency for the fifth component.
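The summaries compared in Example 1 are easy to prototype. The sketch below (plain Python, not the authors' code) implements the componentwise arithmetic mean, the componentwise median and Weiszfeld's iteration for the spatial median; the halfspace median is omitted since it requires a depth solver:

```python
from statistics import median


def cam(rows):
    """Componentwise arithmetic mean (CAM)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cme(rows):
    """Componentwise median (CME): univariate median per column."""
    return [median(col) for col in zip(*rows)]


def spatial_median(rows, iters=200, eps=1e-9):
    """Weiszfeld's iteration for the spatial (geometric) median.

    Starts from the CAM; each step re-weights points by the inverse of
    their Euclidean distance to the current estimate.  The eps guard
    avoids division by zero when the estimate lands on a data point.
    """
    x = cam(rows)
    for _ in range(iters):
        wsum, acc = 0.0, [0.0] * len(x)
        for r in rows:
            d = max(eps, sum((a - b) ** 2 for a, b in zip(x, r)) ** 0.5)
            w = 1.0 / d
            wsum += w
            for j, v in enumerate(r):
                acc[j] += w * v
        x = [a / wsum for a in acc]
    return x
```

On data with a gross outlier, as in Table 1, the mean is dragged toward the outlier while both medians stay near the bulk of the points.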
3 Robust Multivariate Imputation
Unfortunately, when dealing with a real-life dataset, it is often the case that some of the data is missing, thereby hindering the use of classical data science techniques. We will denote the fact that an entry Xij of the dataset is missing by Xij = ∗ and, analogously, the fact that an entry Xij of the dataset is not missing by Xij ≠ ∗. As discussed in [3], possible ways out when facing missing data include: (1) deletion of the individuals and/or variables with missing data from the dataset; (2) use of univariate imputation techniques, where only information concerning each variable is used for reconstructing the dataset; and (3) use of multivariate imputation techniques, where the correlations between the different variables are taken into account when reconstructing the dataset. The approach presented later on in this section is of the latter type, lying in the family of k-Nearest Neighbour Imputation methods (KNNI methods) [19]. These methods proceed as follows:

Step 1. Consider the subset of rows I ⊆ {1, . . . , n} of the dataset that are complete (henceforth called the pool of complete individuals);

Step 2. According to a considered distance metric d, compute for each incomplete individual Xi. with i ∉ I the k-nearest neighbours X_{i_1 .}, . . . , X_{i_k .} from the pool of complete individuals (i.e., i_1, . . . , i_k ∈ I). Note that the distances are measured in the subspace formed by the components in which the individual is complete, as follows:

d(X_{i.}, X_{\ell.}) = \sqrt{\sum_{\substack{j=1 \\ X_{ij} \neq *}}^{m} (X_{ij} - X_{\ell j})^2},

for any \ell ∈ I;

Step 3. For each missing value Xij of an incomplete individual, assign the mean of the values of the k-nearest neighbours X_{i_1 .}, . . . , X_{i_k .} from the pool of complete individuals at the same variable, as follows:

X_{ij} = \frac{1}{k} \sum_{\ell=1}^{k} X_{i_\ell j}.    (1)
Some authors [7] encourage imputing the individuals sequentially, starting with the individuals with the fewest missing values and immediately adding them to the pool of complete individuals. However, the most common approach is to simply use the pool of complete individuals available at the beginning. Unfortunately, the method above has two drawbacks. Firstly, it is not robust in the presence of outliers. Secondly, although a multivariate approach is considered in 'Step 2', 'Step 3' is performed componentwise, therefore treating the problem from the perspective of a univariate approach.
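Steps 1–3 can be sketched as follows (illustrative Python; missing entries are assumed to be encoded as None, and the `summarize` argument, which defaults to the componentwise arithmetic mean of Eq. (1), is a hypothetical hook that makes it easy to swap in a different summary of the neighbours):

```python
def knn_impute(X, k=5, summarize=None):
    """k-nearest-neighbour imputation (Steps 1-3); None marks missing.

    `summarize` maps the list of k neighbour rows to one summary row;
    the default is the componentwise arithmetic mean of Eq. (1).
    """
    if summarize is None:
        summarize = lambda rows: [sum(c) / len(c) for c in zip(*rows)]
    # Step 1: the pool of complete individuals.
    complete = [row for row in X if None not in row]
    out = []
    for row in X:
        if None not in row:
            out.append(list(row))
            continue
        obs = [j for j, v in enumerate(row) if v is not None]

        # Step 2: distances on the observed components only.
        def dist(c, row=row, obs=obs):
            return sum((row[j] - c[j]) ** 2 for j in obs) ** 0.5

        neigh = sorted(complete, key=dist)[:k]
        # Step 3: fill each missing entry from the neighbour summary.
        summary = summarize(neigh)
        out.append([summary[j] if v is None else v
                    for j, v in enumerate(row)])
    return out
```

Passing a robust summary function instead of the default mean realizes the substitution proposed in the remainder of this section.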
In the present work, we propose to solve these problems by substituting the componentwise arithmetic mean in Equation (1) by one of the multivariate medians presented in the previous section, as follows:

X_{ij} = A(X_{i_1 .}, \ldots, X_{i_k .})_j,

where A(·)_j denotes the j-th component of the multivariate median A. After the computation of the chosen multivariate median of the k-nearest neighbours, the missing values are substituted by the corresponding ones in the multivariate median. In the following, we provide a toy example for illustration.

Example 2. Consider the dataset in Table 2. Note that the first seven individuals coincide with those of the dataset in Table 1 and that the tenth individual has missing data at the fourth and fifth variables. The goal is to perform imputation on the individual X10. and, in particular, to impute X10,4 and X10,5.

Table 2. Illustrative dataset with missing data.

  i    V1     V2     V3     V4      V5
  1   0.29  −1.79   0.19   1.15  100.00
  2   0.63   1.43  −2.37   2.04   −0.04
  3  −0.40   0.39  −0.89   0.14   −0.73
  4   0.83  −0.51   2.39   2.20    0.68
  5   0.44  −0.80  −2.23  −0.42   −2.25
  6  −0.34   0.62   0.99  −2.23    1.85
  7   0.69   0.64   0.29   2.51   −0.76
  8   8.31   3.42   4.21   7.65    9.87
  9  −0.73   7.98   7.43   4.54    8.51
 10   0.41   0.75   1.69     ∗       ∗
The first nine individuals form the pool of complete individuals. Considering k = 7, the distances to X10. on the subspace formed by the first three components (those in which X10. is complete) are as follows:

d(X10., X1.) = 2.952287,   d(X10., X2.) = 4.122426,   d(X10., X3.) = 2.728021,
d(X10., X4.) = 1.501333,   d(X10., X5.) = 4.215424,   d(X10., X6.) = 1.034118,
d(X10., X7.) = 1.431957,   d(X10., X8.) = 8.711446,   d(X10., X9.) = 9.301618.

It follows that the 7-nearest neighbours are {X1., X2., X3., X4., X5., X6., X7.}. The results of applying the componentwise arithmetic mean, the componentwise median, the spatial median and the halfspace median were already presented in Example 1. It is then straightforward to obtain the results of the different imputation techniques for X10.:
– CAM: (0.41, 0.75, 1.69, 0.77, 14.107142857);
– CME: (0.41, 0.75, 1.69, 1.15, −0.04);
– SPA: (0.41, 0.75, 1.69, 0.82150193, 0.01292327);
– HSM: (0.41, 0.75, 1.69, 1.6417899, 0.1680030).
4 Experiments on Artificially Generated Data
In this section, we compare the performance of the different imputation techniques presented in the previous section under different circumstances. In all cases, we start with a clean dataset without missing data and artificially generate the missing data. The different imputation techniques are applied to the generated dataset with missing data, whereas the original dataset is always used for control. We run several experiments on artificially generated data from a multivariate Gaussian distribution. The number of individuals is set to n = 1000 and the dimension (number of variables) is set to m = 10 throughout the whole section. The mean vector is set to the origin since it does not affect the performance of the imputation techniques (due to the translation equivariance of the componentwise arithmetic mean and the three multivariate medians). The covariance matrix is sometimes changed throughout the experiments in order to account for the influence of the correlations between the variables. After the original dataset is generated, 5% of the individuals are chosen at random and some of the known values for these individuals are dropped in order to generate missing data. All presented imputation techniques are applied to each dataset and the performance of each imputation technique is measured by means of the average error (i.e., the average absolute difference between all original values and their imputed counterparts). All mentioned experiments are repeated five times and the mean and standard deviation of the obtained average errors over all repetitions are reported. The obtained average errors are reported in Table 3.

Experiment 1: Influence of the Number of Neighbours. For this experiment, the covariance matrix is chosen such that diagonal values equal 1 and off-diagonal values equal 0.5. For the creation of missing data, we randomly select 5% of the individuals and drop 5 of their values at random. We vary the number of neighbours used for the imputation within k ∈ {12, 15, 20}. As inferred from Table 3, we conclude that the number of neighbours considered for the imputation does not have much influence on the performance of the imputation technique and that there are no large differences between the different considered techniques.

Experiment 2: Influence of the Number of Values to be Imputed. For this experiment, the covariance matrix is chosen such that diagonal values equal 1 and off-diagonal values equal 0.5. For the creation of missing data, we randomly select 5% of the individuals and drop x ∈ {1, 3, 5, 7, 9} of their values at random. We consider k = 12 neighbours for the imputation. As inferred from Table 3, we conclude that the number of values to be imputed does not have much influence on the performance of the imputation technique and that there are no large differences between the different considered techniques.
Experiment 3: Influence of the Correlation Between Variables. For this experiment, the covariance matrix is chosen such that diagonal values equal 1 and off-diagonal values are varied within x ∈ {0, 0.25, 0.5, 0.75, 1}. For the creation of missing data, we randomly select 5% of the individuals and drop 5 of their values at random. We consider k = 12 neighbours for the imputation. As inferred from Table 3, we conclude that the correlation between variables plays a big role in the performance of the imputation technique. In fact, the imputation performs better the more correlated the variables are.

Experiment 4: Robustness in the Presence of Outliers. For this experiment, the covariance matrix is chosen such that diagonal values equal 1 and off-diagonal values equal 0.5. For the creation of missing data, we randomly select 5% of the individuals and drop their first 5 values. Simultaneously, we select a varying percentage ∈ {0.05, 0.1, 0.15, 0.20, 0.25} of the values of the last 5 columns over all individuals and multiply them by 1000, thus introducing a certain percentage of outliers. We consider k = 12 neighbours for the imputation. As inferred from Table 3, we conclude that the choice of summarization technique plays a big role in the performance of the imputation technique. In particular, the use of the componentwise arithmetic mean should be avoided in the presence of outliers, whereas the three multivariate medians lead to a similar performance.

Table 3. Results of the four experiments discussed in Sect. 4 (mean average error, standard deviation in parentheses).

               CAM                 CME              SPA              HSM
Experiment 1
  12           0.680 (0.876)      0.688 (0.876)    0.681 (0.871)    0.687 (0.876)
  15           0.697 (0.870)      0.699 (0.873)    0.691 (0.867)    0.696 (0.875)
  20           0.689 (0.860)      0.697 (0.868)    0.686 (0.861)    0.697 (0.867)
Experiment 2
  1            0.668 (0.819)      0.656 (0.803)    0.670 (0.827)    0.667 (0.820)
  3            0.667 (0.823)      0.672 (0.830)    0.676 (0.830)    0.668 (0.830)
  5            0.697 (0.876)      0.699 (0.876)    0.691 (0.871)    0.696 (0.876)
  7            0.663 (0.838)      0.669 (0.852)    0.655 (0.833)    0.660 (0.842)
  9            0.676 (0.852)      0.693 (0.871)    0.678 (0.854)    0.694 (0.868)
Experiment 3
  0            0.808 (1.035)      0.826 (1.055)    0.816 (1.040)    0.828 (1.057)
  0.25         0.714 (0.905)      0.731 (0.918)    0.721 (0.911)    0.732 (0.922)
  0.50         0.697 (0.876)      0.699 (0.876)    0.691 (0.871)    0.696 (0.876)
  0.75         0.482 (0.574)      0.492 (0.590)    0.486 (0.577)    0.492 (0.587)
Experiment 4
  0.05         30.456 (136.933)   3.615 (5.085)    3.596 (5.035)    3.564 (5.127)
  0.10         54.159 (197.256)   3.597 (5.092)    3.591 (4.969)    3.517 (5.106)
  0.15         76.971 (236.746)   3.581 (5.077)    3.585 (4.962)    3.511 (5.049)
  0.20         110.607 (292.247)  3.516 (4.972)    3.545 (4.917)    3.461 (4.966)
  0.25         133.773 (325.038)  3.504 (4.969)    3.526 (4.925)    3.438 (4.973)
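The experimental protocol can be mimicked on a small scale with the standard library alone. The sketch below is illustrative, not the generator used for Table 3: it builds equicorrelated Gaussian rows from a shared latent factor (one simple way to obtain an off-diagonal covariance of rho), masks a fraction of rows, and scores an imputation by the average absolute error:

```python
import random


def make_dataset(n=200, m=5, rho=0.5, seed=0):
    """Equicorrelated Gaussian rows: every pair of variables has
    correlation rho, built from one shared factor plus noise."""
    rng = random.Random(seed)
    a, b = rho ** 0.5, (1 - rho) ** 0.5
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # shared factor
        rows.append([a * z + b * rng.gauss(0, 1) for _ in range(m)])
    return rows


def mask_random(rows, frac=0.05, per_row=2, seed=1):
    """Drop `per_row` random values in a random `frac` of the rows."""
    rng = random.Random(seed)
    masked = [list(r) for r in rows]
    hit = rng.sample(range(len(rows)), max(1, int(frac * len(rows))))
    for i in hit:
        for j in rng.sample(range(len(rows[0])), per_row):
            masked[i][j] = None
    return masked


def mean_abs_error(truth, imputed, masked):
    """Average absolute difference over the masked cells only."""
    errs = [abs(truth[i][j] - imputed[i][j])
            for i in range(len(truth))
            for j in range(len(truth[0]))
            if masked[i][j] is None]
    return sum(errs) / len(errs)
```

Any imputer that maps the masked dataset back to a complete one can then be scored with `mean_abs_error` against the clean original, as in the four experiments above.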
5 Conclusions
In this work, we have presented a robust technique for multivariate data imputation in which the classical arithmetic mean is substituted by three robust alternatives. All three considered alternatives are shown to lead to results similar to those of the classical arithmetic mean when outliers are not present in the data, and to be significantly more robust in the presence of outliers. We end by noting that the results of the present work have implications in almost all fields of Artificial Intelligence due to the ubiquitous nature of missing data. Interesting problem settings that shall be explored in the future concern the influence of the imputation technique on the performance of several state-of-the-art techniques for classification (such as neural networks [1]) and the use of imputation techniques in image inpainting (the reconstruction of missing regions in an image [6]).

Acknowledgments. Raúl Pérez-Fernández acknowledges the support of Campus de Excelencia Internacional de la Universidad de Oviedo in collaboration with Banco de Santander.
References

1. Choudhury, S.J., Pal, N.R.: Imputation of missing data with neural networks for classification. Knowl.-Based Syst. 182, 104838 (2019)
2. Donoho, D., Gasko, M.: Multivariate generalizations of the median and trimmed means. Tech. Rep. 128 and 133, Department of Statistics, University of California, Berkeley (1987)
3. Enders, C.K.: Applied Missing Data Analysis. Guilford Press, New York (2010)
4. Gagolewski, M., Pérez-Fernández, R., De Baets, B.: An inherent difficulty in the aggregation of multidimensional data. IEEE Trans. Fuzzy Syst. 28(3), 602–606 (2020)
5. Hotelling, H.: Stability in competition. Econ. J. 39(153), 41–57 (1929)
6. Hukkelås, H., Lindseth, F., Mester, R.: Image inpainting with learnable feature imputation. In: Akata, Z., Geiger, A., Sattler, T. (eds.) DAGM GCPR 2020. LNCS, vol. 12544, pp. 388–403. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71278-5_28
7. Kim, K.Y., Kim, B.J., Yi, G.S.: Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinform. 5(1), 1–9 (2004)
8. Krarup, J., Vajda, S.: On Torricelli's geometrical solution to a problem of Fermat. IMA J. Manag. Math. 8(3), 215–224 (1997)
9. Li, D., Deogun, J., Spaulding, W., Shuart, B.: Towards missing data imputation: a study of fuzzy k-means clustering method. In: International Conference on Rough Sets and Current Trends in Computing, pp. 573–579. Springer, Uppsala (2004)
10. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19(16), 2088–2096 (2003)
11. Rousseeuw, P., Hubert, M.: High-breakdown estimators of multivariate location and scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds.) Robustness and Complex Data Structures, pp. 49–66. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35494-6_4
12. Schmitt, P., Mandel, J., Guedj, M.: A comparison of six methods for missing data imputation. J. Biometr. Biostatist. 6(1), 1 (2015)
13. Small, C.G.: Measures of centrality for multivariate and directional distributions. Canadian J. Statist. 15(1), 31–39 (1987)
14. Small, C.G.: A survey of multidimensional medians. Int. Stat. Rev. 58(3), 263–277 (1990)
15. Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21
16. Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015). https://doi.org/10.1016/j.fss.2015.06.004
17. Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014). https://doi.org/10.1016/j.ins.2013.09.041
18. Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012). https://doi.org/10.1080/00207160.2011.613460
19. Troyanskaya, O., et al.: Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525 (2001)
20. Tukey, J.W.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, pp. 523–531. Vancouver (1975)
21. Van Buuren, S., Oudshoorn, C.G.: mice: Multivariate imputation by chained equations. J. Stat. Softw. 45(3), 1–67 (2011)
22. Vardi, Y., Zhang, C.H.: The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. 97(4), 1423–1426 (2000)
23. Weber, A.: Ueber den Standort der Industrien. Mohr Siebeck Verlag, Tübingen (1909)
24. Weiszfeld, E.: Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Math. J. First Ser. 43, 355–386 (1937)
25. Zuo, Y., Serfling, R.: General notions of statistical depth function. Annal. Statist. 28(2), 461–482 (2000)
Decision Making by Applying Machine Learning Techniques to Mitigate Spam SMS Attacks

Hisham AbouGrad1(B), Salem Chakhar2, and Ahmed Abubahia3

1 School of Architecture, Computing and Engineering, CDT, University of East London (UEL), London E16 2RD, UK
[email protected]
2 Portsmouth Business School, CORL, University of Portsmouth, Portsmouth PO1 3AH, UK
[email protected]
3 School of Science, Technology & Health, York St John University, York YO31 7EX, UK
[email protected]
Abstract. Due to exponential developments in communication networks and computer technologies, spammers have more options and tools to deliver spam SMS attacks. This has made spam mitigation one of the most active research areas in recent years. Spam also affects people's privacy and causes revenue loss. Thus, tools that make accurate decisions about whether a message is spam or not are needed. In this paper, a spam mitigation model is proposed to distinguish spam from non-spam, together with the different processes used to mitigate spam SMS attacks. Anti-spam measures are also applied to classify spam with the aim of achieving high classification accuracy using different classification methods. This paper seeks to apply the most appropriate machine learning (ML) techniques using decision-making paradigms to produce an ML model for mitigating spam attacks. The proposed model combines ML techniques with the Delphi method and Agile to formulate the solution model. Three ML classifiers were used on the clustered dataset: Naive Bayes, Random Forests, and Support Vector Machine. These techniques are renowned as easy to apply, efficient, and more accurate in comparison with other classifiers. The findings indicate that the number of clusters combined with the number of attributes has a significant influence on classification accuracy performance.

Keywords: Machine Learning Algorithms · Feature Classification Algorithms · Decision-Making Method · Mitigating Spam Techniques · Spam Analytics Model · Mobile Network Security and Privacy Solution
1 Introduction

Levels of communication worldwide have increased with the use of mobile devices to send and receive short text messages (SMS), which have become critical for consumers because of spam messages. Indeed, spam SMS is used to access data, so people have serious concerns about disruption and data loss. Digital service providers are also affected by spam attacks, as these activities affect their reputation and growth.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 154–166, 2023. https://doi.org/10.1007/978-3-031-30396-8_14
Research on spam mitigation is growing as digital technologies advance, providing spammers with more sophisticated tools for spam attacks [1, 2]. Recent industrial research in the United States reveals that merchants will lose US$130 billion to fraud between 2018 and 2023, much of it originating from spam [3]. Moreover, financial technology (Fintech) innovations give attackers more options to collect money. Researchers are therefore making efforts to detect spam SMS, and ML techniques are considered among the best ways of resolving spam issues. Section 2 gives more details about the related research in this area.

In this paper, a spam mitigation model is proposed using the Delphi decision-making method to bridge the gaps in the literature. This method is used to utilise ML techniques for better decisions on whether a message is spam or non-spam [4]. ML algorithms contain three components: representation, evaluation, and optimization. First, the K-means clustering method is applied to group words based on their occurrence similarities. The text clustering is conducted with a variety of cluster numbers: 10, 20 and 30 clusters. Thereafter, three classification techniques are applied to the clustered data: Naive Bayes, Support Vector Machine (SVM), and Random Forests. These classifiers are known for producing accurate results and findings. The study indicates that the number of clusters relative to the number of attributes has a significant impact on accuracy performance.

The paper is organised as follows: Sect. 2 introduces the research background and related works that use machine learning techniques for mitigating unwanted spam messages; Sect. 3 describes the study datasets and the experimental setup and presents in detail the proposed research method, which is based on the Delphi decision-making method; Sect. 4 discusses the experimental results and interprets the main findings; and Sect.
5 provides a conclusion with further work.
2 Related Work

In the domain of spam SMS mitigation, which aims to recognise attacks on mobile devices, many machine learning techniques are applied for decision-making in order to develop spam SMS recognition systems [5]. This section provides background on the most recognised previous work and reviews the main algorithms applied in the study domain. The studies reviewed in this section cover the following questions:

• How are machine learning techniques used in decision-making processes?
• Which machine learning algorithms are applied to mitigate spam?
• How can machine learning techniques be utilised to mitigate spam?

2.1 Background

People are affected by spam through their mobile devices and the development of digital technologies, especially text classification features and data mining [6, 7]. Machine learning algorithms have been found to be an effective solution, and recent research has shown that these algorithms can mitigate over 94% of spam. Many studies have also proposed approaches to spam classification, including extracting specific
features from text messages to generate anti-spam methods. Such methods, combined with ML classification algorithms, can reach acceptable accuracy. Several renowned classic algorithms are applied for text classification, such as Random Forest, SVM and Multinomial Naive Bayes, and these have proven reliable by achieving high-quality results [5]. Random Forest can implement in-depth classification to enhance performance by improving precision. Support Vector Machine is a binary classifier that can achieve high accuracy by learning a hyperplane that separates n-dimensional data into two regions. The SVM classifier is reliable in categorising situations and indicating the presence of particular words that are usually identified as spam [8]. The Bayesian classification filter, in turn, is a commonly utilised ML technique [7, 9]; Bayesian methods such as Naive Bayes have become ML tools for information processing and retrieval [7]. Bayes' theorem is applied with the naive assumption that all words are unique and independent.

The term frequency-inverse document frequency (TFIDF) sparse representation is used to classify SMS communications into factual classes or spam attacks according to feature classification algorithms. TFIDF is commonly used and relies on weighting terms within a document or written text. Thus, classification ML algorithms can be used along with TFIDF to filter SMS.

The development of machine learning analytical methods needs a decision-making framework to manage the classification process and measure its performance to formulate the maturity model [10–12]. Thus, the Delphi method was found to be a suitable research framework to support the decision-making process and produce the study results and findings.
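The TFIDF term weighting described in this section can be sketched in plain Python. This is an illustration on hypothetical tokenised messages, not the study's vectorizer:

```python
import math

def tfidf(docs):
    """Compute a TFIDF weight map for each tokenised document."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # document frequency: in how many messages each term occurs
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    weights = []
    for d in docs:
        row = {}
        for t in vocab:
            tf = d.count(t) / len(d)       # term frequency within the message
            idf = math.log(n / df[t])      # inverse document frequency
            row[t] = tf * idf
        weights.append(row)
    return weights

msgs = [["win", "cash", "now"],
        ["call", "me", "now"],
        ["win", "a", "prize"]]
w = tfidf(msgs)
# "cash" occurs in only one message, so it is weighted higher there than
# "now", which occurs in two of the three messages
```

Terms that appear in every message receive an IDF of zero, which is exactly why TFIDF pushes common filler words down and distinctive spam terms up.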
Also, software development, including programming and system analysis, requires a methodology to collaborate, communicate, and carry out the implementation [13]. Hence, Agile software development (ASD) was adopted to conduct the system analysis, software development and programming [14]. Indeed, software development methods can produce a model that demotes spam and prevents spammers from accessing people's mobile devices and information [7].

2.2 Usage of Machine Learning Techniques for Decision-Making

The classification features of ML techniques can be applied to make decisions because of their effective comparison and analysis performance, which provides accurate numerical results and accuracy metrics [5]. For instance, the key principle of SVM is minimising structural risk by finding a decision surface that splits the instances into two classes. Also, the Random Forest algorithm is used as an ML classifier that provides high-accuracy results for predicting decisions on SMS spam. These high-accuracy results can identify objects, such as SMS spam, and process them according to requirements and decision rules.

According to Tejada et al. [6], techniques that apply SVM with a Radial Basis Function (RBF) non-linear kernel are commonly utilised to assess activities such as daily reference crop evapotranspiration in a specific region using limited meteorological datasets. The SVM was compared to five other established empirical ML techniques in terms of accuracy in daily estimation, and consequently, the
results confirm that SVM made the best daily estimates. When SVM was compared to other ML techniques on analogous datasets, it achieved high accuracies.

Achieving the best possible accuracy in study results and findings may require different data mining methods for data analysis using several machine learning models [6, 9]. For example, the SVM algorithm was formulated as a supervised ML technique by Vapnik [15] for data analysis and pattern recognition. SVM is commonly applied for forecasting, prediction and regression in many fields such as meteorology, agriculture, and environmental studies. In contrast, clustering methods, as unsupervised ML techniques, can extract more accurate results and new knowledge using data mining methods [9]. Unsupervised descriptive data mining partitions data into clusters and sub-clusters. Data mining statistical algorithms, such as K-means cluster analysis, have proven to achieve outstanding performance compared to other ML algorithms, especially in terms of accuracy [5, 9]. The accuracy of such algorithms, combined with the rounds/steps of decision-making methods, provides excellent outcomes for mitigating SMS spam, securing information and protecting people's privacy.

2.3 Applied Machine Learning Algorithms

Data mining's key steps include data selection, pre-processing and transformation, running data mining methods, and finally, evaluation to produce accurate numerical metrics, results and findings [2, 9]. Based on the spam filtering study by Manaa et al. [9], there are four fundamental steps to follow when using data mining methods to find spam. First, tokenise the incoming messages so the dataset can be counted. Second, process the tokenised dataset by selecting the key features from the data. Third, classify the data to prepare the feature vector for clustering.
Fourth, run the K-means model to recognise the spam cluster; in parallel, the Naive Bayes model can also be applied to recognise spam messages.

According to Pandya's [5] spam detection study, SVM can support a spam recognition system that mitigates spam using classification and clustering-based SVM techniques. Two algorithms were proposed to formulate a spam detection system, combining clustering and classification into what is called a Clustering-based Support Vector Machine (CLSVM) system. This enhances a conventional SVM system in four steps, using training and testing datasets to produce a highly reliable classifier. Section 3.2 provides more details.

Classification methods are supervised ML approaches, usually applied for feature classification in data mining processes. Classification methods, such as the SVM classifier, eliminate redundant features to enhance the classifiers in terms of runtime and prediction accuracy. Combining multiple methods, such as SVM and Naive Bayes, makes it possible to find the optimum features [6, 16]. SVM implicitly maps data from a low-dimensional space to a higher-dimensional feature space [6], mapping the relationship between input and output vectors to transform data into features. Conversely, the Naive Bayes classifier uses a set of algorithms developed according to Bayes' theorem [16, 17]. Thus, the Naive Bayes classification method has multiple algorithms that work as a family to run feature classification. In
general, such a family of algorithms shares common rules that are used to classify each pair of features independently of each other.

2.4 Utilising Machine Learning Techniques to Mitigate Spams

Multiple ML techniques are commonly used to recognise spam messages and mitigate spam. This supports the usage of semi-supervised learning, which learns from both unsupervised and supervised learning [16, 17]. Semi-supervised machine learning through different classification methods, such as SVM, Naive Bayes and Random Forests, can be applied to recognise spam messages, as these methods combine a small group of labelled data with a larger group of unlabelled data during the training steps. Nevertheless, achieving the targeted accuracy performance needs supervised learning. Thus, Random Forests can also be applied for classification. In this study, classification is used to tune the number of trees in the forest, which influences the quality of the results: the higher the number of trees, the better the accuracy of the results tends to be.
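The effect of the number of trees can be illustrated with a small sketch, assuming scikit-learn is available; the synthetic data stands in for the SMS feature vectors and is not one of the paper's datasets:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for vectorised SMS messages
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

scores = {}
for n_trees in (1, 10, 100):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    forest.fit(X_tr, y_tr)
    scores[n_trees] = forest.score(X_te, y_te)
# Accuracy tends to improve, then plateau, as more trees vote
```

Each tree votes on the class of a message, so adding trees reduces the variance of the ensemble's prediction, which is the intuition behind the sentence above.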
3 Methodology

The study seeks to support decision-making to mitigate spam using ML techniques; therefore, the Delphi method rounds are applied as a research framework for decision making. The Delphi method is supported by Agile software development to build the core processes and programs for the experiments. ASD has the flexibility to apply the Delphi method to the clustering and classification processes and to measure them in order to decide whether SMS messages are spam or non-spam [12, 13]. The research framework used the collected datasets and ML techniques in three main stages to formulate accurate results and novel findings, which are discussed in detail in Sect. 4. The following sections explain the methodology and study framework used to carry out the implementation towards accurate decisions and conclusions.

3.1 The Delphi Decision-Making Study Method

The Delphi method is renowned as a decision-making approach for recognising values and attributes, and it has therefore been implemented in this study to measure spam attacks, along with the Agile methodology (software lifecycle) and its Scrum sprints [12–14]. The Delphi method has three compulsory rounds which, when applied in sequential steps, can produce high-quality decisions, as illustrated in Table 1. The Delphi method begins with Brainstorming to recognise the measurement indicators for the research framework [10–12]; this round is used to identify initial conditions, criteria, and classes. The second round is Narrowing Down, which is used to check and then approve the recognised indicators of the framework, ready for the final rate of consensus on which the decision is based; this round yields the key identified values. The third and final round of the Delphi method is Weighing, which is used to make an overall evaluation.
Table 1. The Delphi Method Rounds as described by Looy et al. and AbouGrad et al. [10–12].
3.2 Software Development Methodology and Algorithms

The Agile methodology and its resilient practices have improved data analytics processes by breaking software development work into a set of iterations, also known as sprints, in order to coordinate and communicate between developers or development teams, which are often distributed across different locations [18]. Agile makes software development projects flexible by supporting the project from different angles using a process-modelling instrument to run all project phases. Thus, the Delphi method rounds are implemented through an Agile instrument to develop and execute the required algorithms. In this study, ASD is an iterative process applied as sprints in Scrum to produce each working piece of the software rapidly using the Scrum lifecycle [13]. Figure 1 illustrates the use of Scrum sprints in three phases, where each phase implements a Delphi round.
Fig. 1. Agile methodology using Scrum sprints as described by Alsaqqa et al. [13].
The study's Agile process starts with Sprint 1, which carries out the Brainstorming process; Sprint 2 then performs the Narrowing Down to check and approve the accuracy of the results; and lastly, Sprint 3 is used for Weighing, to evaluate the results and validate the accuracy performance, as shown in Fig. 1. The three sprints perform clustering as a pre-processing step followed by a classification step, which includes three different ML techniques using the system proposed by Pandya [5]. For the pre-processing step, the dataset is divided into two parts: a training part, e.g. 80%, and a testing part, e.g. 20% [4]. The training data is used to feed the algorithm being trained, whereas the testing data is used for validation. The pre-processing uses clustering as a data cleaning feature for accuracy and to improve the data
quality, but with the disadvantage of increased data processing time, which is offset by improvements in the levels of prediction (Fig. 2).
Fig. 2. The pre-processing step for clustering dataset.
When the pre-processing step is done and the collected dataset has been clustered, the classification step can be implemented using the clustered dataset. The classification step uses procedures similar to those of the previous process. Figure 3 illustrates the main concept of the classification step.
Fig. 3. The classification step to produce prediction results.
Clustering and classification algorithms can produce up to 100% accuracy with highly reliable outcomes [5]. For example, the clustering-based SVM system's execution time was about 14.43 s for 5064 records, which indicates strong reliability. Indeed, the CLSVM system has proven higher accuracy and better timing compared to SVM alone.

3.3 Data Collection and Processing

The study collected two datasets in a comparable format for the experimental setup. These were chosen from the Kaggle website as a trusted source. Dataset #1 is available at https://www.kaggle.com/code/balaka18/email-spam-classification/data, and dataset #2 at https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset. The datasets are publicly available and, on review, both were found to be reliable. Table 2 describes the datasets in terms of the number of rows (records) and the number of columns (attributes). It is clear that dataset #1 has a greater number of attributes than dataset #2. Accordingly, more effort in data reduction is required, as this helps to focus on specific sets of attributes for greater efficiency and effectiveness.
Table 2. Description of the study datasets.

Dataset No    Number of Rows    Number of Columns
1             5172              3002
2             5574              4
The datasets were reviewed by applying data processing and the TFIDF calculation feature to transform messages (text) into numerical vector data to be executed by the algorithms. The dataset messages were reviewed first, as messages are the most important attribute of the research because spammers use messages for spamming. Thus, stop-word removal and Porter stemming were used to recognise terms (words), and then the key spam terms were selected. Finally, the TFIDF score was calculated for the recognised terms, labelling them as unique term features (UTF).

3.4 Experimental Study and Implementation

Three ML techniques were applied to conduct the experiment, as discussed in the previous sections. These methods help to recognise SMS spam through all rounds of the study to mitigate spam attacks. Table 3 shows an example that demonstrates how the identified spam terms are used to count and recognise spam messages.

Table 3. Example of message recognised terms profile matrix.

SMS No   should  get  out  see  price  time  …  classification
SMS 1    1       1    1    0    0      0     …  spam
SMS 2    0       0    0    0    1      0     …  ham
SMS 3    0       0    0    0    0      1     …  ham
SMS 4    0       1    1    1    0      0     …  spam
SMS 5    0       0    1    1    0      0     …  spam
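A hedged sketch of how one such term-profile row could be built is shown below. The stop-word list and the example message are toy assumptions, and a real system would also apply Porter stemming:

```python
# Toy stop-word list; a production filter would use a full list plus stemming
STOP_WORDS = {"a", "an", "the", "to", "is", "now", "and"}

def term_profile(message, spam_terms):
    """Build a 0/1 presence vector over the selected spam terms (cf. Table 3)."""
    tokens = [w.strip(".,!?").lower() for w in message.split()]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return {term: int(term in tokens) for term in spam_terms}

terms = ["should", "get", "out", "see", "price", "time"]
profile = term_profile("You should get out now!", terms)
# {'should': 1, 'get': 1, 'out': 1, 'see': 0, 'price': 0, 'time': 0}
```

Stacking such rows for many messages yields exactly the profile matrix of Table 3, which the classifiers then consume.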
The experiments were executed on computer devices with minimum specifications of an Intel(R) Core(TM) i5-4200U CPU (2.30 GHz) processor, 8.00 GB RAM, and Windows 10 Pro 64-bit operating system for x64-based processors. During these experiments, three different numbers of clusters (i.e. 10, 20 and 30) were applied to perform data clustering, and three algorithms were applied in combination to execute the proposed system. The first algorithm is the improved Naive Bayes, which is used to fix problems such as the tendency to correct word positioning by more than 10% in comparison to negative word accuracy [17, 19].

The study applied K-means clustering during the second round (phase) of the implementation using Algorithm 1. This method uses data from the first round for data pre-processing, where the data is converted to vectors using the TFIDF vectorizer, as shown in Step-1 of Algorithm 2. After the data input, seven steps are followed, as shown in
Algorithm 1 to conduct the dataset clustering, producing a clustered dataset ready for the data classification shown in Step-3 of Algorithm 2.
Algorithm 1: K-means Clustering Method
Input: Vectorised SMS Dataset (see Algorithm 2, Step-1)
Output: Clustered (grouped) Data
Step-1: Select the number K to decide the number of clusters
Step-2: Select K random points as centroids (they may be points outside the input dataset)
Step-3: Assign each data point to its closest centroid to form the predefined K clusters
Step-4: Calculate the variance and place a new centroid for each cluster
Step-5: Repeat Step-3 to reassign each data point to the new closest centroid of its cluster
Step-6: If any reassignment occurs, go to Step-4; else go to Step-7
Step-7: FINISH; the dataset is clustered (grouped) and ready for classification
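The steps of Algorithm 1 can be sketched in plain Python. This is a simplified illustration, not the study's implementation; it initialises centroids from the input points rather than arbitrary ones, and uses 2-D toy points in place of TFIDF vectors:

```python
import random

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # Step-1/2: pick K initial centroids
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                           # Step-3/5: assign to closest centroid
            i = min(range(k), key=lambda c: squared_dist(p, centroids[c]))
            clusters[i].append(p)
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]  # Step-4: recompute centroids
        if new_centroids == centroids:             # Step-6/7: stop when assignments settle
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
cents, clusters = kmeans(points, k=2)
# The two tight groups end up in separate clusters
```

In the study itself, the "points" are TFIDF vectors of messages, so each resulting cluster groups words and messages with similar occurrence patterns.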
The study experiment has three main steps to mitigate SMS spam using a K-means clustering-based classification model, as shown in Algorithm 2. It uses a dataset collected from SMS communications to produce a highly accurate, classified dataset. The first step is data pre-processing with TFIDF, which removes missing values, duplicated values, and stop words. The second step is data clustering (Algorithm 1), which consists of clustering the pre-processed data into 10, 20, and 30 clusters. The third step is data classification, which classifies the output dataset from Step-2 of Algorithm 2 to make a comparison.
Algorithm 2: K-means Clustering based Classification Model
Input: Collected Communication Messages Dataset
Output: Highly accurately classified dataset to mitigate spam attacks
Step-1: Data Pre-processing by TFIDF Vectorizer: removing the missing values; removing duplicated values; and removing stop words
Step-2: Data Clustering using the K-means Clustering Method (Algorithm 1). This includes clustering the pre-processed data into 10, 20 and 30 clusters
Step-3: Data Classification by classifying the output dataset from Step-2 to compare the classification accuracy of: i. Naive Bayes; ii. Support Vector Machine (SVM); iii. Random Forests
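Under the assumption of a scikit-learn environment, the flow of Algorithm 2 (TFIDF vectorisation, K-means clustering, then classification) might be sketched as follows. The toy messages, the tiny cluster count, and the choice to append the cluster id as an extra feature are all illustrative assumptions, not the authors' exact design:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

msgs = ["win a free prize now", "cheap price offer win cash",
        "see you at home", "call me later tonight",
        "free cash prize claim now", "are you coming home"]
labels = np.array([1, 1, 0, 0, 1, 0])          # 1 = spam, 0 = ham

# Step-1: TFIDF vectorisation (stop-word removal handled by the vectorizer)
X = TfidfVectorizer(stop_words="english").fit_transform(msgs)

# Step-2: K-means clustering of the vectorised messages (k kept tiny here)
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Step-3: classification on TFIDF features augmented with the cluster id
X_aug = np.hstack([X.toarray(), cluster_ids.reshape(-1, 1)])
clf = MultinomialNB().fit(X_aug, labels)
train_acc = clf.score(X_aug, labels)
```

In the paper's experiments, Step-3 is repeated with Naive Bayes, SVM and Random Forests so that their accuracies can be compared, as reported in Sect. 4.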
For training and testing, the dataset has been split into two parts: the first part, 75% of the dataset, is used for training, and the second part, the remaining 25%, is used for testing to validate the algorithm.
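The 75/25 split can be sketched with a simple shuffle. This is illustrative only; the study does not specify its splitting procedure beyond the ratio:

```python
import random

def split_dataset(records, train_frac=0.75, seed=42):
    """Shuffle and split records into training and testing parts."""
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the original order is preserved
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(100)))
# len(train) == 75, len(test) == 25, and together they cover all records
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing the three classifiers on identical data.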
4 Discussions and Findings

The study results are explained here to discuss the key findings. The facts and outputs presented describe what the methodology and implementation produced.

4.1 Delphi's Reliability and Validity for Decision-Making

The study found that the key objectives of decision-making studies can be achieved through the Delphi method, as consensus and stability were notably experienced by processing the spam datasets in three different rounds to reach consensus on several significant
aspects. The study also showed that the Delphi method identified practical issues and key indicators by weighing decision-making criteria through its rounds as a multiple criteria decision-making (MCDM) framework, which complies with other studies [11, 12]. The Delphi method confirms the requirements and then makes decisions using its rounds as an MCDM process. According to the AbouGrad et al. study [12], the Delphi method can identify, select, conceptualise, and validate factors. Hence, the Delphi method examines the algorithm's validity in the Weighing round using a quantitative assessment of the reliability and validity of the model.

4.2 Clustering Based Classification for Decision-Making to Mitigate Spam

The K-means clustering is followed by three algorithms to formulate the study's classification model, as shown in Algorithm 2. The main results and findings of the proposed work are exhibited in Tables 4 and 5. These tables show, for each number of clusters, the ML algorithms and their resulting classification accuracy through the different steps of Algorithm 1 and Algorithm 2.

Table 4. Dataset #1 based Experimental Results.

Number of Clusters   Classification Algorithm   Classification Accuracy
10                   Naive Bayes                0.75
                     SVM                        0.71
                     Random Forests             0.77
20                   Naive Bayes                0.84
                     SVM                        0.80
                     Random Forests             0.87
30                   Naive Bayes                0.94
                     SVM                        0.89
                     Random Forests             0.97
According to dataset #1, as shown in Table 4, Random Forests outperforms both the Naive Bayes and SVM classifiers. Using 10 clusters, the Random Forests classifier scores an accuracy of approximately 77%, while Naive Bayes scores 75% and SVM scores 71%; hence the Naive Bayes classifier in turn outperforms the SVM classifier. The observations for both 20 clusters and 30 clusters support the conclusion that the Random Forests classifier gives better predictions than both Naive Bayes and SVM. The variations in the number of clusters also make it clear that the greater the number of clusters, the higher the accuracy score: each additional 10 clusters increases the classification accuracy score by roughly 10%.

According to dataset #2 (Table 5), the Random Forests classifier again outperforms Naive Bayes and SVM. Using 10 clusters, the Random Forests
Table 5. Dataset #2 based Experimental Results.

Number of Clusters   Classification Algorithm   Classification Accuracy
10                   Naive Bayes                0.91
                     SVM                        0.86
                     Random Forests             0.94
20                   Naive Bayes                0.92
                     SVM                        0.87
                     Random Forests             0.95
30                   Naive Bayes                0.93
                     SVM                        0.88
                     Random Forests             0.96
classifier scores an accuracy of about 94%, while Naive Bayes scores 91% and SVM scores the lowest at 86%; again, the Naive Bayes classifier outperforms SVM. The observations for both 20 clusters and 30 clusters likewise support the conclusion that the Random Forests classifier produces better predictions than both Naive Bayes and SVM. A greater number of clusters again leads to a higher classification accuracy score, although here 10 more clusters increase the accuracy score by only about 1%. The difference in incremental rates between dataset #1 (10%) and dataset #2 (1%) leads to the conclusion that the number of clusters in combination with the number of attributes has a significant influence on classification accuracy. An overall comparison of the classification accuracy was also carried out to identify how each classifier performs: the Random Forests classifier ranks first with 35%, followed by Naive Bayes with 33% and the SVM classifier with 32%, as shown in Fig. 4.
Fig. 4. Overall classification accuracy comparison between the selected classifiers.
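The ranking in Fig. 4 can be cross-checked against Tables 4 and 5 by averaging each classifier's reported accuracies (a small verification sketch using the values from the two tables):

```python
acc = {  # accuracies from Tables 4 and 5 (datasets #1 and #2; 10/20/30 clusters)
    "Naive Bayes":    [0.75, 0.84, 0.94, 0.91, 0.92, 0.93],
    "SVM":            [0.71, 0.80, 0.89, 0.86, 0.87, 0.88],
    "Random Forests": [0.77, 0.87, 0.97, 0.94, 0.95, 0.96],
}
means = {name: sum(v) / len(v) for name, v in acc.items()}
ranking = sorted(means, key=means.get, reverse=True)
# Random Forests ranks first, then Naive Bayes, then SVM
```

The mean accuracies reproduce the ordering shown in Fig. 4 (Random Forests first, then Naive Bayes, then SVM), independently of the exact percentage shares used in the chart.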
5 Conclusion

The Delphi method has been applied with Agile software development as an iterative process, utilising Scrum sprints to produce each piece of software rapidly through the Scrum lifecycle. This research work presented a machine-learning-based technique for better decision-making on spam versus non-spam to mitigate spam attacks. The clustering-based classification algorithm applied K-means clustering to group words, after which three different classifiers were applied. The findings indicate that the number of clusters in combination with the number of attributes has a significant influence on classification accuracy performance.
References

1. Aliza, H.Y., et al.: A comparative analysis of SMS spam detection employing machine learning methods. In: 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), pp. 916–922. IEEE (2022)
2. Delen, D.: Predictive Analytics: Data Mining, Machine Learning and Data Science for Practitioners. Pearson Education Inc., Old Tappan, New Jersey (2021)
3. King, S.T., Scaife, N., Traynor, P., Abi Din, Z., Peeters, C., Venugopala, H.: Credit card fraud is a computer security problem. IEEE Secur. Priv. 19, 65–69 (2021)
4. Achchab, S., Temsamani, Y.K.: Use of artificial intelligence in human resource management: application of machine learning algorithms to an intelligent recruitment system. In: Troiano, L., et al. (eds.) Advances in Deep Learning, Artificial Intelligence and Robotics. LNNS, vol. 249, pp. 203–215. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-85365-5_20
5. Pandya, D.: Spam detection using clustering-based SVM. In: Proceedings of the 2019 2nd International Conference on Machine Learning and Machine Intelligence, pp. 12–15. ACM, New York, NY, USA (2019)
6. Tejada, A.T., Ella, V.B., Lampayan, R.M., Reaño, C.E.: Modeling reference crop evapotranspiration using support vector machine (SVM) and extreme learning machine (ELM) in Region IV-A, Philippines. Water 14, 754 (2022)
7. Kim, S.-E., Jo, J.-T., Choi, S.-H.: SMS spam filtering using keyword frequency ratio. Int. J. Secur. Appl. 9, 329–336 (2015)
8. Reaves, B., et al.: Characterizing the security of the SMS ecosystem with public gateways. ACM Trans. Priv. Secur. 22, 1–31 (2019)
9. Manaa, M., Obaid, A., Dosh, M.: Unsupervised approach for email spam filtering using data mining. EAI Endorsed Trans. Energy Web 8, 162–168 (2021)
10. Looy, A., Poels, G., Snoeck, M.: Evaluating business process maturity models. J. Assoc. Inf. Syst. 18, 461–486 (2017)
11. AbouGrad, H., Warwick, J., Desta, A.: Developing the business process management performance of an information system using the Delphi study technique. In: Reyes-Munoz, A., Zheng, P., Crawford, D., Callaghan, V. (eds.) TIE 2017. LNEE, vol. 532, pp. 195–210. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-02242-6_15
12. AbouGrad, H., Warwick, J.: Applying the Delphi method to measure enterprise content management workflow system performance. In: Arai, K. (ed.) Intelligent Computing - Proceedings of the 2022 Computing Conference, vol. 2, pp. 404–419. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-10464-0_27
13. Alsaqqa, S., Sawalha, S., Abdel-Nabi, H.: Agile software development: methodologies and trends. Int. J. Interact. Mob. Technol. 14, 246 (2020)
166
H. AbouGrad et al.
14. Martin, R.C.: Clean Agile: Back to Basics. Pearson, Boston (2020) 15. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer New York, New York, NY (2000). https://doi.org/10.1007/978-1-4757-3264-1 16. Sohom, B., et al.: Machine learning-based Naive Bayes approach for divulgence of Spam Comment in Youtube station. Int. J. Eng. Appl. Phys. 1, 278–284 (2021) 17. Kamble, M., Dule, C.: Review spam detection using machine learning: comparative study of naive bayes, SVM, logistic regression and random forest classifiers. Int. J. Adv. Res. Sci. Technol. 7, 292–294 (2020) 18. Biesialska, K., Franch, X., Muntés-Mulero, V.: Big Data analytics in Agile software development: a systematic mapping study. Inf. Softw. Technol. 132, 106448 (2021) 19. Khurshid, F., Zhu, Y., Xu, Z., Ahmad, M., Ahmad, M.: Enactment of ensemble learning for review spam detection on selected features. Int. J. Comput. Intell. Syst. 12, 387–394 (2018)
Voronoi Diagram-Based Approach to Identify Maritime Corridors

Mariem Masmoudi1(B), Salem Chakhar2,3, and Habib Chabchoub4

1 OLID Laboratory, University of Sfax, Sfax, Tunisia
[email protected]
2 Portsmouth Business School, University of Portsmouth, Portsmouth PO1 3AH, UK
[email protected]
3 CORL, University of Portsmouth, Portsmouth PO1 3AH, UK
4 College of Business, Al Ain University, Al Ain, UAE
[email protected]
Abstract. This paper proposes a three-phase procedure for maritime corridor generation. The main input of this procedure is a bathymetric map; its output is a collection of potential corridors connecting different start and end points. The procedure is structured into three successive phases: (1) spatial data transformation; (2) construction of the connectivity graph; and (3) identification of potential corridors. The proposed approach has been implemented and applied to identify a collection of corridors for locating a maritime highway linking the archipelago of Kerkennah to Sfax City in Tunisia. Four pairs of start and end points have been considered in this application, leading to four potential corridors, each represented as a collection of linearly adjacent polygons.
Keywords: Voronoi diagram · Shortest path · Dijkstra algorithm · Maritime corridors · GIS

1 Introduction
The design and implementation of maritime highway links is an important factor for the economic development of remote areas in several islands. This is particularly relevant to the archipelago of Kerkennah in Tunisia, which lacks a fixed link to Sfax City in continental Tunisia. Sfax City is an important economic and industrial center in Tunisia. This contrasts with the nearby archipelago of Kerkennah, which remains relatively undeveloped compared to Sfax City. A fixed link between the archipelago and Sfax City would certainly support its economic, touristic, and industrial development.

In this paper, we present a procedure for generating maritime corridors and apply it to the case of the archipelago of Kerkennah. The proposed procedure is organised into three phases. The first takes as input a bathymetric map and leads to a Voronoi diagram representation of the study area. The second phase transforms the Voronoi diagram into a connectivity graph where the nodes correspond to the polygons of the Voronoi diagram and the edges correspond to the neighborhood relations between these nodes. The third phase applies the Dijkstra algorithm [6] to identify the potential corridors. The proposed approach has been implemented and applied to identify a collection of corridors for locating a maritime highway link between the City of Sfax in continental Tunisia and the archipelago of Kerkennah.

The core of the proposed corridors generation procedure is the Voronoi diagram. Voronoi diagrams are useful in several applications such as path planning, particularly for marine vehicles [3,4,15]. The pre-computation of a Voronoi diagram is the initial step of various algorithms in computational geometry [8]. The main argument for using the Voronoi diagram in this type of application is its property of producing a roadmap with edges that are maximally distant from the generator points [3].

The rest of the paper is organised as follows. Section 2 briefly describes the considered problem. Section 3 details the corridors generation procedure. Section 4 applies this procedure to the considered problem. Section 5 concludes the paper and briefly enumerates some future work topics.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, pp. 167–177, 2023. https://doi.org/10.1007/978-3-031-30396-8_15
2 Problem Description
The problem considered in this paper aims to identify a set of potential maritime corridors for linking the City of Sfax in continental Tunisia to the nearby archipelago of Kerkennah (see Fig. 1). The Kerkennah islands have a spectacularly beautiful nature and a unique charm that make them a special touristic destination. The Kerkennah islands also have a quite particular ecosystem. These islands are considered to form a particular bionomic and halieutic marine heritage, not only in Tunisia, but in the whole Mediterranean. Tourism and marine fishing are the resources on which the islands' economy is based. However, Kerkennah is isolated by its geography, and the current transportation infrastructure represents an obstacle to the economic development of the archipelago. The difficulty of moving to and from the Kerkennah islands limits their economic growth and reduces investment opportunities. In fact, the increased transportation costs lead to higher costs for all products (especially construction), which discourages investors.

The depth of the sea in the considered area varies between 0 and 5 m, but most often remains less than 2 m [7]. The Kerkennah islands are surrounded by shoals, interspersed with channels that can reach a depth of 13 m, very difficult to access for vessels other than the flat-bottomed boats and feluccas used since antiquity by the populations of the eastern coast of Tunisia [2]. The Grand Sfax coast is marked by a slight slope. A seabed of 5 m or less extends from La Chebba north of Sfax to the level of Kerkennah Island. The seabed, over a radius of 5 km around the Gulf of Gabes south of Sfax, does not exceed 10 m. Thus, it was necessary to dredge a channel 60 m wide by 4.5 km long, with a depth of 11 m, from the port entrance to the open sea to allow large boats to enter the port of Sfax [14].

The idea of the Sfax-Kerkennah route is not new. A line from Sidi Mansour in Sfax to Sidi Fredj in the Kerkennah archipelago has been proposed, otherwise the
Fig. 1. Study Area.
starting point would be El Louza in Sfax. The itinerary of this line has led to many discussions (between decision makers from the different concerned sectors) about the best start and end points and the traversed marine space. An official from the Merchant Navy and Ports Office (OMMP) affirms that it is preferable that the new route lie above the navigation lines of the boats and ships that start from the south of Sfax (see Fig. 2). An official responsible for the navigation of goods and passengers at the port of Sfax agrees that it is preferable that the new maritime highway lie above these lines (i.e. from the north of Sfax), in order to avoid any crossing between these lines and the highway (thereby minimizing the height of the road above the sea and avoiding restrictions on the heights of goods and passenger ships and of fishing boats).
3 Corridors Generation Procedure
The corridors generation procedure takes as input a bathymetric map B and outputs a collection P of potential corridors. The proposed procedure is structured into three successive phases (see Algorithm 1): (1) a preprocessing phase of spatial data transformation; (2) construction of the connectivity graph; and (3) identification of potential corridors. These phases are detailed in the rest of this section.

3.1 Spatial Data Transformation
The objective of the spatial data transformation phase is to generate a Delaunay triangulation T from the contour lines (also called depth contours or isobaths) in the bathymetric map B. Based on the bathymetric map B, which measures the depths
Fig. 2. Navigation Line Between Sfax City and Kerkennah Archipelago.
Algorithm 1: Procedure Corridors

Input : B: Bathymetric map;
Output: P: Set of potential corridors;
1. Extract a node map N from the bathymetry contours in B;
2. Construct the Delaunay triangulation T from the node map N;
3. Extract the Voronoi diagram V from the Delaunay triangulation T;
4. Extract the connectivity graph G = (U, W) from the Voronoi diagram V;
5. F ← {(s, e) : s and e are possible start and end points in G};
6. for each (s, e) ∈ F do
7.     P_{s,e} ← Dijkstra(G, s, e);
8.     add P_{s,e} to P;
and relief of the seabed to determine the topography of the sea floor, we generate a node map N, from which we generate the Delaunay triangulation T. Then, based on the Delaunay triangulation layer T, we generate the Voronoi diagram V, which decomposes the study area into a collection of spatial units or polygons. All points within the same spatial unit have the same depth. All these spatial data transformation operations are commonly offered and supported by almost all available Geographic Information Systems (GIS).

The Voronoi diagram and the Delaunay triangulation are duals [8]. The specification of the neighborhood relation is essential in the construction of Voronoi diagrams. In this paper, two Voronoi polygons are neighbors if they share a common boundary. This holds if the generator points of these polygons are defined as neighbors (or first-order neighbors) in the Delaunay triangulation. Second-order neighbors of a point are its neighbors' neighbors. Figure 3 illustrates these types of neighborhood relation. In our work, we considered first-order neighbors to ensure the adjacency constraint when defining corridors.
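Since the Voronoi diagram and the Delaunay triangulation are duals, first-order neighbors can be read directly off the triangle list: two generator points are first-order neighbors exactly when they appear together in some Delaunay triangle, and then their Voronoi polygons share a boundary. A minimal sketch (the triangulation here is a toy example, not the one derived from the actual node map; in practice the GIS produces it):

```python
from itertools import combinations

# Toy Delaunay triangulation: triangles as triples of generator-point ids.
# In the paper this comes from the node map extracted from the bathymetry.
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

# Two points sharing a triangle edge are first-order neighbors, so their
# Voronoi polygons are adjacent.
neighbors = {}
for tri in triangles:
    for a, b in combinations(tri, 2):
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

print(sorted(neighbors[2]))  # point 2 borders every other point: [0, 1, 3, 4]
```

Second-order neighbors would then be obtained by taking the neighbors of each first-order neighbor, but only the first-order relation is needed to enforce the adjacency constraint on corridors.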
Fig. 3. Neighborhood relation (Source: [10]).
3.2 Construction of the Connectivity Graph
The objective of the second phase is to extract the connectivity graph G = (U, W) from the Voronoi diagram. The set of nodes U corresponds to the polygons of the Voronoi diagram V, while the set of edges W corresponds to the neighborhood relations between nodes (polygons) in U, i.e., W = {(ui, uj) ∈ U × U : ui and uj are adjacent polygons}.

An important step in the connectivity graph extraction process is the definition of the distance between adjacent nodes. Measuring the distance between two point nodes is straightforward. However, distance calculation between two polygons is not trivial [16], as different options are available, as shown in Fig. 4. Generally, measuring distances for non-point objects involves abstracting them into representative points [11]. A common strategy used to measure the polygon-to-polygon distance consists in abstracting these polygons into their centroids; the polygon-to-polygon distance is then simply assumed to be equal to the distance between their centroids. Despite its simplicity, estimating the distance between two polygons through the distance between their corresponding centroids may lead to bias issues, as shown in a recent study by [11]. Despite this criticism, the centroid-to-centroid approach is adopted in this paper. The two other approaches are under investigation.

3.3 Identification of Corridors
The last phase of the proposed approach consists in using Dijkstra's well-known shortest path algorithm [6] to identify the potential corridors. From a theoretical point of view, any polygon (node) on the continental side of the study area (i.e. Sfax City) can serve as a departure point for the highway, and any polygon (node) in
Fig. 4. Different Polygon-to-Polygon Distance Approaches: (a) Centroid–Centroid; (b) Centroid–Nearest Point; (c) Centroid–Farthest Point.
the archipelago of Kerkennah can serve as an arrival point. To cover all possible paths, the Dijkstra algorithm should be applied to the Cartesian product (denoted by F in Algorithm 1) of the sets of possible start and end points. This idea is implemented by the for loop in Algorithm 1.
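The connectivity-graph and shortest-path phases can be sketched as follows. The centroids and adjacency below are invented toy data (in the actual procedure they come from the Voronoi polygons of the study area); edge weights use the centroid-to-centroid distance adopted in the paper, and a corridor is the resulting chain of adjacent polygons for each (start, end) pair:

```python
import heapq
import math

# Toy polygon centroids (polygon id -> (x, y)) and neighborhood relation W.
centroids = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (2, 1), 4: (2, 0)}
adjacent = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}

def dist(a, b):
    # Centroid-to-centroid distance between two adjacent polygons.
    return math.dist(centroids[a], centroids[b])

def dijkstra(start, end):
    """Shortest chain of adjacent polygons from start to end."""
    heap, best, prev = [(0.0, start)], {start: 0.0}, {}
    while heap:
        d, u = heapq.heappop(heap)
        if u == end:
            break
        if d > best[u]:
            continue  # stale heap entry
        for v in adjacent[u]:
            nd = d + dist(u, v)
            if nd < best.get(v, float("inf")):
                best[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[end]

# One corridor per (start, end) pair, as in the for loop of Algorithm 1.
corridors = {(s, e): dijkstra(s, e) for (s, e) in [(0, 3), (0, 4)]}
print(corridors[(0, 3)])  # ([0, 1, 2, 3], 3.0)
```

The corridor is returned both as the ordered list of polygon ids (linearly adjacent polygons) and as its total centroid-to-centroid length, which is the quantity reported in Table 1.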
4 Application and Results
The proposed procedure has been applied to identify some alternative solutions to the maritime corridor location problem introduced in Sect. 2. The open-source GIS QGIS has been used to manage the spatial data and to support all data transformation operations, while the Dijkstra algorithm has been implemented in the Python programming language. The main input of the procedure is the bathymetry map shown in Fig. 5. The results of the three steps of the first phase of the proposed procedure are given in Fig. 6 (node map), Fig. 7 (Delaunay triangulation) and Fig. 8 (Voronoi diagram), respectively.

The second phase of the proposed procedure consists in extracting the connectivity graph from the Voronoi diagram. The generated Voronoi diagram contains 3487 polygons (nodes). For computational efficiency, the generated connectivity graph has been represented in a tabular format. An extract from this file is shown in Fig. 9.

The final phase of the proposed procedure is to apply the Dijkstra algorithm to generate the potential corridors. For the purpose of this illustrative application, we have randomly selected four pairs of start and end points. The application of the Dijkstra algorithm thus leads to four potential corridors, which are shown in Fig. 10. The characteristics of the obtained corridors are given in Table 1. According to this table, corridor #2 (Yellow) is the best one.
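A connectivity graph stored in tabular format, as in Fig. 9, can be loaded back into an adjacency structure along these lines. The column names and edge values are hypothetical, since the actual file layout is not reproduced here; the sketch assumes one row per edge with the two polygon ids and their centroid-to-centroid distance:

```python
import csv
import io

# Hypothetical extract of the tabular connectivity graph (Fig. 9 analogue):
# one row per edge between adjacent Voronoi polygons.
table = io.StringIO(
    "source,target,distance\n"
    "101,102,350.2\n"
    "102,103,412.7\n"
)

graph = {}
for row in csv.DictReader(table):
    u, v, d = int(row["source"]), int(row["target"]), float(row["distance"])
    graph.setdefault(u, {})[v] = d
    graph.setdefault(v, {})[u] = d  # adjacency between polygons is symmetric

print(graph[102])  # {101: 350.2, 103: 412.7}
```

Storing only one row per undirected edge and symmetrizing on load keeps the file compact, which matters for the 3487-polygon diagram generated here.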
Fig. 5. Bathymetry map
Fig. 6. Node Map.
Fig. 7. Delaunay Triangulation.
Fig. 8. Voronoi Diagram.
Fig. 9. The Connectivity Graph in Tabular Format.
Fig. 10. Potential Corridors.
Table 1. Characteristics of Potential Corridors

#  Color        Total Length (meters)  Rank
1  Pink         37007.425              2
2  Yellow       29198.205              1
3  Orange       40828.795              4
4  Dark Orange  38666.473              3

5 Conclusion and Future Work
We proposed a Voronoi diagram-based approach to identify maritime corridors. The proposed approach has been implemented and applied to identify a collection of corridors for locating a maritime highway linking the City of Sfax to the nearby archipelago of Kerkennah, both in Tunisia.

In this paper, we considered only the distance factor. However, several other criteria should be taken into consideration, e.g., depth and the existing itineraries of ships and boats, in order to obtain a more realistic and feasible solution. This requires applying an appropriate aggregation rule to combine all these factors. The multicriteria evaluation approach introduced in [1] can be adopted and applied to address the problem considered in this paper. The design of a heuristic to reduce the computation time, by reducing the number of paths to compute, is another highly recommended future research topic. In this respect, the use of the ranking version of the Dominance-based Rough Set Approach (DRSA) [9,12,13] to assess and select a reduced set of start and end points will be investigated. Another topic concerns the use of a more comprehensive strategy to measure polygon-to-polygon distance. In this paper, the distance between two polygons is estimated through the distance between their respective centroids. An alternative solution is to use a fuzzy number-based strategy. In this way, the polygon-to-polygon distance would be defined as a fuzzy number obtained by combining the three polygon-to-polygon distance approaches introduced in Fig. 4. Then, the fuzzy Dijkstra shortest path algorithm introduced in [5] can be used to produce the potential corridors.
References

1. Aissi, H., Chakhar, S., Mousseau, V.: GIS-based multicriteria evaluation approach for corridor siting. Environ. Plann. B. Plann. Des. 39(2), 287–307 (2012). https://doi.org/10.1068/b37085
2. Ben Haj, S.: Projet régional pour le développement d'un réseau méditerranéen d'aires protégées marines et côtières (AMP) à travers le renforcement de la création et de la gestion d'AMP. Etudes et Conseil en Environnement (2017)
3. Candeloro, M., Lekkas, A., Sørensen, A.: A Voronoi-diagram-based dynamic path-planning system for underactuated marine vessels. Control. Eng. Pract. 61, 41–54 (2017). https://doi.org/10.1016/j.conengprac.2017.01.007
4. Chen, P., Huang, Y., Papadimitriou, E., Mou, J., van Gelder, P.: Global path planning for autonomous ship: a hybrid approach of fast marching square and velocity obstacles methods. Ocean Eng. 214, 107793 (2020). https://doi.org/10.1016/j.oceaneng.2020.107793
5. Deng, Y., Chen, Y., Zhang, Y., Mahadevan, S.: Fuzzy Dijkstra algorithm for shortest path problem under uncertain environment. Appl. Soft Comput. 12(3), 1231–1237 (2012). https://doi.org/10.1016/j.asoc.2011.11.011
6. Dijkstra, E.: A note on two problems in connexion with graphs. Numer. Math. 1(1), 269–271 (1959). https://doi.org/10.1007/BF01386390
7. El Ayadi, A.: Etude des interactions entre le grand dauphin Tursiops truncatus (Montagu, 1821) et les filets de pêche aux îles Kerkennah (nord du golfe de Gabès): évaluation des dégâts et des pertes économiques. University of Sfax, Faculty of Sciences (2013)
8. Aurenhammer, F.: Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput. Surv. 23(3), 345–405 (1991). https://doi.org/10.1145/116873.116880
9. Greco, S., Matarazzo, B., Słowiński, R.: Rough sets theory for multicriteria decision analysis. Eur. J. Oper. Res. 129(1), 1–47 (2001). https://doi.org/10.1016/S0377-2217(00)00167-3
10. Mu, L., Holloway, S.: Neighborhoods. In: The Geographic Information Science & Technology Body of Knowledge, 1st Quarter 2019 edn. (2019). https://doi.org/10.22224/gistbok/2019.1.11
11. Mu, W., Tong, D.: Computation of the distance between a polygon and a point in spatial analysis. Int. J. Geogr. Inf. Sci. 36(8), 1575–1600 (2022). https://doi.org/10.1080/13658816.2021.1988088
12. Słowiński, R., Greco, S., Matarazzo, B.: Rough set analysis of preference-ordered data. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds.) RSCTC 2002. LNCS (LNAI), vol. 2475, pp. 44–59. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45813-1_6
13. Słowiński, R., Greco, S., Matarazzo, B.: Rough sets in decision making. In: Meyers, R. (ed.) Encyclopedia of Complexity and Systems Science, pp. 7753–7787. Springer, New York (2009). https://doi.org/10.1007/978-0-387-30440-3_460
14. SONEDE: Etude préparatoire relative au projet de construction de la station de dessalement d'eau de mer à Sfax en République Tunisienne. Société Nationale d'Exploitation et de Distribution des Eaux (SONEDE), Tunisia (2015)
15. Tu, W., Fang, Z., Li, Q., Shaw, S.L., Chen, B.: A bi-level Voronoi diagram-based metaheuristic for a large-scale multi-depot vehicle routing problem. Transport. Res. Part E: Logist. Transport. Rev. 61, 84–97 (2014). https://doi.org/10.1016/j.tre.2013.11.003
16. Yuan, M.: GIS research to address tensions in geography. Singap. J. Trop. Geogr. 42(1), 13–30 (2021). https://doi.org/10.1111/sjtg.12344
Author Index

A
AbouGrad, Hisham 154
Abubahia, Ahmed 154
Ajmain, Moshfiqur Rahman 1, 36
Akter, Shornaly 13

B
Benedetto, Vincenzo 111, 120, 134, 144
Bhandari, Parth 144
Boumahdi, Fatima 84
Bouramoul, Abdelkrim 53

C
Chabchoub, Habib 167
Chakhar, Salem 154, 167
Chauhan, Prachi 73
Ciaparrone, Gioele 134

D
Das, Rajesh Kumar 36
Dhiman, Abhishek 62

G
Gissi, Francesco 111, 120, 134, 144

H
Hadri, Sid Ahmed 53
Hemina, Karim 84

I
Islam, Mirajul 13
Islam, Saiful 13

J
Jadhav, Shivajirao 94

K
Kadam, Vinod 94
Kashid, Shamal 24, 62, 73
Khemchandani, Maahi 94
Khushbu, Sharun Akter 1, 36
Kobra, Khadijatul 36
Kumar, Krishan 24, 62, 73

M
Madani, Amina 84
Masmoudi, Mariem 167
Menanno, Marialuisa 111

N
Negi, Alok 24, 62, 73
Noori, Sheak Rashed Haider 1, 36

P
Pérez-Fernández, Raúl 144

R
Rahman, Mahafozur 1
Rahman, Md. Arifur 13

S
Saini, Parul 24, 62, 73
Sammi, Samrina Sarkar 36

T
Troiano, Luigi 120

V
Villa, Elena Mejuto 120

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Troiano et al. (Eds.): ICDLAIR 2022, LNNS 670, p. 179, 2023. https://doi.org/10.1007/978-3-031-30396-8