English Pages 1030 [1020] Year 2021
Lecture Notes in Networks and Systems 371
Pandian Vasant Ivan Zelinka Gerhard-Wilhelm Weber Editors
Intelligent Computing & Optimization Proceedings of the 4th International Conference on Intelligent Computing and Optimization 2021 (ICO2021)
Lecture Notes in Networks and Systems Volume 371
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation (DCA), School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/15179
Editors
Pandian Vasant
Faculty of Electrical & Electronic Engineering
MERLIN Research Centre, Ton Duc Thang University
Hồ Chí Minh City, Vietnam

Ivan Zelinka
Faculty of Electrical Engineering and Computer Science
VŠB TU Ostrava
Ostrava-Poruba, Czech Republic

Gerhard-Wilhelm Weber
Faculty of Engineering Management
Poznan University of Technology
Poznan, Poland
ISSN 2367-3370
ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-3-030-93246-6
ISBN 978-3-030-93247-3 (eBook)
https://doi.org/10.1007/978-3-030-93247-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The 4th edition of the popular and prestigious International Conference on Intelligent Computing and Optimization (ICO) 2021, ICO’2021 for short, will be held on an online platform out of care for everyone’s safety during the COVID-19 pandemic. The physical conference is expected to take place at G Hua Hin Resort & Mall in Hua Hin, Thailand, once the pandemic has been overcome. The core objective of the conference is to bring together, in a spirit of community, research leaders, distinguished experts and scholars in Intelligent Computing and Optimization from all over the globe to share their knowledge and experience of current research achievements in diverse fields, to learn from old and new friends, and to create new research ideas and collaborations together. The conference gives the international research community a “golden chance” to interact and to present their newest research advances, results, innovative discoveries and inventions to their scientific colleagues and friends. The proceedings book of ICO’2021 is published by the renowned publishing house Springer Nature (Lecture Notes in Networks and Systems).

Almost 150 authors submitted full papers to ICO’2021. They represent more than 40 countries, including Algeria, Austria, Bangladesh, Bulgaria, Canada, China, Croatia, Cyprus, Ethiopia, India, Iran, Iraq, Japan, Jordan, Malaysia, Mauritius, Mexico, Morocco, Nepal, Oman, Pakistan, Peru, Philippines, Poland, Portugal, Russia, Slovenia, Spain, South Africa, Sweden, Taiwan, Thailand, Turkey, Turkmenistan, Ukraine, United Arab Emirates, USA, UK, Vietnam and others. This worldwide representation demonstrates the growing interest of the global research community in our conference series.
In this edition, the conference proceedings cover original, innovative and creative work in the fields of optimization and optimal control, renewable energy and sustainability, artificial intelligence and operational research, economics and management, smart cities and rural planning, meta-heuristics and big data analytics, cyber security and blockchains, IoTs and Industry 4.0, mathematical modelling and simulation, health care and medicine. The Organizing Committee of ICO’2021 cordially thanks all the authors and co-authors and all the diligent reviewers for their precious contributions to both the conference and this book. High-quality papers were carefully reviewed and selected by the International Programme Committee for publication in the series Lecture Notes in Networks and Systems of Springer Nature. ICO’2021 presents enlightening contributions for research scholars across the planet in the areas of innovative computing and novel optimization techniques, with cutting-edge methodologies and applications.

This conference could not have been organized without the strong support and help of the committee members of ICO’2021. We would like to sincerely thank Prof. Elias Munapo (North-West University, South Africa), Prof. Rustem Popa (Dunarea de Jos University, Romania), Prof. Jose Antonio Marmolejo (Universidad Panamericana, Mexico) and Prof. Román Rodríguez-Aguilar (Universidad Panamericana, Mexico) for their great help and support in organizing the conference. We also appreciate the valuable guidance and great contribution of Dr. J. Joshua Thomas (UOW Malaysia KDU Penang University College, Malaysia), Prof. Gerhard-Wilhelm Weber (Poznan University of Technology, Poland; Middle East Technical University, Turkey), Prof. Mohammad Shamsul Arefin (Chittagong University of Engineering and Technology, Bangladesh), Prof. Mohammed Moshiul Hoque (Chittagong University of Engineering & Technology, Bangladesh), Prof. Ivan Zelinka (VSB-TU Ostrava, Czech Republic), Prof. Ugo Fiore (Federico II University, Italy), Dr. Mukhdeep Singh Manshahia (Punjabi University Patiala, India), Mr. K. C. Choo (CO2 Networks, Malaysia), Prof. Karl Andersson (Luleå University of Technology (LTU), Sweden), Prof. Tatiana Romanova (National Academy of Sciences of Ukraine, Ukraine), Prof. Nader Barsoum (Curtin University of Technology, Australia), Prof. Goran Klepac (Hrvatski Telekom, Croatia), Prof. Sansanee Auephanwiriyakul (Chiang Mai University, Thailand), Dr. Thanh Dang Trung (Thu Dau Mot University, Vietnam), Dr. Leo Mrsic (Algebra University College, Croatia) and Dr. Shahadat Hossain (City University, Bangladesh).

Finally, we would like to convey our sincerest thanks to Prof. Dr. Janusz Kacprzyk, Dr. Thomas Ditzinger, Dr. Holger Schaepe and Mr. Nareshkumar Mani of Springer Nature for their wonderful help and support in publishing the ICO’2021 conference proceedings book in Lecture Notes in Networks and Systems.

December 2021
Pandian Vasant Ivan Zelinka Gerhard-Wilhelm Weber
Conference Committees ICO’2021
Steering Committee
Elias Munapo, North-West University, South Africa
Jose Antonio Marmolejo, Panamerican University, Mexico
Joshua Thomas, UOW Malaysia, KDU Penang University College, Malaysia

General Chair
Pandian Vasant, MERLIN Research Centre, TDTU, Vietnam

Honorary Chairs
Gerhard W. Weber, Poznan University of Technology, Poland
Rustem Popa, Dunarea De Jos University, Romania
Leo Mrsic, Algebra University College, Croatia
Ivan Zelinka, Technical University of Ostrava, Czech Republic
Jose Antonio Marmolejo, Panamerican University, Mexico
Roman Rodriguez-Aguilar, Panamerican University, Mexico

TPC Chairs
Joshua Thomas, KDU Penang University College, Malaysia
Jose Antonio Marmolejo, Panamerican University, Mexico

Special Sessions Chairs
Mohammad Shamsul Arefin, CUET, Bangladesh
Mukhdeep Singh Manshahia, Punjabi University Patiala, India
Keynote Chairs and Panel Chairs
Ugo Fiore, Federico II University, Italy
Mariusz Drabecki, Warsaw University of Technology, Poland

Publicity and Social Media Chairs
Anirban Banik, National Institute of Technology Agartala, India
Kwok Tai Chui, Hong Kong Metropolitan University, Hong Kong

Workshops and Tutorials Chairs
Mohammed Moshiul Hoque, CUET, Bangladesh
Leo Mrsic, Algebra University College, Croatia

Posters and Demos Chairs
Roberto Alonso González-Lezcano, San Pablo CEU University, Spain
Iztok Fister, University of Maribor, Slovenia

Sponsorship and Exhibition Chairs
K. C. Choo, CO2 Networks, Malaysia
Igor Litvinchev, Nuevo Leon State University, Mexico

Publications Chairs
Rustem Popa, Dunarea De Jos University, Romania
Ugo Fiore, Federico II University, Italy

Webinar Coordinator
Joshua Thomas, UOW Malaysia, KDU Penang University College, Malaysia

Web Editor
K. C. Choo, CO2 Networks, Malaysia
Reviewers
The volume editors of LNNS Springer Nature of ICO’2021 would like to sincerely thank the following reviewers for their outstanding work in reviewing all the papers for ICO’2021 conference proceedings via Easychair (https://www.icico.info/).

Aditya Singh, Lovely Professional University, India
Ahed Abugabah, Zayed University, United Arab Emirates
Ahmad Al Smadi, Xidian University, China
Anton Abdulbasah Kamil, Istanbul Gelisim University, Turkey
Azazkhan Ibrahimkhan Pathan, Sardar Vallabhbhai National Institute of Technology, India
Dang Trung Thanh, Thu Dau Mot University, Vietnam
Danijel Kučak, Algebra University College, Croatia
Elias Munapo, North-West University
F. Hooshmand, Amirkabir University of Technology, Iran
Igor Litvinchev, Universidad Autónoma de Nuevo León, Mexico
Jaramporn Hassamontr, King Mongkut’s University of Technology North Bangkok, Thailand
Jean Baptiste Bernard Pea-Assounga, Jiangsu University, China
Jonnel Alejandrino, De La Salle University, Philippines
Jose Antonio Marmolejo, Universidad Panamericana, Mexico
Leo Mrsic, Algebra University College, Croatia
Mingli Song, Zhejiang University, China
Mohammed Boukendil, LMFE, Morocco
Morikazu Nakamura, University of the Ryukyus, Japan
Mukhdeep Singh Manshahia, Punjabi University Patiala, India
Prithwish Sen, IIIT Guwahati, India
Roman Rodriguez-Aguilar, Universidad Panamericana, Mexico
Ronnie Concepcion II, De La Salle University
Rustem Popa, Dunarea de Jos University, Romania
Shahadat Hossain, City University, Bangladesh
Sinan Melih Nigdeli, Istanbul University, Turkey
Stefan Ivanov, Technical University of Gabrovo, Bulgaria
Telmo Matos, CIICESI, Portugal
Thanh Hung Bui, Thu Dau Mot University, Vietnam
Ugo Fiore, University of Naples Parthenope, Italy
Vedran Juričić, University of Zagreb, Croatia
Contents
Sustainable Artificial Intelligence Applications

Low-Light Image Enhancement with Artificial Bee Colony Method
Anan Banharnsakun

Optimal State-Feedback Controller Design for Tractor Active Suspension System via Lévy-Flight Intensified Current Search Algorithm
Thitipong Niyomsat, Wattanawong Romsai, Auttarat Nawikavatan, and Deacha Puangdownreong

The Artificial Intelligence Platform with the Use of DNN to Detect Flames: A Case of Acoustic Extinguisher
Stefan Ivanov and Stanko Stankov

Adaptive Harmony Search for Cost Optimization of Reinforced Concrete Columns
Aylin Ece Kayabekir, Sinan Melih Nigdeli, and Gebrail Bekdaş

Efficient Traffic Signs Recognition Based on CNN Model for Self-Driving Cars
Said Gadri and Nour ElHouda Adouane

Optimisation and Prediction of Glucose Production from Oil Palm Trunk via Simultaneous Enzymatic Hydrolysis
Chan Mieow Kee, Wang Chan Chin, Tee Hoe Chun, and Nurul Adela Bukhari

Synthetic Data Augmentation of Cycling Sport Training Datasets
Iztok Fister, Grega Vrbančič, Vili Podgorelec, and Iztok Fister Jr.

Hybrid Pooling Based Convolutional Neural Network for Multi-class Classification of MR Brain Tumor Images
Gazi Jannatul Ferdous, Khaleda Akhter Sathi, and Md. Azad Hossain
Importance of Fuzzy Logic in Traffic and Transportation Engineering
Aditya Singh

A Fuzzy Based Clustering Approach to Prolong the Network Lifetime in Wireless Sensor Networks
Enaam A. Al-Hussain and Ghaida A. Al-Suhail

Visual Expression Analysis from Face Images Using Morphological Processing
Md. Habibur Rahman, Israt Jahan, and Yeasmin Ara Akter

Detection of Invertebrate Virus Carriers Using Deep Learning Networks to Prevent Emerging Pandemic-Prone Disease in Tropical Regions
Daeniel Song Tze Hai, J. Joshua Thomas, Justtina Anantha Jothi, and Rasslenda-Rass Rasalingam

Classification and Detection of Plant Leaf Diseases Using Various Deep Learning Techniques and Convolutional Neural Network
Partha P. Mazumder, Monuar Hossain, and Md Hasnat Riaz

Deep Learning and Machine Learning Applications

Distributed Self-triggered Optimization for Multi-agent Systems
Komal Mehmood and Maryam Mehmood

Automatic Categorization of News Articles and Headlines Using Multi-layer Perceptron
Fatima Jahara, Omar Sharif, and Mohammed Moshiul Hoque

Using Machine Learning Techniques for Estimating the Electrical Power of a New-Style of Savonius Rotor: A Comparative Study
Youssef Kassem, Hüseyin Çamur, Gokhan Burge, Adivhaho Frene Netshimbupfe, Elhamam A. M. Sharfi, Binnur Demir, and Ahmed Muayad Rashid Al-Ani

Tree-Like Branching Network for Multi-class Classification
Mengqi Xue, Jie Song, Li Sun, and Mingli Song

Multi-resolution Dense Residual Networks with High-Modularization for Monocular Depth Estimation
Din Yuen Chan, Chien-I Chang, Pei Hung Wu, and Chung Ching Chiang

A Decentralized Federated Learning Paradigm for Semantic Segmentation of Geospatial Data
Yash Khasgiwala, Dion Trevor Castellino, and Sujata Deshmukh
Development of Contact Angle Prediction for Cellulosic Membrane
Ahmad Azharuddin Azhari bin Mohd Amiruddin, Mieow Kee Chan, and Sokchoo Ng

Feature Engineering Based Credit Card Fraud Detection for Risk Minimization in E-Commerce
Md. Moinul Islam, Rony Chowdhury Ripan, Saralya Roy, and Fazle Rahat

DCNN-LSTM Based Audio Classification Combining Multiple Feature Engineering and Data Augmentation Techniques
Md. Moinul Islam, Monjurul Haque, Saiful Islam, Md. Zesun Ahmed Mia, and S. M. A. Mohaiminur Rahman

Sentiment Analysis: Developing an Efficient Model Based on Machine Learning and Deep Learning Approaches
Said Gadri, Safia Chabira, Sara Ould Mehieddine, and Khadidja Herizi

Improved Face Detection System
Ratna Chakma, Juel Sikder, and Utpol Kanti Das

Paddy Price Prediction in the South-Western Region of Bangladesh
Juliet Polok Sarkar, M. Raihan, Avijit Biswas, Khandkar Asif Hossain, Keya Sarder, Nilanjana Majumder, Suriya Sultana, and Kajal Sana

Paddy Disease Prediction Using Convolutional Neural Network
Khandkar Asif Hossain, M. Raihan, Avijit Biswas, Juliet Polok Sarkar, Suriya Sultana, Kajal Sana, Keya Sarder, and Nilanjana Majumder

Android Malware Detection System: A Machine Learning and Deep Learning Based Multilayered Approach
Md Shariar Hossain and Md Hasnat Riaz

IOTs, Big Data, Block Chain and Health Care

Blockchain as a Secure and Reliable Technology in Business and Communication Systems
Vedran Juričić, Danijel Kučak, and Goran Đambić

iMedMS: An IoT Based Intelligent Medication Monitoring System for Elderly Healthcare
Khalid Ibn Zinnah Apu, Mohammed Moshiul Hoque, and Iqbal H. Sarker

Internet Banking and Bank Investment Decision: Mediating Role of Customer Satisfaction and Employee Satisfaction
Jean Baptiste Bernard Pea-Assounga and Mengyun Wu
Inductions of Usernames’ Strengths in Reducing Invasions on Social Networking Sites (SNSs)
Md. Mahmudur Rahman, Shahadat Hossain, Mimun Barid, and Md. Manzurul Hasan

Tomato Leaf Disease Recognition Using Depthwise Separable Convolution
Syed Md. Minhaz Hossain, Khaleque Md. Aashiq Kamal, Anik Sen, and Kaushik Deb

End-to-End Scene Text Recognition System for Devanagari and Bengali Text
Prithwish Sen, Anindita Das, and Nilkanta Sahu

A Deep Convolutional Neural Network Based Classification Approach for Sleep Scoring of NFLE Patients
Sarker Safat Mahmud, Md. Rakibul Islam Prince, Md. Shamim, and Sarker Shahriar Mahmud

Remote Fraud and Leakage Detection System Based on LPWAN System for Flow Notification and Advanced Visualization in the Cloud
Dario Protulipac, Goran Djambic, and Leo Mršić

An Analysis of AUGMECON2 Method on Social Distance-Based Layout Problems
Şeyda Şimşek, Eren Özceylan, and Neşe Yalçın

An Intelligent Information System and Application for the Diagnosis and Analysis of COVID-19
Atif Mehmood, Ahed Abugabah, Ahmad A. L. Smadi, and Reyad Alkhawaldeh

Hand Gesture Recognition Based Human Computer Interaction to Control Multiple Applications
Sanzida Islam, Abdul Matin, and Hafsa Binte Kibria

Towards Energy Savings in Cluster-Based Routing for Wireless Sensor Networks
Enaam A. Al-Hussain and Ghaida A. Al-Suhail

Utilization of Self-organizing Maps for Map Depiction of Multipath Clusters
Jonnel Alejandrino, Emmanuel Trinidad, Ronnie Concepcion II, Edwin Sybingco, Maria Gemel Palconit, Lawrence Materum, and Elmer Dadios
Big Data for Smart Cities and Smart Villages: A Review
Tajnim Jahan, Sumayea Benta Hasan, Nuren Nafisa, Afsana Akther Chowdhury, Raihan Uddin, and Mohammad Shamsul Arefin

A Compact Radix-Trie: A Character-Cell Compressed Trie Data-Structure for Word-Lookup System
Rahat Yeasin Emon and Sharmistha Chanda Tista

Digital Twins and Blockchain: Empowering the Supply Chain
Jose Eduardo Aguilar-Ramirez, Jose Antonio Marmolejo-Saucedo, and Roman Rodriguez-Aguilar

Detection of Malaria Disease Using Image Processing and Machine Learning
Md. Maruf Hasan, Sabiha Islam, Ashim Dey, Annesha Das, and Sharmistha Chanda Tista

Fake News Detection of COVID-19 Using Machine Learning Techniques
Promila Ghosh, M. Raihan, Md. Mehedi Hassan, Laboni Akter, Sadika Zaman, and Md. Abdul Awal

Sustainable Modelling, Computing and Optimization

1D HEC-RAS Modeling Using DEM Extracted River Geometry A Case of Purna River; Navsari City; Gujarat, India
Azazkhan Ibrahimkhan Pathan, P. G. Agnihotri, D. Kalyan, Daryosh Frozan, Muqadar Salihi, Shabir Ahmad Zareer, D. P. Patel, M. Arshad, and S. Joseph

A Scatter Search Algorithm for the Uncapacitated Facility Location Problem
Telmo Matos

An Effective Dual-RAMP Algorithm for the Capacitated Facility Location Problem
Telmo Matos

Comparative Study of Blood Flow Through Normal, Stenosis Affected and Bypass Grafted Artery Using Computational Fluid Dynamics
Anirban Banik, Tarun Kanti Bandyopadhyay, and Vladimir Panchenko

Transportation Based Approach for Solving the Generalized Assignment Problem
Elias Munapo
Generalized Optimization: A First Step Towards Category Theoretic Learning Theory
Dan Shiebler

Analysis of Non-linear Structural Systems via Hybrid Algorithms
Sinan Melih Nigdeli, Gebrail Bekdaş, Melda Yücel, Aylin Ece Kayabekir, and Yusuf Cengiz Toklu

Ising Model Formulation for Job-Shop Scheduling Problems Based on Colored Timed Petri Nets
Kohei Kaneshima and Morikazu Nakamura

Imbalanced Sample Generation and Evaluation for Power System Transient Stability Using CTGAN
Gengshi Han, Shunyu Liu, Kaixuan Chen, Na Yu, Zunlei Feng, and Mingli Song

Efficient DC Algorithm for the Index-Tracking Problem
F. Hooshmand and S. A. MirHassani

Modelling External Debt Using VECM and GARCH Models
Naledi Blessing Mokoena, Johannes Tshepiso Tsoku, and Martin Chanza

Optimization of Truss Structures with Sizing of Bars by Using Hybrid Algorithms
Melda Yücel, Gebrail Bekdaş, and Sinan Melih Nigdeli

Information Extraction from Receipts Using Spectral Graph Convolutional Network
Bui Thanh Hung

An Improved Shuffled Frog Leaping Algorithm with Rotating and Position Sequencing in 2-Dimension Shapes for Discrete Optimization
Kanchana Daoden

Lean Procurement in an ERP Cloud Base
Adrian Chin-Hernandez, Jose Antonio Marmolejo-Saucedo, and Jania Saucedo-Martinez

An Approximate Solution Proposal to the Vehicle Routing Problem Through Simulation-Optimization Approach
Jose Antonio Marmolejo-Saucedo and Armando Calderon Osornio

Hybrid Connectionist Models to Investigate the Effects on Petrophysical Variables for Permeability Prediction
Mohammad Islam Miah and Mohammed Adnan Noor Abir
Sustainable Environmental, Social and Economics Development

Application of Combined SWOT and AHP Analysis to Assess the Reality and Select the Priority Factors for Social and Economic Development (a Case Study for Soc Trang City)
Dang Trung Thanh and Nguyen Huynh Anh Tuyet

Design and Analysis of Water Distribution Network Using Epanet 2.0 and Loop 4.0 – A Case Study of Narangi Village
Usman Mohseni, Azazkhan I. Pathan, P. G. Agnihotri, Nilesh Patidar, Shabir Ahmad Zareer, D. Kalyan, V. Saran, Dhruvesh Patel, and Cristina Prieto

Effect of Climate Change on Sea Level Rise with Special Reference to Indian Coastline
Dummu Kalyan, Azazkhan Ibrahimkhan Pathan, P. G. Agnihotri, Mohammad Yasin Azimi, Daryosh Frozan, Joseph Sebastian, Usman Mohseni, Dhruvesh Patel, and Cristina Prieto

Design and Analysis of Water Distribution Network Using Watergems – A Case Study of Narangi Village
Usman Mohseni, Azazkhan I. Pathan, P. G. Agnihotri, Nilesh Patidar, Shabir Ahmad Zareer, V. Saran, and Vaishali Rana

Weight of Factors Affecting Sustainable Urban Agriculture Development (Case Study in Thu Dau Mot Smart City)
Trung Thanh Dang, Quang Minh Vo, and Thanh Vu Pham

Factors Behind the World Crime Index: Some Parametric Observations Using DBSCAN and Linear Regression
Shahadat Hossain, Md. Manzurul Hasan, Md. Mahmudur Rahman, and Mimun Barid

Object Detection in Foggy Weather Conditions
Prithwish Sen, Anindita Das, and Nilkanta Sahu

Analysis and Evaluation of TripAdvisor Data: A Case of Pokhara, Nepal
Tan Wenan, Deepanjal Shrestha, Bijay Gaudel, Neesha Rajkarnikar, and Seung Ryul Jeong

Simulation of the Heat and Mass Transfer Occurring During Convective Drying of Mango Slices
Ripa Muhury, Ferdusee Akter, and Ujjwal Kumar Deb

A Literature Review on the MPPT Techniques Applied in Wind Energy Harvesting System
Tigilu Mitiku and Mukhdeep Singh Manshahia
Developing a System to Analyze Comments of Social Media and Identify Friends Category
Tasfia Hyder, Rezaul Karim, and Mohammad Shamsul Arefin

Comparison of Watershed Delineation and Drainage Network Using ASTER and CARTOSAT DEM of Surat City, Gujarat
Arbaaz A. Shaikh, Azazkhan I. Pathan, Sahita I. Waikhom, and Praveen Rathod

Numerical Investigation of Natural Convection Combined with Surface Radiation in a Divided Cavity Containing Air and Water
Zouhair Charqui, Lahcen El Moutaouakil, Mohammed Boukendil, Rachid Hidki, and Zaki Zrikem

Key Factors in the Successful Integration of the Circular Economy Approach in the Industry of Non-durable Goods: A Literature Review
Marcos Jacinto-Cruz, Román Rodríguez-Aguilar, and Jose-Antonio Marmolejo-Saucedo

Profile of the Business Science Professional for the Industry 4.0
Antonia Paola Salgado-Reyes and Roman Rodríguez-Aguilar

Rainfall-Runoff Simulation and Storm Water Management Model for SVNIT Campus Using EPA SWMM 5.1
Nitin Singh Kachhawa, Prasit Girish Agnihotri, and Azazkhan Ibrahimkhan Pathan

Emerging Smart Technology Applications

Evaluation and Customized Support of Dynamic Query Form Through Web Search
B. Bazeer Ahamed and Murugan Krishnamurthy

Enhancing Student Learning Productivity with Gamification-Based E-learning Platform: Empirical Study and Best Practices
Danijel Kučak, Adriana Biuk, and Leo Mršić

Development of Distributed Data Acquisition System
Bertram Losper, Vipin Balyan, and B. Groenewald

Images Within Images? A Multi-image Paradigm with Novel Key-Value Graph Oriented Steganography
Subhrangshu Adhikary

Application of Queuing Theory to Analyse an ATM Queuing System
Kolentino N. Mpeta and Otsile R. Selaotswe
A Novel Prevention Technique Using Deep Analysis Intruder Tracing with a Bottom-Up Approach Against Flood Attacks in VoIP Systems
Sheeba Armoogum and Nawaz Mohamudally

Data Mining for Software Engineering: A Survey
Maisha Maimuna, Nafiza Rahman, Razu Ahmed, and Mohammad Shamsul Arefin

Simulation of Load Absorption and Deflection of Helical Suspension Spring: A Case of Finite Element Method
Rajib Karmaker, Shipan Chandra Deb Nath, and Ujjwal Kumar Deb

Prediction of Glucose Concentration Hydrolysed from Oil Palm Trunks Using a PLSR-Based Model
Wan Sieng Yeo, Mieow Kee Chan, and Nurul Adela Bukhari

Ontology of Lithography-Based Processes in Additive Manufacturing with Focus on Ceramic Materials
Marc Gmeiner, Wilfried Lepuschitz, Munir Merdan, and Maximilian Lackner

Natural Convection and Surface Radiation in an Inclined Square Cavity with Two Heat-Generating Blocks
Rachid Hidki, Lahcen El Moutaouakil, Mohammed Boukendil, Zouhair Charqui, and Abdelhalim Abdelbaki

Improving the Route Selection for Geographic Routing Using Fuzzy-Logic in VANET
Israa A. Aljabry and Ghaida A. Al-Suhail

Trends and Techniques of Biomedical Text Mining: A Review
Maliha Rashida, Fariha Iffath, Rezaul Karim, and Mohammad Shamsul Arefin

Electric Vehicles as Distributed Micro Generation Using Smart Grid for Decision Making: Brief Literature Review
Julieta Sanchez-García, Román Rodríguez-Aguilar, and Jose Antonio Marmolejo-Saucedo

A Secured Network Layer and Information Security for Financial Institutions: A Case Study
Md Rahat Ibne Sattar, Shrabonti Mitra, Sadia Sultana, Umme Salma Pushpa, Dhruba Bhattacharjee, Abhijit Pathak, and Mayeen Uddin Khandaker

Author Index
Editors
Dr. Pandian Vasant
MERLIN Research Centre, TDTU, Vietnam
E-mail: [email protected]
Pandian Vasant is Research Associate at MERLIN Research Centre, Vietnam, and Editor in Chief of International Journal of Energy Optimization and Engineering (IJEOE). He holds a PhD in Computational Intelligence (UNEM, Costa Rica), an MSc in Engineering Mathematics (University Malaysia Sabah, Malaysia) and a BSc (Hons, Second Class Upper) in Mathematics (University of Malaya, Malaysia). His research interests include soft computing, hybrid optimization, innovative computing and applications. He has co-authored research articles in journals, conference proceedings, presentations and chapters, has served as Guest Editor of special issues (300 publications indexed in ResearchGate), and was General Chair of the EAI International Conference on Computer Science and Engineering in Penang, Malaysia (2016) and Bangkok, Thailand (2018). In 2009 and 2015, he was recognized as top reviewer and outstanding reviewer for the journal Applied Soft Computing (Elsevier). He has 30 years of working experience at universities. Currently, he is General Chair of the International Conference on Intelligent Computing and Optimization (https://www.icico.info/) and Member of AMS (USA), NAVY Research Group (TUO, Czech Republic) and MERLIN Research Centre (TDTU, Vietnam). H-Index Google Scholar = 34; i-10-index = 143.

Professor Ivan Zelinka
Technical University of Ostrava (VSB-TU), Faculty of Electrical Engineering and Computer Science, Czech Republic
Email: [email protected]
Ivan Zelinka is currently working at the Technical University of Ostrava (VSB-TU), Faculty of Electrical Engineering and Computer Science. He graduated successively from the Technical University in Brno (1995, MSc.), UTB in Zlin (2001, PhD), again the Technical University in Brno (2004, Assoc. Prof.) and VSB-TU (2010, Professor). Before his academic career, he was employed as a TELECOM technician, a computer specialist (HW+SW) and a computer and LAN supervisor at a commercial bank. During his career at UTB, he proposed and opened seven different lectures. He has also been invited to lecture at numerous universities in different EU countries and has been keynote speaker at the Global Conference on Power, Control and Optimization in Bali, Indonesia (2009), the Interdisciplinary Symposium on Complex Systems (2011), Halkidiki, Greece, and IWCFTA 2012, Dalian, China. His expertise is mainly in unconventional algorithms and cybersecurity. He is or has been the responsible Supervisor of three fundamental research grants of the Czech grant agency GAČR and Co-supervisor of an FRVŠ grant (laboratory of parallel computing). He has also worked on numerous grants and two EU projects, as Member of the team (FP5 RESTORM) and as Supervisor of the Czech team (FP7 PROMOEVO), and was Supervisor of international research (funded by the TACR agency) focused on the security of mobile devices (Czech-Vietnam). Currently, he is Professor at the Department of Computer Science, and in total he has supervised more than 40 MSc. and 25 Bc. diploma theses. He is also Supervisor of doctoral students, including students from abroad. He received the Siemens Award for his PhD thesis, as well as an award from the journal Software news for his book about artificial intelligence. He is Member of the British Computer Society, Editor in Chief of the Springer book series Emergence, Complexity and Computation (http://www.springer.com/series/10624), Member of the Editorial board of Saint Petersburg State University Studies in Mathematics, and Member of several international programme committees of various conferences and international journals. He is Author of journal articles as well as of books in the Czech and English languages and one of three founders of TC IEEE on big data (http://ieeesmc.org/about-smcs/history/2014-archives/44-about-smcs/history/2014/technical-committees/204-big-data-computing/). He is also head of the research group NAVY (http://navy.cs.vsb.cz).

Professor Gerhard-Wilhelm Weber
Poznan University of Technology, Poznan, Poland
Email: [email protected]
G.-W. Weber is Professor at Poznan University of Technology, Poznan, Poland, at the Faculty of Engineering Management, and Chair of Marketing and Economic Engineering. His research is on OR, financial mathematics, optimization and control, neuro- and bio-sciences, data mining, education and development; he is involved in the organization of scientific life internationally. He received his Diploma and Doctorate in mathematics and economics/business administration at RWTH Aachen, and his Habilitation at TU Darmstadt. He held professorships by proxy at the University of Cologne and TU Chemnitz, Germany. At IAM, METU, Ankara, Turkey, he was Professor in the programmes of Financial Mathematics and Scientific Computing and Assistant to Director, and he has been Member of further graduate schools, institutes, and departments of METU. Further, he has affiliations at the universities of Siegen, Ballarat, Aveiro, North Sumatra, and Malaysia University of Technology, and he is “Advisor to EURO Conferences”.
Professor Elias Munapo
North West University, South Africa
Email: [email protected]
Elias Munapo is Professor of Operations Research, and he holds a BSc. (Hons) in Applied Mathematics (1997), an MSc. in Operations Research (2002) and a PhD in Applied Mathematics (2010). All these qualifications are from the National University of Science and Technology (N.U.S.T.) in Zimbabwe. In addition, he has a certificate in outcomes-based assessment in higher education and open distance learning from the University of South Africa (UNISA) and another certificate in the University Education Induction Programme from the University of KwaZulu-Natal (UKZN). He is a Professional Natural Scientist, certified by the South African Council for Natural Scientific Professions (SACNASP) in 2012. He has vast experience in university education and has worked for five (5) institutions of higher learning: Zimbabwe Open University (ZOU), Chinhoyi University of Technology (CUT), University of South Africa (UNISA), University of KwaZulu-Natal (UKZN) and North-West University (NWU). He is currently Professor at NWU. In addition to teaching, Professor Munapo was in charge of research activities in the faculty and had the chance to manage over 100 doctoral students and over 800 master’s students. He has successfully supervised or co-supervised ten doctoral students and over 20 master’s students to completion. He has published over 100 research articles. Of these publications, one is a book, several are book chapters and conference proceedings, and the majority are journal articles. In addition, he has been awarded the North-West University Institutional Research Excellence Award (IREA) three times, is Editor of a couple of journals, has edited several books and is Reviewer for a number of journals. He is Member of the Operations Research Society of South Africa (he was an ORSSA Executive Committee Member in 2012 and 2013), the South African Council for Natural Scientific Professions (SACNASP) as Certified Natural Scientist, the European Conference on Operational Research (EURO) and the International Federation of Operations Research Societies (IFORS). In addition, he is Member of the organizing committee for the ICO conference held every year.

Professor Jose Antonio Marmolejo
Panamerican University, Mexico
Email: [email protected]
Jose Antonio Marmolejo is Professor at Panamerican University, Mexico. His research is on operations research, large-scale optimization techniques, computational techniques and analytical methods for planning, operations and control of electric energy and logistic systems. He received his Doctorate in Operations Research (Hons) at the National Autonomous University of Mexico. At present, he holds the third highest country-wide distinction granted by the Mexican National System of Research Scientists for scientific merit (SNI Fellow, Level 1). He is Member of the Network for Decision Support and Intelligent Optimization of Complex and Large Scale Systems and of the Mexican Society for Operations Research. He has co-authored research articles in science citation index journals, conference proceedings, presentations and chapters.
Sustainable Artificial Intelligence Applications
Low-Light Image Enhancement with Artificial Bee Colony Method
Anan Banharnsakun
Computational Intelligence Research Laboratory (CIRLab), Computer Engineering Department, Faculty of Engineering at Sriracha, Kasetsart University Sriracha Campus, Chonburi 20230, Thailand
[email protected]
Abstract. Images taken in low-light environments tend to show incomplete detail because most of the information is masked in low-visibility areas, which is the main factor that degrades image quality. Improving the clarity of the image to reveal its complete detail remains a challenging task for researchers. Moreover, a good quality image is essential for image processing tasks in various fields such as medical imaging, remote sensing, and computer vision applications. To improve the visual quality of low-light images, an effective image enhancement technique based on optimizing gamma correction with the artificial bee colony algorithm is proposed in this work. Quality evaluation methods for the enhanced images obtained from the proposed technique are detailed, and comparisons of the proposed technique with other recent techniques are presented. Experiments show that the proposed technique is effective and can serve as an alternative method for low-light image enhancement.
Keywords: Artificial Bee Colony (ABC) · Image contrast enhancement · Gamma correction · Entropy · Grayscale image
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 3–13, 2022. https://doi.org/10.1007/978-3-030-93247-3_1

1 Introduction
Since contrast is one of the factors that determine image quality, contrast enhancement is one of the most important processes in image processing. It is used in many fields of science and engineering, such as medical image analysis for therapeutic purposes, aerial image processing, remote sensing, and computer vision applications [1]. The good quality of an image with proper contrast helps humans to perceive and understand the image better and also makes it easier to use the image in other automated image processing tasks. However, the resulting image often has too low or too high contrast when it is captured in an under-lit or overexposed environment. Thus, enhancing image contrast has been an attractive and challenging task for researchers in recent times.

Over the past decades, many methods have been proposed, such as gray transformation methods, histogram equalization methods, and frequency-domain methods [2]. An image enhancement method using an adaptive sigmoid transfer function was proposed by Srinivas and Bhandari [3] to preserve the naturalness and bright region information effectively with minimum distortion of low-light images. To improve the overall visual quality of images, a contrast enhancement approach using texture region-based histogram equalization was introduced by Singh et al. [4]. Their idea is to suppress the impact of pixels in non-textured areas and to exploit texture features when computing the histogram for histogram equalization. A contrast enhancement method based on the dual-tree complex wavelet transform was presented by Jung et al. [5]. In their work, a logarithmic function was employed for global brightness enhancement based on the nonlinear response of human vision to luminance, and the local contrast was enhanced by contrast limited adaptive histogram equalization in the low-pass sub-bands to make the image structure clearer. Although the existing algorithms can effectively enhance low-light images and achieve good results, they all have certain disadvantages [6], such as loss of detail, color distortion, or high computational complexity, and they cannot guarantee the performance of a vision system in a low-light environment. Thus, developing an effective method for low-light image enhancement remains a challenge.

Over the past two decades, biology-inspired algorithms have shown great potential for dealing with many problems in science and engineering [7, 8]. In particular, a number of techniques use biologically inspired algorithms to deal with problems in the image enhancement domain [9]. A contrast enhancement method based on the genetic algorithm (GA) was proposed by Hashemi et al. [10]. Their method uses a simple chromosome structure and genetic operators to increase the visible details and contrast of low-illumination images.
To increase the information content and enhance the details of an image, swarm intelligence-based particle swarm optimization (PSO) was employed by Kanmani and Narsimhan [11] to estimate an optimal gamma value in the gamma correction approach. A bat algorithm for optimizing the control parameters of a contrast stretching transformation was introduced by Asokan et al. [12] to preserve the brightness levels in satellite image processing. However, no specific technique satisfies all the needs of image enhancement, so the need for different algorithmic approaches continues to motivate researchers to propose new algorithms that enhance image quality more efficiently.

The Artificial Bee Colony (ABC) method proposed by Karaboga [13] is one of many popular methods used to find optimal solutions to numerical optimization problems [14]. The ABC method mimics the natural process by which bees obtain good quality food. Previous research [15–17] has shown that the ABC method can find optimal solutions to a wide range of optimization problems more efficiently and effectively than other methods. In this work, we treat image enhancement as an optimization problem and solve it using the ABC method. The contribution of this work is to show that the ABC method, a simple and efficient method that imitates bee foraging in nature, can be applied and serve as a useful method in the image enhancement domain.

The remainder of the paper is organized as follows. The background, including image contrast and the artificial bee colony algorithm, is briefly described in Sect. 2. Enhancement of image contrast by using ABC is proposed in Sect. 3. Experimental settings and results are presented and discussed in Sect. 4. Finally, Sect. 5 concludes this paper.
2 Image Contrast
Contrast [18] is the difference in luminance (darkest and lightest) or color in an image that makes it possible to distinguish elements in the image within the same field of view. An image with proper contrast allows all the details contained in the image to be seen. An example of a low-contrast image compared with a high-contrast image is shown in Fig. 1.
Fig. 1. A low contrast (left) and high contrast (right) image.
Let $G$ be the number of gray levels, which specifies the size of the co-occurrence matrices. An example of a co-occurrence matrix $M_{d,\theta}(i,j)$, obtained with the spatial distance parameter $d = 1$ and the angle $\theta = 0^\circ$, is shown in Fig. 2. Let $m_{d,\theta}(i,j)$ be the number of occurrences of the gray values $i$ and $j$ as neighbors at distance $d$ in the direction $\theta$; the probability distribution $P_{ij}$ of an entry $m_{d,\theta}(i,j)$ in $M_{d,\theta}(i,j)$ can then be expressed by Eq. (1).

$$P_{ij} = \frac{m_{d,\theta}(i,j)}{\sum_{i=0}^{G-1}\sum_{j=0}^{G-1} m_{d,\theta}(i,j)} \tag{1}$$

Contrast is a variance of the gray level determined by measuring the intensity difference between a pixel and its neighbor over the whole image. It can be calculated by Eq. (2).

$$\mathrm{Contrast} = \sum_{i=0}^{G-1}\sum_{j=0}^{G-1} (i-j)^{2}\,P_{ij} \tag{2}$$
Grayscale image, size 6 × 6:

2 1 0 3 3 0
5 4 2 4 2 1
7 3 2 1 4 7
1 4 6 3 2 2
5 1 1 4 3 6
6 4 3 3 5 4

Co-occurrence matrix $M_{1,0^\circ}(i,j)$, size 8 × 8 (rows: $i$, columns: $j$):

i\j  0 1 2 3 4 5 6 7
0    0 0 0 1 0 0 0 0
1    1 1 0 0 3 0 0 0
2    0 3 1 0 1 0 0 0
3    1 0 2 2 0 1 1 0
4    0 0 2 2 0 0 1 1
5    0 1 0 0 2 0 0 0
6    0 0 0 1 1 0 0 0
7    0 0 0 1 0 0 0 0

Fig. 2. Example of co-occurrence matrix construction [19]: matrix representation of a grayscale image of size 6 × 6 with gray levels 0 to 7 (G = 8), and the corresponding co-occurrence matrix $M_{1,0^\circ}(i,j)$ of size 8 × 8
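The construction in Fig. 2 and the contrast of Eq. (2) can be checked with a few lines of code. The following is a minimal sketch in plain Python, not the author's implementation; the function names `cooccurrence` and `contrast` are hypothetical, and the pairing convention (gray value i with its right-hand neighbor j, for d = 1 and an angle of 0°) is assumed from the example data.

```python
# 6x6 grayscale image of Fig. 2, gray levels 0..7 (G = 8).
IMAGE = [
    [2, 1, 0, 3, 3, 0],
    [5, 4, 2, 4, 2, 1],
    [7, 3, 2, 1, 4, 7],
    [1, 4, 6, 3, 2, 2],
    [5, 1, 1, 4, 3, 6],
    [6, 4, 3, 3, 5, 4],
]


def cooccurrence(image, levels, d=1):
    """Count pairs (i, j) where j lies d pixels to the right of i (angle 0 degrees)."""
    m = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[d:]):
            m[left][right] += 1
    return m


def contrast(m):
    """Eq. (2): sum over i, j of (i - j)^2 * P_ij, with P_ij normalized as in Eq. (1)."""
    total = sum(sum(row) for row in m)
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(m)
               for j, count in enumerate(row))


M = cooccurrence(IMAGE, levels=8)
for row in M:
    print(row)
print(contrast(M))
```

For this image the 30 horizontal pairs give a squared-difference sum of 138, so the contrast is 138/30, approximately 4.6.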
Classical methods for enhancing image contrast based on the histogram equalization technique have been proposed in the literature. However, they do not give satisfactory results for images that suffer from gamma distortion [20]. Gamma correction [21], one of the histogram modification techniques, is considered an appropriate method for solving the gamma distortion issue. The transformed gamma correction (TGC) of an image is calculated by Eq. (3).

$$TGC = I_{max}\left(\frac{I_{in}}{I_{max}}\right)^{\gamma} \tag{3}$$

where $I_{in}$ is the actual intensity value of the input image and $I_{max}$ is the maximum intensity value of the input image. The intensity value of each pixel is transformed using Eq. (3) by substituting the gamma value ($\gamma$), which can range between 0 and infinity. A gamma value of 1 means that the output image is the same as the input image. A gamma value less than 1.0 brightens the image, and a gamma value greater than 1.0 darkens it. However, a fixed gamma value applies the same change in intensity to every type of image. Therefore, it is necessary to select the optimal gamma value for each image in order to obtain a better quality image. Finding the gamma value can thus be treated as an optimization problem.
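The effect of the gamma value in Eq. (3) can be illustrated on a handful of intensities. This is a small sketch rather than code from the paper; the function name `gamma_correct` and the sample values are hypothetical, and taking the maximum of the input as I_max follows the definition given above.

```python
def gamma_correct(pixels, gamma, i_max=None):
    """Apply Eq. (3), I_max * (I_in / I_max) ** gamma, to each intensity."""
    if i_max is None:
        i_max = max(pixels)  # maximum intensity of the input, as in the text
    if i_max == 0:
        return [0.0 for _ in pixels]  # an all-black input stays black
    return [i_max * (p / i_max) ** gamma for p in pixels]


dark = [0, 16, 64, 128, 255]          # hypothetical sample intensities
same = gamma_correct(dark, 1.0)       # gamma = 1 leaves the values unchanged
brighter = gamma_correct(dark, 0.5)   # gamma < 1 raises mid-tones
darker = gamma_correct(dark, 2.0)     # gamma > 1 lowers mid-tones
print(same, brighter, darker, sep="\n")
```

The end points 0 and I_max are fixed for every gamma, so only the intermediate intensities move, which is why the choice of gamma controls how much detail is recovered from dark regions.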
3 Enhancement of Image Contrast by Using ABC
To find the optimal gamma value for image enhancement effectively, the use of the ABC method is proposed. The gamma value is the parameter that is optimized by the proposed ABC method. In other words, the objective is to find the gamma value that maximizes the fitness function given in Eq. (4). The applied ABC algorithm is illustrated in Fig. 3.

$$\arg\max_{\gamma}\,(\mathrm{Fitness}) = \frac{1 + E - 0.01\,(SF - 4E)^{2}}{2} \tag{4}$$
where $SF$ is the spatial frequency and $E$ is the entropy of the image. Spatial frequency (SF) [22] measures the overall activity level of an image and can be used to reflect its clarity [23]. For an $M \times N$ image block $I$ with gray value $I(i,j)$ at position $(i,j)$, the SF is defined as follows:

$$SF = \sqrt{(RF)^{2} + (CF)^{2} + (MDF)^{2} + (SDF)^{2}} \tag{5}$$

where $RF$, $CF$, $MDF$ and $SDF$ are the four first-order gradients along four directions, defined as:

$$RF = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N}\left[I(i,j) - I(i,j-1)\right]^{2}} \tag{6}$$

$$CF = \sqrt{\frac{1}{MN}\sum_{j=1}^{N}\sum_{i=2}^{M}\left[I(i,j) - I(i-1,j)\right]^{2}} \tag{7}$$

$$MDF = \sqrt{w_d\,\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=2}^{N}\left[I(i,j) - I(i-1,j-1)\right]^{2}} \tag{8}$$

$$SDF = \sqrt{w_d\,\frac{1}{MN}\sum_{j=1}^{N-1}\sum_{i=2}^{M}\left[I(i,j) - I(i-1,j+1)\right]^{2}} \tag{9}$$

where the distance weight $w_d$ is $1/\sqrt{2}$. The entropy ($E$) of the image is defined as follows:

$$E = -\sum_{i=0}^{255} P_i \log_2 (P_i) \tag{10}$$

where $P_i$ is the probability of occurrence of the $i$th intensity of the image.
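The quality measures of Eqs. (5) to (10) can be sketched in plain Python as follows. This is an illustrative sketch, not the author's C++ implementation: the function names are hypothetical, images are assumed to be lists of rows of integer intensities in 0..255, and the `fitness` function assumes that Eq. (4) reads (1 + E − 0.01 (SF − 4E)²) / 2.

```python
import math

W_D = 1 / math.sqrt(2)  # distance weight for the two diagonal gradients


def spatial_frequency(img):
    """Eqs. (5)-(9): root of the summed squared gradients in four directions."""
    m, n = len(img), len(img[0])
    rf = sum((img[i][j] - img[i][j - 1]) ** 2 for i in range(m) for j in range(1, n))
    cf = sum((img[i][j] - img[i - 1][j]) ** 2 for i in range(1, m) for j in range(n))
    mdf = sum((img[i][j] - img[i - 1][j - 1]) ** 2
              for i in range(1, m) for j in range(1, n))
    sdf = sum((img[i][j] - img[i - 1][j + 1]) ** 2
              for i in range(1, m) for j in range(n - 1))
    # RF^2 + CF^2 + MDF^2 + SDF^2, each normalized by M*N; diagonals weighted by w_d.
    return math.sqrt((rf + cf + W_D * (mdf + sdf)) / (m * n))


def entropy(img, levels=256):
    """Eq. (10): Shannon entropy of the intensity histogram, in bits."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    return -sum((c / total) * math.log2(c / total) for c in hist if c)


def fitness(img):
    """Assumed reading of Eq. (4): (1 + E - 0.01 * (SF - 4E)^2) / 2."""
    e, sf = entropy(img), spatial_frequency(img)
    return (1 + e - 0.01 * (sf - 4 * e) ** 2) / 2


checker = [[0, 255], [255, 0]]   # hypothetical 2x2 test pattern
flat = [[7, 7], [7, 7]]
print(spatial_frequency(checker), entropy(checker))
print(spatial_frequency(flat), entropy(flat))
```

On the 2 × 2 checkerboard the row and column gradients each contribute 2·255² while both diagonal gradients vanish, so SF is 255 and the entropy is 1 bit; a flat image gives 0 for both.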
[Flowchart: Start; initialize the solutions; evaluate fitness of solutions; update new solutions for employed bees; each onlooker bee selects a solution from the employed bees and updates it; if a solution is abandoned, re-generate a new solution for the scout bee; if the criterion is not satisfied, repeat; otherwise show the global solution; End.]
Fig. 3. ABC algorithm flowchart for finding the optimal gamma value in image enhancement
As seen in Fig. 3, the initial solutions (gamma values), which are treated as food source positions, are first generated at random for the bee agents. The bee agents in the artificial bee colony algorithm then perform three major tasks: updating feasible food source positions (employed bees), selecting feasible food source positions (onlooker bees), and abandoning food sources whose quality no longer improves (scout bees).

During the first task, each employed bee searches for a new food source position by using Eq. (11). This update compares the bee's own food source position with that of a randomly selected neighboring bee.

$$v_{ij} = x_{ij} + \phi_{ij}\,(x_{ij} - x_{kj}) \tag{11}$$

where $v_{ij}$ is a new feasible solution that is modified from its previous solution value ($x_{ij}$) based on a comparison with a randomly selected neighboring solution ($x_{kj}$), $\phi_{ij}$ is a random number in $[-1, 1]$ that is used to randomly adjust the old solution to become a new solution in the next iteration, and $k \in \{1, 2, 3, \ldots, SN\}$ with $k \neq i$ and $j \in \{1, 2, 3, \ldots, D\}$ are randomly chosen indexes. The difference between $x_{ij}$ and $x_{kj}$ is a difference of position in a particular dimension. The old food source position in an employed bee's memory is replaced by the new candidate position if the new position has a better fitness value. Employed bees then return to their hive and share the fitness values of their new food sources with the onlooker bees.

In the second task, each onlooker bee selects one of the proposed food sources depending on the fitness value obtained from the employed bees. The probability that a proposed food source will be selected is given by Eq. (12):

$$P_i = \frac{fit_i}{\sum_{i=1}^{SN} fit_i} \tag{12}$$

where $fit_i$ is the fitness value of the food source $i$, which is calculated by using Eq. (4).
The probability of a proposed food source being selected by the onlooker bees increases as the fitness value of the food source increases. After a food source is selected, the onlooker bee goes to it and selects a new candidate food source position in its neighborhood. The new candidate food source is calculated by Eq. (11).

In the third task, any food source position whose fitness value does not improve is abandoned and replaced by a new position that is randomly determined by a scout bee. This helps to avoid suboptimal solutions. The new random position chosen by the scout bee is calculated by Eq. (13):

$$x_{ij} = x_{j}^{min} + \mathrm{rand}[0,1]\,(x_{j}^{max} - x_{j}^{min}) \tag{13}$$

where $x_{j}^{min}$ and $x_{j}^{max}$ are the lower bound and the upper bound of the food source position in dimension $j$, respectively. The number of iterations is used as the termination criterion. The three major tasks described above are repeated until the number of iterations equals the predetermined value.
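The three tasks can be put together in a compact loop. The sketch below is a one-dimensional illustration in Python (the only dimension being the gamma value), not the author's C++ code. Several details are assumptions that the text does not specify: the abandonment threshold `limit`, clamping candidates to the bounds, the fixed random seed, and the toy fitness used in the demonstration, which stands in for Eq. (4) evaluated on the gamma-corrected image. Roulette selection as in Eq. (12) requires positive fitness values.

```python
import random


def abc_maximize(fitness, lo, hi, n_sources=20, iterations=50, limit=10, seed=0):
    """1-D artificial bee colony search for the x in [lo, hi] maximizing fitness."""
    rng = random.Random(seed)
    xs = [lo + rng.random() * (hi - lo) for _ in range(n_sources)]  # Eq. (13)
    fits = [fitness(x) for x in xs]
    trials = [0] * n_sources            # iterations without improvement per source
    best_fit = max(fits)
    best_x = xs[fits.index(best_fit)]

    def try_neighbor(i):
        nonlocal best_x, best_fit
        k = rng.choice([s for s in range(n_sources) if s != i])
        phi = rng.uniform(-1.0, 1.0)
        v = xs[i] + phi * (xs[i] - xs[k])       # Eq. (11)
        v = min(max(v, lo), hi)                 # keep the candidate in bounds (assumption)
        fv = fitness(v)
        if fv > fits[i]:                        # greedy replacement
            xs[i], fits[i], trials[i] = v, fv, 0
            if fv > best_fit:
                best_x, best_fit = v, fv
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(n_sources):              # employed bees
            try_neighbor(i)
        for _ in range(n_sources):              # onlooker bees, Eq. (12)
            i = rng.choices(range(n_sources), weights=fits)[0]
            try_neighbor(i)
        for i in range(n_sources):              # scout bees
            if trials[i] > limit:
                xs[i] = lo + rng.random() * (hi - lo)   # Eq. (13)
                fits[i] = fitness(xs[i])
                trials[i] = 0
                if fits[i] > best_fit:
                    best_x, best_fit = xs[i], fits[i]
    return best_x, best_fit


def toy_fitness(gamma):
    """Hypothetical positive fitness with its maximum at gamma = 1.5."""
    return 1.0 / (1.0 + (gamma - 1.5) ** 2)


best_gamma, best_value = abc_maximize(toy_fitness, lo=0.1, hi=5.0)
print(best_gamma, best_value)
```

The best solution is tracked separately from the food sources so that a scout reset cannot discard it; in the paper's setting, `fitness` would apply Eq. (3) with the candidate gamma and score the resulting image.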
4 Experimental Settings and Results
In this section, the performance of the proposed method for enhancing image contrast is evaluated. To test its effectiveness, other image enhancement approaches based on biologically inspired algorithms, including the genetic algorithm (GA) [10], particle swarm optimization (PSO) [11], and the bat algorithm (BA) [12], were used for comparison on a number of different images. The experiment was conducted on standard images from the Low-Light dataset (LOL) [24], as shown in Fig. 4. The contrast and the measure of enhancement (EME) [25] are used as indicators for comparing the efficacy of the proposed method with that of the other methods. Greater contrast and EME values indicate better image enhancement.
Fig. 4. LOL image set: (a) LOL1, (b) LOL2, (c) LOL3, (d) LOL4
All methods in this experiment were programmed in C++, and all experiments were run on a PC with an Intel Core i7 CPU at 2.8 GHz and 16 GB of memory. The number of iterations for each method was set to 50. For the ABC method, the number of employed and onlooker bees was set to 20. For the PSO method, the number of particles was 20, and the parameters were c1 = c2 = 2 and ω = 0.7. For the GA method, the population size (NP) was 20, the crossover probability (CR) was 0.3, and the mutation rate (MR) was 0.15. For the BA method, the number of bats was 20, and the parameters were f_max = 2, f_min = 0, and α = γ = 0.9. These parameter settings were found to be appropriate for our image sets in a preliminary study.

Tables 1 and 2 show that the proposed method yields higher average contrast values and higher average EME values than the GA, the PSO, and the BA methods. This indicates that the proposed method achieves the best quantitative evaluation results. Compared with the GA, the PSO, and the BA methods, the proposed method improves the average contrast value by 31.46%, 26.26%, and 22.44%, respectively, and the average EME value by 5.64%, 4.43%, and 3.19%, respectively. Figure 5 gives a visual comparison of the results obtained from the various algorithms.

Table 1. Contrast comparison of the proposed and existing methods on the LOL image set

Image     GA      PSO     BA      Proposed ABC
LOL1      0.0471  0.0499  0.0739  0.2043
LOL2      0.2589  0.2615  0.2652  0.2710
LOL3      0.1525  0.1648  0.1623  0.1692
LOL4      0.1821  0.1911  0.1867  0.1979
Average   0.1602  0.1668  0.1720  0.2106

Table 2. EME comparison of the proposed and existing methods on the LOL image set

Image     GA      PSO     BA      Proposed ABC
LOL1      88.77   88.91   89.17   91.85
LOL2      28.43   28.75   29.21   30.80
LOL3      15.62   16.44   17.13   17.52
LOL4      25.23   25.76   26.28   26.77
Average   39.51   39.97   40.45   41.74
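The reported improvement percentages follow from the average rows of Tables 1 and 2 as the relative gain (ABC − other) / other × 100. The short check below reproduces them; the dictionary and function names are hypothetical.

```python
# Average rows of Table 1 (contrast) and Table 2 (EME).
AVG_CONTRAST = {"GA": 0.1602, "PSO": 0.1668, "BA": 0.1720, "ABC": 0.2106}
AVG_EME = {"GA": 39.51, "PSO": 39.97, "BA": 40.45, "ABC": 41.74}


def improvement(table, proposed="ABC"):
    """Relative gain of the proposed method over each other method, in percent."""
    ref = table[proposed]
    return {name: round(100 * (ref - value) / value, 2)
            for name, value in table.items() if name != proposed}


contrast_gain = improvement(AVG_CONTRAST)
eme_gain = improvement(AVG_EME)
print(contrast_gain)
print(eme_gain)
```

For example, the contrast gain over GA is (0.2106 − 0.1602) / 0.1602 ≈ 31.46%, matching the value quoted in the text.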
As shown in Fig. 5, the quality of the image enhanced by the proposed method is noticeably higher than that obtained with the other methods, and the detail in the enhanced image is more clearly visible. In addition, all of the enhanced images of the LOL image set processed by the proposed ABC method are shown in Fig. 6.
Fig. 5. Enhanced LOL1 image yielded from: (a) GA, (b) PSO, (c) BA, (d) proposed ABC

Fig. 6. Enhanced images of the LOL image set by using proposed ABC method: (a) LOL1, (b) LOL2, (c) LOL3, (d) LOL4
5 Conclusions
In this work, image enhancement using an ABC-based gamma correction method is proposed. The major contribution of this work is an effective method for improving low-light images in which the gamma correction parameter is optimized in a supervised process by the proposed ABC technique. In our experiments, a detailed benchmark of the proposed technique against other biologically inspired algorithms, including the GA, the PSO, and the BA methods, is presented. The experiment was performed on the LOL image data set. The results show that the proposed technique is highly effective in terms of contrast and EME. Therefore, it can be concluded that the proposed ABC method is a viable option for improving low-light images.
References
1. Gu, K., Zhai, G., Lin, W., Liu, M.: The analysis of image contrast: From quality assessment to automatic enhancement. IEEE Trans. Cybern. 46(1), 284–297 (2015)
2. Park, S., Kim, K., Yu, S., Paik, J.: Contrast enhancement for low-light image enhancement: A survey. IEIE Trans. Smart Process. Comput. 7(1), 36–48 (2018)
3. Srinivas, K., Bhandari, A.K.: Low light image enhancement with adaptive sigmoid transfer function. IET Image Proc. 14(4), 668–678 (2019)
4. Singh, K., Vishwakarma, D.K., Walia, G.S., Kapoor, R.: Contrast enhancement via texture region based histogram equalization. J. Mod. Opt. 63(15), 1444–1450 (2016)
5. Jung, C., Yang, Q., Sun, T., Fu, Q., Song, H.: Low light image enhancement with dual-tree complex wavelet transform. J. Vis. Commun. Image Represent. 42, 28–36 (2017)
6. Wang, W., Wu, X., Yuan, X., Gao, Z.: An experiment-based review of low-light image enhancement methods. IEEE Access 8, 87884–87917 (2020)
7. Yang, X.S.: Nature-inspired optimization algorithms: challenges and open problems. J. Comput. Sci. 46, 101104 (2020)
8. Tzanetos, A., Dounias, G.: Nature inspired optimization algorithms or simply variations of metaheuristics? Artif. Intell. Rev. 54(3), 1841–1862 (2020). https://doi.org/10.1007/s10462-020-09893-8
9. Dhal, K.G., Ray, S., Das, A., Das, S.: A survey on nature-inspired optimization algorithms and their application in image enhancement domain. Arch. Comput. Methods Eng. 26(5), 1607–1638 (2019)
10. Hashemi, S., Kiani, S., Noroozi, N., Moghaddam, M.E.: An image contrast enhancement method based on genetic algorithm. Pattern Recogn. Lett. 31(13), 1816–1824 (2010)
11. Kanmani, M., Narsimhan, V.: An image contrast enhancement algorithm for grayscale images using particle swarm optimization. Multimed. Tools Appl. 77(18), 23371–23387 (2018). https://doi.org/10.1007/s11042-018-5650-0
12. Asokan, A., Popescu, D.E., Anitha, J., Jude Hemanth, D.: Bat algorithm based non-linear contrast stretching for satellite image enhancement. Geosciences 10(2), 78 (2020). https://doi.org/10.3390/geosciences10020078
13. Karaboga, D.: An Idea Based on Honey Bee Swarm for Numerical Optimization. Technical Report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department, Turkey (2005)
14. Karaboga, D., Akay, B.: A comparative study of artificial bee colony algorithm. Appl. Math. Comput. 214(1), 108–132 (2009)
15. Karaboga, D., Gorkemli, B., Ozturk, C., Karaboga, N.: A comprehensive survey: artificial bee colony (ABC) algorithm and applications. Artif. Intell. Rev. 42(1), 21–57 (2012). https://doi.org/10.1007/s10462-012-9328-0
16. Banharnsakun, A.: Artificial bee colony algorithm for solving the knight’s tour problem. In: Proceedings of the International Conference on Intelligent Computing & Optimization 2018, pp. 129–138 (2018)
17. Banharnsakun, A.: Feature point matching based on ABC-NCC algorithm. Evol. Syst. 9(1), 71–80 (2017). https://doi.org/10.1007/s12530-017-9183-y
18. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
19. Banharnsakun, A.: Artificial bee colony algorithm for content-based image retrieval. Comput. Intell. 36(1), 351–367 (2020)
20. Amiri, S.A., Hassanpour, H.: A preprocessing approach for image analysis using gamma correction. Int. J. Comput. Appl. 38(12), 38–46 (2012)
21. Huang, S.C., Cheng, F.C., Chiu, Y.S.: Efficient contrast enhancement using adaptive gamma correction with weighting distribution. IEEE Trans. Image Process. 22(3), 1032–1041 (2012)
22. Eskicioglu, A.M., Fisher, P.S.: Image quality measures and their performance. IEEE Trans. Commun. 43(12), 2959–2965 (1995)
23. Li, S., Kwok, J.T., Wang, Y.: Combination of images with diverse focuses using the spatial frequency. Information Fusion 2(3), 169–176 (2001)
24. Wei, C., Wang, W., Yang, W., Liu, J.: Deep retinex decomposition for low-light enhancement. In: Proceedings of British Machine Vision Conference 2018, pp. 127–136 (2018)
25. Agaian, S.S., Panetta, K., Grigoryan, A.M.: A new measure of image enhancement. In: Proceedings of IASTED International Conference on Signal Processing & Communication, pp. 19–22 (2000)
Optimal State-Feedback Controller Design for Tractor Active Suspension System via Lévy-Flight Intensified Current Search Algorithm
Thitipong Niyomsat1, Wattanawong Romsai2, Auttarat Nawikavatan3, and Deacha Puangdownreong3(✉)
1 Department of Industrial Engineering, Rajapark Institute, Bangkok, Thailand
2 National Telecom Public Company Limited: NT, Bangkok, Thailand
3 Department of Electrical Engineering, Southeast Asia University, Bangkok, Thailand
[email protected]
Abstract. This paper proposes an optimal state-feedback controller design for the tractor active suspension system using the Lévy-flight intensified current search (LFICuS) algorithm. As one of the newest and most efficient metaheuristic optimization search techniques, the LFICuS algorithm is modeled on the behavior of electric current in electric networks, combined with random numbers drawn from the Lévy-flight distribution, the adaptive radius (AR) mechanism and the adaptive neighborhood (AN) mechanism. In this paper, the LFICuS algorithm is applied to optimally design the state-feedback controller for the tractor active suspension system in order to eliminate the vibrations transmitted to the driver’s cabin by road roughness. Results obtained by the LFICuS algorithm are compared with those obtained by the conventional pole-placement method. The results show that the LFICuS algorithm can successfully provide the optimal state-feedback controller for the tractor active suspension system. The system controlled by the state-feedback controller designed by the LFICuS algorithm yields a very satisfactory response, with significantly smaller oscillation and shorter regulating time than the one designed by the pole-placement method.
Keywords: State-feedback controller · Tractor active suspension · Lévy-flight intensified current search · Metaheuristic optimization
1 Introduction
Thailand is an agricultural country in Southeast Asia. It has an area of approximately 513,000 km² (approximately 321 million rai) and a population of over 66 million people. More than 9 million Thais are farmers, who commonly use tractors in their fields of rice, corn, cassava, sugar cane, palm, rubber and other crops, on an area of approximately 220,542 km² (approximately 138 million rai), or approximately 43% of the overall area [1].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 14–23, 2022. https://doi.org/10.1007/978-3-030-93247-3_2
A tractor is a heavy-duty vehicle commonly used on farms. Operating a tractor for long periods harms the driver because of strong, continuous vibration. Mechanical vibration caused by the unevenness of the road or soil profile is transmitted to the driver. Moving elements within the machine or its devices can also cause harmful physiological and psychological effects. Normally, the endurance limit of the human body for vertical vibration is in the range of 4–8 Hz, with a root-mean-square (RMS) acceleration of less than 1 m/s² [2–4]. During ploughing and harrowing, farm tractor drivers are subjected to such vibrations. Continuous exposure to whole-body vibration can cause severe discomfort and injuries, including low back pain and disorders, hernia, abscess, and colon, testicular and prostate cancers. Active suspension systems are therefore needed in modern tractors to improve both ride quality and vibration handling performance. According to the literature, active suspension control for road vehicles has remained a challenging problem over the last 20 years [5]. Various control strategies have been proposed for such systems, including linear quadratic regulation (LQR) [6], robust control [7], sliding mode control [8] and state-feedback control [9]. Recently, control system design has moved from conventional methods to modern optimization that uses a metaheuristic search technique as the optimizer [10]. One of the newest and most efficient metaheuristic optimization search techniques is the Lévy-flight intensified current search (LFICuS) algorithm, which is modeled on the flow of electric current in electric networks [11]. The LFICuS algorithm uses random numbers drawn from the Lévy-flight distribution to generate the elite solutions in each search round. In addition, it has an adaptive radius (AR) mechanism and an adaptive neighborhood (AN) mechanism to speed up the search process. 
The LFICuS algorithm has been shown to be effective on many benchmark functions [11] and has been applied to design the proportional-integral-derivative (PID) controller for the car active suspension system [12], the PID controller for the brushless direct current (BLDC) motor speed control system [13] and the PID controller for the antenna azimuth position control system [14]. In this paper, the LFICuS algorithm is applied to design the optimal state-feedback controller for the tractor active suspension system based on the state-space model representation and modern optimization. This paper consists of five sections. After the introduction in Sect. 1, the dynamic model of the tractor active suspension system is described in Sect. 2. The problem formulation of the LFICuS-based state-feedback controller design optimization is presented in Sect. 3. Results and discussions are detailed in Sect. 4. Finally, conclusions are provided in Sect. 5.
2 Dynamic Model of Tractor Active Suspension System
The tractor active suspension system can be represented by the schematic diagram shown in Fig. 1. The front and rear suspensions are lumped into a single wheel and axle connected to a quarter portion of the tractor body through an active spring-damper combination, where M1 is the tractor mass, M2 is the suspension mass, xs is the displacement of the tractor body, xw is the displacement of the suspension mass, k1 and k2 are the spring coefficients, and b1 and b2 are the damper coefficients.
Fig. 1. Schematic diagram of tractor active suspension system (modified from [9]).
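Based on Newton’s law, the equations of vertical motion of the tractor active suspension system shown in Fig. 1 can be formulated as expressed in (1) and (2), where u is the control force from the actuator, regarded as the input of the system, and r is the road disturbance.

$$\frac{d^2 x_s}{dt^2} = \frac{1}{M_1}\left[ b_1\left(\frac{dx_w}{dt} - \frac{dx_s}{dt}\right) + k_1 (x_w - x_s) + u \right] \tag{1}$$

$$\frac{d^2 x_w}{dt^2} = \frac{1}{M_2}\left[ b_1\left(\frac{dx_s}{dt} - \frac{dx_w}{dt}\right) + k_1 (x_s - x_w) + b_2\left(\frac{dr}{dt} - \frac{dx_w}{dt}\right) + k_2 (r - x_w) - u \right] \tag{2}$$

$$\left.\begin{aligned}
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix} &=
\begin{bmatrix}
-\frac{M_1(b_1+b_2)+b_1 M_2}{M_1 M_2} & -\frac{M_1(k_1+k_2)+M_2 k_1+b_1 b_2}{M_1 M_2} & -\frac{b_1 k_2+b_2 k_1}{M_1 M_2} & -\frac{k_1 k_2}{M_1 M_2}\\
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}
+ \begin{bmatrix}1\\ 0\\ 0\\ 0\end{bmatrix} (u \text{ or } r)\\[4pt]
\begin{bmatrix}y_1\\ y_2\end{bmatrix} &=
\begin{bmatrix}
0 & \frac{M_1+M_2}{M_1 M_2} & \frac{b_2}{M_1 M_2} & \frac{k_2}{M_1 M_2}\\
-\frac{M_1 b_2}{M_1 M_2} & -\frac{M_1 k_2}{M_1 M_2} & 0 & 0
\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}
\end{aligned}\right\} \tag{3}$$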
Following the modern control approach, the state-space model representation can be formulated from (1) and (2) as stated in (3), where $x_1 = x_s$, $x_2 = \dot{x}_s$, $x_3 = y$ and $x_4 = \dot{y}$ are the state variables, $[u\ r]^T$ are the input variables, $y = (x_s - x_w)$ is the output variable, $y_1 = y$ is the output depending on the control force u (r = 0) and $y_2 = y$ is the output depending on the road disturbance r (u = 0) [9].
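The following is a minimal sketch of how the model coefficients could be assembled numerically. It assumes that (3) is the controllable canonical form of the transfer functions obtained from (1) and (2); the function name `canonical_model` and the nested-list layout are illustrative choices, and the parameter values are the ones quoted for the Kubota M110X tractor in Sect. 4.

```python
def canonical_model(M1, M2, k1, k2, b1, b2):
    """Return (A, B, C1, C2) of the assumed controllable canonical form."""
    m = M1 * M2
    # Monic characteristic polynomial s^4 + a3 s^3 + a2 s^2 + a1 s + a0.
    a3 = (M1 * (b1 + b2) + b1 * M2) / m
    a2 = (M1 * (k1 + k2) + M2 * k1 + b1 * b2) / m
    a1 = (b1 * k2 + b2 * k1) / m
    a0 = k1 * k2 / m
    A = [[-a3, -a2, -a1, -a0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    B = [1.0, 0.0, 0.0, 0.0]
    C1 = [0.0, (M1 + M2) / m, b2 / m, k2 / m]    # y1: driven by u, with r = 0
    C2 = [-M1 * b2 / m, -M1 * k2 / m, 0.0, 0.0]  # y2: driven by r, with u = 0
    return A, B, C1, C2


# Suspension parameters reported in Sect. 4 (SI units).
A, B, C1, C2 = canonical_model(M1=700.0, M2=90.0, k1=62000.0, k2=570000.0,
                               b1=500.0, b2=22500.0)
```

Writing the coefficients over the common denominator M1·M2 keeps the code close to the printed matrix entries, which makes a term-by-term check against (3) straightforward.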
3 Problem Formulation
From the state-space model of the tractor active suspension system in (3), a new state variable $x_5 = \int y\,dt$ is added to the system model to introduce integral action. Once the system response reaches the steady-state interval, this integral action produces zero steady-state error. The closed-loop state-space model representation for the full-state feedback controller is given in (4) [9], where $[x_1, x_2, x_3, x_4, x_5]^T = [x_s, \dot{x}_s, y = (x_s - x_w), \dot{y}, \int y\,dt]^T$. The model in (4) shows that after the tractor tire is subjected to the road disturbance, $x_3 = y = (x_s - x_w)$ will ultimately reach the equilibrium point.

$$\left.\begin{aligned}
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\\ \dot{x}_5\end{bmatrix} &=
\left(
\begin{bmatrix}
0 & 1 & 0 & 0 & 0\\
-\frac{b_1 b_2}{M_1 M_2} & 0 & \frac{b_1^2}{M_1^2}+\frac{b_1^2}{M_1 M_2}+\frac{b_1 b_2}{M_1 M_2}-\frac{k_1}{M_1} & -\frac{b_1}{M_1} & 0\\
\frac{b_2}{M_2} & 0 & -\left(\frac{b_1}{M_1}+\frac{b_1}{M_2}+\frac{b_2}{M_2}\right) & 1 & 0\\
\frac{k_2}{M_2} & 0 & -\left(\frac{k_1}{M_1}+\frac{k_1}{M_2}+\frac{k_2}{M_2}\right) & 0 & 0\\
0 & 0 & 1 & 0 & 0
\end{bmatrix}
-
\begin{bmatrix}0\\ \frac{1}{M_1}\\ 0\\ \frac{M_1+M_2}{M_1 M_2}\\ 0\end{bmatrix}
\begin{bmatrix}K_1 & K_2 & K_3 & K_4 & K_5\end{bmatrix}
\right)
\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\end{bmatrix}\\[4pt]
&\quad +
\begin{bmatrix}
0 & 0\\
\frac{1}{M_1} & \frac{b_1 b_2}{M_1 M_2}\\
0 & -\frac{b_2}{M_2}\\
\frac{M_1+M_2}{M_1 M_2} & -\frac{k_2}{M_2}\\
0 & 0
\end{bmatrix}
\begin{bmatrix}u\\ r\end{bmatrix}\\[4pt]
y &= \begin{bmatrix}0 & 0 & 1 & 0 & 0\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\end{bmatrix}
\end{aligned}\right\} \tag{4}$$

The control objective of the tractor active suspension system is to create the control force u from the actuator in such a way that the output y rejects the road disturbance with the smallest overshoot and the shortest regulating time. With this control objective, the sum-squared error (SSE) between the input r (r = 0) and the output y is set as the objective function f(K) of the LFICuS-based state-feedback controller design optimization as expressed in (5), where N is the number of data. The objective function f(K) in (5) will be minimized by the LFICuS algorithm by searching for the optimal
five values of gain $K = [K_1, K_2, K_3, K_4, K_5]$ in (4) within their corresponding boundaries, giving very satisfactory responses that meet the inequality constraints stated in (6), where $M_p$ is the maximum percent overshoot, $M_{p\_max}$ is the maximum allowance of $M_p$, $t_{reg}$ is the regulating time, $t_{reg\_max}$ is the maximum allowance of $t_{reg}$, $e_{ss}$ is the steady-state error, $e_{ss\_max}$ is the maximum allowance of $e_{ss}$, and $K_{j\_min}$ and $K_{j\_max}$ are the boundaries of $K_j$ for j = 1, …, 5.

$$\text{Min}\quad f(K) = \sum_{i=1}^{N} [r_i - y_i]^2 = \sum_{i=1}^{N} [y_i]^2 \tag{5}$$

$$\text{Subject to}\quad \left.\begin{aligned}
& M_p \le M_{p\_max},\\
& t_{reg} \le t_{reg\_max},\\
& e_{ss} \le e_{ss\_max},\\
& K_{1\_min} \le K_1 \le K_{1\_max},\\
& K_{2\_min} \le K_2 \le K_{2\_max},\\
& K_{3\_min} \le K_3 \le K_{3\_max},\\
& K_{4\_min} \le K_4 \le K_{4\_max},\\
& K_{5\_min} \le K_5 \le K_{5\_max}
\end{aligned}\right\} \tag{6}$$
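As an illustration of (5) and (6), the sketch below evaluates the SSE objective for a sampled output and checks the constraints. The paper does not state how infeasible candidates are treated, so the large additive penalty is an assumption, as are the function names and the dictionary keys `Mp`, `treg` and `ess`.

```python
def sse_objective(y, r=None):
    """Sum-squared error between reference r and output y, as in (5).

    With r omitted the reference is zero, so the cost is the sum of y_i^2.
    """
    if r is None:
        r = [0.0] * len(y)
    return sum((ri - yi) ** 2 for ri, yi in zip(r, y))


def is_feasible(K, metrics, bounds, limits):
    """Check the gain bounds and response limits of (6).

    metrics and limits are dicts keyed by 'Mp', 'treg', 'ess' (hypothetical names).
    """
    in_bounds = all(lo <= k <= hi for k, (lo, hi) in zip(K, bounds))
    within_limits = all(metrics[name] <= limits[name] for name in limits)
    return in_bounds and within_limits


def penalized_cost(y, K, metrics, bounds, limits, penalty=1e9):
    """Assumed constraint handling: infeasible candidates get a large penalty."""
    cost = sse_objective(y)
    return cost if is_feasible(K, metrics, bounds, limits) else cost + penalty
```

A penalty keeps the optimizer itself unconstrained; rejecting infeasible candidates outright would be an equally plausible reading of (6).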
The LFICuS algorithm [11–14] uses random numbers drawn from the Lévy-flight distribution to generate the neighborhood members as the elite solutions in each iteration. The Lévy-flight random distribution L can be calculated by (7), where s is the step length, λ is an index and Γ(λ) is the Gamma function as expressed in (8). Also, the AR and AN mechanisms are applied in the LFICuS algorithm by reducing the search radius R and the number of neighborhood members n to speed up the search process. The LFICuS algorithm for designing an optimal state-feedback controller of the tractor active suspension system can be described step-by-step as follows.

$$L \approx \frac{\lambda\,\Gamma(\lambda)\,\sin(\pi\lambda/2)}{\pi}\cdot\frac{1}{s^{1+\lambda}} \tag{7}$$

$$\Gamma(\lambda) = \int_{0}^{\infty} t^{\lambda-1} e^{-t}\,dt \tag{8}$$
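The expression in (7) can be evaluated directly with the standard-library Gamma function. This is a small sketch; the function name `levy_density` is an illustrative choice, and the values s = 0.01 and λ = 0.3 used in the call are the search parameters reported in Sect. 4.

```python
import math


def levy_density(s, lam):
    """Evaluate (7): lam * Gamma(lam) * sin(pi * lam / 2) / pi * 1 / s**(1 + lam)."""
    if s <= 0:
        raise ValueError("step length s must be positive")
    coeff = lam * math.gamma(lam) * math.sin(math.pi * lam / 2.0) / math.pi
    return coeff / s ** (1.0 + lam)


# Search parameters quoted in Sect. 4: step length s = 0.01, index lam = 0.3.
value = levy_density(0.01, 0.3)
```

Because the expression decays as a power of s rather than exponentially, occasional long steps remain likely, which is what lets the search escape local solutions.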
Step-0 Initialize the objective function f(K) in (5) and the constraint functions in (6), the search space Ω = [K1_min, K1_max], [K2_min, K2_max], [K3_min, K3_max], [K4_min, K4_max] and [K5_min, K5_max], the memory lists (ML) Ψ, Γk and Ξ = ∅, the maximum allowance of solution cycling jmax, the number of initial solutions N, the number of neighborhood members n, the search radius R = Ω, and k = j = 1.
Step-1 Uniformly randomize the initial solutions Xi = {K1, K2, K3, K4 and K5} within Ω. Evaluate f(Xi) via (5) and (6), then rank and store Xi in Ψ.
Step-2 Let x0 = Xk be the selected initial solution. Set Xglobal = Xlocal = x0.
Step-3 Generate new solutions xi = {K1, K2, K3, K4 and K5} by the Lévy-flight random numbers in (7) and (8) around x0 within R. Evaluate f(xi) via (5) and (6), and set the best one as x*.
Step-4 If f(x*) < f(x0), keep x0 in Γk, update x0 = x* and set j = 1. Otherwise, keep x* in Γk and update j = j + 1.
Step-5 Activate the AR mechanism by R = ρR, 0 < ρ < 1, and invoke the AN mechanism by n = αn, 0 < α < 1.
Step-6 If j ≤ jmax, go back to Step-3.
Step-7 Update Xlocal = x0 and keep Xglobal in Ξ.
Step-8 If f(Xlocal) < f(Xglobal), update Xglobal = Xlocal.
Step-9 Update k = k + 1 and set j = 1. Let x0 = Xk be the selected initial solution.
Step-10 If k ≤ N, go back to Step-2. Otherwise, stop the search process and report the best solution Xglobal = {K1, K2, K3, K4 and K5} found.
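The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: the heavy-tailed step is drawn by inverse-transform sampling of a power law (a stand-in for sampling from (7)), the radius and neighborhood size shrink at every iteration, the memory lists are omitted, and all names and default values (`lficus_sketch`, `step_scale`, `rho`, `alpha`) are hypothetical.

```python
import random


def levy_step(lam, rng):
    """Heavy-tailed step length >= 1 with tail exponent lam (assumed sampler)."""
    u = 1.0 - rng.random()           # u lies in (0, 1]
    return u ** (-1.0 / lam)


def lficus_sketch(f, bounds, n_init=5, n_neigh=20, j_max=5, max_iter=50,
                  rho=0.9, alpha=0.95, lam=1.5, step_scale=0.01, seed=0):
    """Minimize f over box bounds following Step-0 to Step-10 in simplified form."""
    rng = random.Random(seed)
    span = [hi - lo for lo, hi in bounds]
    # Step-1: uniformly random initial solutions, ranked by cost.
    starts = sorted(
        ([rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)),
        key=f)
    x_global = starts[0]
    for x0 in starts:                          # Step-2 / Step-9: next direction
        radius, n, j = 1.0, float(n_neigh), 1  # radius as a fraction of the space
        for _ in range(max_iter):
            if j > j_max:                      # Step-6: cycling limit reached
                break
            # Step-3: Levy-flight neighbors around x0, clipped to the bounds.
            neighbors = []
            for _ in range(max(1, int(n))):
                cand = []
                for xi, (lo, hi), w in zip(x0, bounds, span):
                    step = step_scale * radius * w * levy_step(lam, rng)
                    step *= rng.choice((-1.0, 1.0))
                    cand.append(min(hi, max(lo, xi + step)))
                neighbors.append(cand)
            best = min(neighbors, key=f)
            # Step-4: accept an improvement, otherwise count a cycling step.
            if f(best) < f(x0):
                x0, j = best, 1
            else:
                j += 1
            # Step-5: adaptive radius (AR) and adaptive neighborhood (AN).
            radius *= rho
            n *= alpha
        # Step-7 / Step-8: keep the best local solution found so far.
        if f(x0) < f(x_global):
            x_global = x0
    return x_global


def sphere(x):
    """Toy objective used only to exercise the sketch."""
    return sum(v * v for v in x)


best = lficus_sketch(sphere, [(-5.0, 5.0), (-5.0, 5.0)], seed=1)
```

For the controller design problem, `f` would wrap a simulation of (4) that returns the cost in (5), and `bounds` would hold the five gain intervals.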
4 Results and Discussions
Referring to the model in (4), the numerical values of the suspension model parameters of the Kubota M110X tractor [9] are as follows: M1 = 700 kg, M2 = 90 kg, k1 = 62,000 N/m, k2 = 570,000 N/m, b1 = 500 N·s/m and b2 = 22,500 N·s/m. The state-feedback controller for the Kubota M110X tractor active suspension system designed by the pole-placement method [9] is stated in (9).

$$K = \begin{bmatrix} 250 & 500 & 300 & 200 & 150 \end{bmatrix} \tag{9}$$
To design an optimal state-feedback controller for the tractor active suspension system, the LFICuS algorithm was coded in MATLAB version 2017b (License No. #40637337) and run on an Intel(R) Core(TM) i7-10510U [email protected] GHz, 2.30 GHz, with 16.0 GB RAM. The road disturbances were simulated by a step function representing the step road and a sinusoidal function representing the bumpy and pothole roads. The search parameters of the LFICuS were set from a preliminary study: initial search radius R = Ω (search space) = [K1_min, K1_max], [K2_min, K2_max], [K3_min, K3_max], [K4_min, K4_max] and [K5_min, K5_max], step length s = 0.01, index λ = 0.3, number of initial neighborhood members n = 100 and number of search directions N = 50. Each search direction is terminated at the maximum iteration (Max_Iter) of 200. The AR and AN mechanisms are activated in h = 2 states: state-(i), at the 100th iteration, R = 25% of Ω and n = 50; state-(ii), at the 150th iteration, R = 5% of Ω and n = 25. The constraint functions in (6) are set as stated in (10). Fifty trials were run to obtain the optimal values of the gain K = [K1, K2, K3, K4, K5].
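$$\text{Subject to}\quad \left.\begin{aligned}
& M_p \le 10.00\%,\\
& t_{reg} \le 5.00\ \text{s},\\
& e_{ss} \le 0.01\%,\\
& 1{,}000 \le K_1 \le 5{,}000,\\
& 1{,}000 \le K_2 \le 5{,}000,\\
& 1{,}000 \le K_3 \le 5{,}000,\\
& 100 \le K_4 \le 1{,}000,\\
& 100 \le K_5 \le 1{,}000
\end{aligned}\right\} \tag{10}$$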
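The two-state AR/AN schedule and the limits in (10) can be written down compactly. The sketch below is illustrative only: the names `ar_an_schedule`, `GAIN_BOUNDS` and `RESPONSE_LIMITS` are hypothetical, and the radius is expressed as a fraction of the search space.

```python
# Gain bounds and response limits as read from (10).
GAIN_BOUNDS = [(1000.0, 5000.0)] * 3 + [(100.0, 1000.0)] * 2
RESPONSE_LIMITS = {"Mp": 10.00, "treg": 5.00, "ess": 0.01}  # %, s, %


def ar_an_schedule(iteration, n_initial=100):
    """Return (radius as a fraction of the search space, neighborhood size)."""
    if iteration >= 150:        # state-(ii): R = 5 % of the space, n = 25
        return 0.05, 25
    if iteration >= 100:        # state-(i): R = 25 % of the space, n = 50
        return 0.25, 50
    return 1.0, n_initial       # before any state: full space, n = 100
```

For example, `ar_an_schedule(120)` falls in state-(i), so the search would use a quarter of the original radius with 50 neighborhood members.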
Once the search process stopped over the 50 trial runs, the LFICuS successfully provided the optimal state-feedback controller for the tractor active suspension system as stated in (11). The convergence rates over the 50 trial runs are plotted in Fig. 2.

$$K = \begin{bmatrix} 3{,}814.22 & 3{,}693.58 & 3{,}172.14 & 127.49 & 271.86 \end{bmatrix} \tag{11}$$
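To show how a gain vector enters the closed-loop model, the sketch below forms the 5×5 closed-loop matrix of (4) for the gains in (9) and (11). It assumes the feedback law u = −Kx acting through the control-force column of the input matrix; the names `closed_loop_matrix`, `K_PP` and `K_LFICUS` are hypothetical, and no stability or performance claim is made by this block.

```python
def closed_loop_matrix(K, M1=700.0, M2=90.0, k1=62000.0, k2=570000.0,
                       b1=500.0, b2=22500.0):
    """Return the open-loop matrix of (4) minus B_u * K as a 5x5 nested list."""
    damp = b1 / M1 + b1 / M2 + b2 / M2    # combined damping term
    stiff = k1 / M1 + k1 / M2 + k2 / M2   # combined stiffness term
    A_a = [
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [-b1 * b2 / (M1 * M2), 0.0, (b1 / M1) * damp - k1 / M1, -b1 / M1, 0.0],
        [b2 / M2, 0.0, -damp, 1.0, 0.0],
        [k2 / M2, 0.0, -stiff, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0, 0.0],        # x5 is the integral of y
    ]
    B_u = [0.0, 1.0 / M1, 0.0, 1.0 / M1 + 1.0 / M2, 0.0]  # control-force column
    return [[A_a[i][j] - B_u[i] * K[j] for j in range(5)] for i in range(5)]


K_PP = [250.0, 500.0, 300.0, 200.0, 150.0]              # gains in (9)
K_LFICUS = [3814.22, 3693.58, 3172.14, 127.49, 271.86]  # gains in (11)
A_cl = closed_loop_matrix(K_LFICUS)
```

Only the second and fourth rows change with K, because the control force enters the model through those two state equations.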
Fig. 2. Convergent rates of LFICuS-based state-feedback controller design over 50-trial runs.
The step responses of the tractor active suspension system without a controller (passive suspension system), with the state-feedback controller designed by the pole-placement method in (9) and with the state-feedback controller designed by the LFICuS algorithm in (11) are depicted in Fig. 3, where the step road profile is plotted as a thin solid blue line. The tractor passive suspension system (thin dotted black line) yields large oscillation and a slow response with Mp = 29.07%, treg = 9.78 s and ess = 0.00%. The tractor active suspension system with the state-feedback controller designed by the pole-placement method in (9) (thin dash-dotted black line) gives smaller oscillation and a faster response than the passive suspension system, with Mp = 6.02%, treg = 4.57 s and ess = 0.00%. The tractor active suspension system with the state-feedback controller designed by the LFICuS algorithm in (11) (thick solid black line) provides smaller oscillation and a faster response than the system with the controller designed by the pole-placement method, with Mp = 2.42%, treg = 2.16 s and ess = 0.00%. The sinusoidal responses of the tractor active suspension controlled system are plotted in Fig. 4, where the sinusoidal road profile is plotted as a thin solid blue line. The tractor passive suspension system (thin dotted black line) yields large oscillation and a slow response with Mp = 48.07%, treg = 9.69 s and ess = 0.00%. The tractor active suspension system with the state-feedback controller designed by the pole-placement method in (9) (thin dash-dotted black line) gives smaller oscillation and a faster response than the passive suspension system, with Mp = 8.98%, treg = 4.53 s and ess = 0.00%. The tractor active suspension system with the state-feedback controller designed by the LFICuS algorithm in (11) (thick solid black line) provides smaller oscillation and a faster response than the system with the controller designed by the pole-placement method, with Mp = 7.54%, treg = 2.01 s and ess = 0.00%. From the overall results in Fig. 3 and Fig. 4, the tractor active suspension system controlled by the state-feedback controller designed by the LFICuS algorithm provides a very satisfactory response that is significantly superior to both the tractor passive suspension system and the tractor active suspension system controlled by the state-feedback controller designed by the pole-placement method.
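The three figures of merit quoted above (Mp, treg and ess) can be extracted from a sampled response. The paper does not give the exact definitions used, so the sketch below makes its own assumptions: values are normalized by the disturbance amplitude, and the regulating time is the last instant at which the response lies outside a 2% band around zero. The function name and arguments are hypothetical.

```python
def regulation_metrics(t, y, amplitude, band=0.02):
    """Return (Mp in %, t_reg, e_ss in %) for a response that should settle at zero.

    amplitude: disturbance size used for normalization (assumed definition).
    band: settling tolerance as a fraction of amplitude (assumed 2 %).
    """
    mp = 100.0 * max(abs(v) for v in y) / amplitude
    ess = 100.0 * abs(y[-1]) / amplitude
    t_reg = t[0]
    for ti, yi in zip(t, y):
        if abs(yi) > band * amplitude:
            t_reg = ti                # last instant outside the band
    return mp, t_reg, ess


# Tiny synthetic response, used only to illustrate the call.
mp, t_reg, ess = regulation_metrics(
    t=[0.0, 1.0, 2.0, 3.0, 4.0],
    y=[0.0, 0.3, -0.1, 0.01, 0.0],
    amplitude=1.0)
```

Metrics computed this way can be passed to a feasibility check against the limits in (10) before a candidate gain vector is accepted.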
Fig. 3. Step responses of the tractor active suspension controlled system.
Fig. 4. Sinusoidal responses of the tractor active suspension controlled system.
5 Conclusions
The application of the Lévy-flight intensified current search (LFICuS) algorithm to design an optimal state-feedback controller for the tractor active suspension system has been proposed in this paper. Based on modern optimization, the LFICuS algorithm, which uses random numbers drawn from the Lévy-flight distribution together with the AR and AN mechanisms, has been applied to optimize the feedback gains of the state-feedback controller in order to eliminate the tractor vibrations due to road roughness. In the MATLAB simulation results, the LFICuS algorithm provided the optimal state-feedback controller for the tractor active suspension system. When road disturbances (step and bumpy-pothole roads) were applied to the system, the tractor active suspension system controlled by the state-feedback controller designed by the LFICuS algorithm provided a very satisfactory response, with significantly smaller oscillation and shorter regulating time than the one designed by the pole-placement method. For future research, applications of the LFICuS algorithm will be extended to other control system design optimization problems, including the PIDA, FOPID and FOPIDA controllers for more complicated real-world systems.
References
1. National Statistical Office Homepage. http://www.nso.go.th
2. Deprez, K., Moshou, D., Anthonis, J., De Baerdemaeker, J., Ramon, H.: Improvement of vibrational comfort on agricultural vehicles by passive and semiactive cabin suspensions. Comput. Electron. Agric. 49, 431–440 (2005)
3. Muzammil, M., Siddiqui, S.S., Hasan, F.: Physiological effect of vibrations on tractor drivers under variable ploughing conditions. J. Occup. Health 46, 403–409 (2004)
4. Scarlett, A.J., Price, J.S., Stayner, R.M.: Whole-body vibration: evaluation of emission and exposure levels arising from agricultural tractors. J. Terramech. 44, 65–73 (2007)
5. Hrovat, D.: Survey of advanced suspension developments and related optimal control applications. Automatica 33(10), 1781–1817 (1997)
6. Zhen, L., Cheng, L., Dewen, H.: Active suspension control design using a combination of LQR and backstepping. In: 25th IEEE Chinese Control Conference, pp. 123–125. Harbin, Heilongjiang, China (2006)
7. Yousefi, A., Akbari, A., Lohmann, B.: Low order robust controllers for active vehicle suspensions. In: The IEEE International Conference on Control Applications, pp. 693–698. Munich, Germany (2006)
8. Chamseddine, A., Noura, H., Raharijaona, T.: Control of linear full vehicle active suspension system using sliding mode techniques. In: The IEEE International Conference on Control Applications, pp. 1306–1311. Munich, Germany (2006)
9. Shamshiri, R., Ismail, W.I.W.: Design and analysis of full-state feedback controller for a tractor active suspension: implications for crop yield. Int. J. Agric. Biol. 15, 909–914 (2013)
10. Zakian, V.: Control Systems Design: A New Framework. Springer-Verlag (2005)
11. Romsai, W., Leeart, P., Nawikavatan, A.: Lévy-flight intensified current search for multimodal function minimization. In: The 2020 International Conference on Intelligent Computing and Optimization (ICO’2020), pp. 597–606. Hua Hin, Thailand (2020)
12. Romsai, W., Nawikavatan, N., Puangdownreong, D.: Application of Lévy-flight intensified current search to optimal PID controller design for active suspension system. Int. J. Innov. Comput. Inf. Control 17(2), 483–497 (2021)
13. Leeart, P., Romsai, W., Nawikavatan, A.: PID controller design for BLDC motor speed control system by Lévy-flight intensified current search. In: The 2020 International Conference on Intelligent Computing and Optimization (ICO’2020), pp. 1176–1185. Hua Hin, Thailand (2020)
14. Romsai, W., Lurang, K., Nawikavatan, A., Puangdownreong, D.: Optimal PID controller design for antenna azimuth position control system by Lévy-flight intensified current search algorithm. In: The 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON 2021), pp. 858–861. Chiang Mai, Thailand (2021)
The Artificial Intelligence Platform with the Use of DNN to Detect Flames: A Case of Acoustic Extinguisher
Stefan Ivanov(✉) and Stanko Stankov
Department of Automation, Information and Control Systems, Technical University of Gabrovo, Hadji Dimitar 4, 5300 Gabrovo, Bulgaria
[email protected]
Abstract. In practice, acoustic extinguishing of flames can be combined with flame detection based on artificial intelligence, and demonstrating this is the main aim of this article. The paper presents the possibility of using a DNN (Deep Neural Network) for flame detection in an autonomous acoustic extinguisher. A robotized mobile platform was developed and used to test algorithms for fire detection and for evaluation of the fire source. Experimental results show that DNNs can be used in the autonomous acoustic fire extinguisher. Based on the research work, it is feasible to apply multiple DNN algorithms and models in a single intelligent and autonomous acoustic fire extinguisher (a new approach to fire protection).
Keywords: Deep Neural Networks (DNN) · Fire detection · High-power acoustic extinguisher · Low-cost sensor · Machine learning
1 Introduction In addition to the danger to human and animal life and material losses, fires are a significant cause of devastation of the environment. They result in deforestation, desertification, and air pollution. It is estimated that the occurrence of fires results in 20% of the total CO2 emissions to the atmosphere. Besides, fires cause the recirculation of heavy metals and radionuclides [1]. Consequently, the efforts of scientists are aimed at finding innovative and effective ways to detect fires as soon as possible (this is a key issue especially in forests and sparsely populated areas), so that firefighting action can be taken before the fire spreads. The solutions described in [2, 3] may be useful for data transmission from hard-to-reach areas. On a global scale, satellite imaging can be used for fire detection. Chitade and Katiyar present color segmentation capabilities for segmenting satellite images. This technique may be applicable for fire detection [4]. However, a weakness of tracking fire locations using satellites is that the spatial resolution and time scale are too low, which unfortunately prevents the effective use of this knowledge on a local scale [5, 6]. This is essential because the fire caused by wind gusts spreads very quickly, causing substantial financial losses, environmental degradation, and, above all, often the loss of human life. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 24–34, 2022. https://doi.org/10.1007/978-3-030-93247-3_3
Artificial intelligence, including a subdiscipline of machine learning that does not require human control – deep learning – is helpful there. It finds its application, in particular, in places where the use of typical sensors is very difficult or impossible (especially in open spaces due to the limited range of classical sensors) [7–18]. Regardless of whether conventional sensors or artificial intelligence are applied, the primary goal is the detection of flames, which is related to the simultaneous detection of the location of the fire. A new scientific development in recent years has been the use of acoustic waves to extinguish flames. Since sound waves are not chemical, they do not pollute the environment. Low-cost intelligent sensors may be installed in the acoustic extinguisher, so that extinguishing can be started immediately after flame detection (without unnecessary time delay and without human intervention). This technique can become an element supporting the safety of industrial halls, warehouses, flammable liquid tanks (no barrier to the propagation of acoustic waves), e.g., as a stationary (permanently installed) fire extinguishing system. However, the use of portable extinguishers is also possible, but sometimes problematic. The advantage of these systems is usually a high data processing speed (less than 10 ms). Such research is currently being conducted within the framework of cooperation in Bulgaria and Poland, but, in general, research is being conducted in the United States, Europe and Asia, for example, [19–25]. This has resulted in many patents such as [26–34]. It is interesting in terms of analyzing the extinguishing capabilities of acoustic waves. Research work carried out in Central and Eastern Europe, with the support of professional services that deal with fire protection, resulted in the development of an acoustic extinguisher that can be equipped with an intelligent module for flame detection (scientific novelty). 
Implementing such a module may contribute to reducing to a minimum the time delay to start the firefighting action because the system can be activated automatically as soon as flames are detected (without the need for its manual activation by a human). This is particularly important in the case of extinguishing a fire, before the flames even have time to spread. During the research work, a prototype model of an autonomous high-power acoustic fire extinguisher was developed (Fig. 1). Several new patents can be found in this area [31–34].
Fig. 1. Simplified 3D model of the prototype of the autonomous acoustic fire extinguisher.
Since this paper proposes a method of flame detection using a deep neural network, it focuses on the flame detection module rather than on the extinguishing system. Such robots may be used in crisis management [35]. To analyze the influence of various factors on the possibility of the occurrence of a given phenomenon, various models based on real data can be applied, as is done, among others, in the mathematical sciences [36–38]. This paper is divided into several sections. Sect. 2 shows that it is possible to design a system based on artificial vision using mobile robots with human-like visual attention, which allows them to support the work of firefighters in difficult-to-reach places. That section also presents the possibilities of flame detection using deep neural networks. In this context, an intelligent module can be part of the acoustic extinguisher equipment, which is a novel scientific approach. The architecture of the neural networks used and the operation algorithm implemented on the robotized platform are presented in Sect. 3. Sect. 4 provides a short summary that synthesizes the most important information from the article and outlines directions for future research.
2 The Use of Deep Neural Networks for Flame Detection
In recent years, intelligent computing and optimization have become important for the development of society and of innovative solutions. Work in this area is being carried out by many scientific and academic centers around the world [39–41]. Research on the use of deep neural networks for flame detection is part of this trend. The architecture of broadband information systems for crisis management is essential [42], as is familiarity with digital image recognition [43]. The intelligent module can be part of the acoustic extinguisher equipment. This is especially important because the acoustic system does not require human presence, and thus the human is not exposed to the low-frequency acoustic field, which may cause various health problems [44, 45]. In addition to wide open spaces (for example, forests and wildlands), neural networks allow for flame detection in buildings, in means of transport, and in places where environmental conditions (ambient temperature, dust) significantly reduce the effectiveness of other fire detection techniques. Examples include sand dryers, foundries, and heat treatment plants. Compared with the well-known and available methods, the benefits of using deep neural networks include the low cost of components and high performance (fire detection efficiency is typically over 90%). Fig. 2 shows an example of a robot, constructed by the authors, that detects flames using deep neural networks. This robotized mobile platform simulates the operation of the acoustic fire extinguisher. All electronic hardware and algorithms for fire detection, motor control, approaching the fire, and detection of fake fire sources can be used directly in the autonomous fire extinguisher.
Fig. 2. A flame detection robot using deep neural networks.
The developed robot uses a Jetson Nano as the main platform for fire detection. A Logitech C310 USB camera is connected to the Jetson Nano and provides a resolution of 1280 × 720 pixels. The proposed system incorporates an LCD display that visualizes the video stream from the camera and draws the contours of the fire sources and flames found in the video signal. The robot drive is controlled by a specially designed control board based on the ESP32 microcontroller. The board controls two DC motors and gathers data from temperature, ultrasound, gas, and flame sensors. The control board receives movement commands from the Jetson Nano and returns sensor data to the Jetson Nano, which can be used to better determine the distance to the fire source. To detect fire using the robotic platform, two types of neural networks were applied in the tests. Their architectures have the following advantages and disadvantages:
• SSD MobileNet: has high performance but low accuracy when searching for small objects compared to other architectures. When searching for large objects, a MobileNet SSD can have higher accuracy than R-CNN;
• Mask R-CNN: based on R-CNN, it can return the location of an object and apply a mask to its pixels. Compared with SSD MobileNet, this neural network has a longer response time.
3 Architecture of Neural Networks Used
MobileNet is a neural network architecture designed for embedded applications running on devices with limited computational power. MobileNet uses so-called depthwise separable convolutions. The SSD (Single Shot Detector) may be used to detect
S. Ivanov and S. Stankov
multiple objects within an image. It is based on a convolutional network that first extracts feature maps and then applies convolution filters to detect objects. The combination of MobileNet and SSD is SSD MobileNet, which is applied in the current research.
Mask R-CNN can be described as an improvement of the Faster R-CNN network because it also returns a mask of the detected object. The network first extracts feature maps from the images, which are then passed through a Region Proposal Network (RPN) that returns the coordinates of the bounding boxes of the detected objects. Finally, using segmentation, the network generates masks on the detected objects, in our case the fire source.
The training data were collected from a variety of freely available sources on the Internet. The original set contains about 300 images. With a labeling tool used during image preprocessing, the fire coordinates are localized in each image and a label is assigned. A Python script that performs rotation, translation, and scaling of the images then generates a new set of about 3000 processed images. The larger the dataset, the higher the expected accuracy of the neural network. For this reason, further images are generated by changing the brightness, giving a set of about 15,000 images, which is used to train the two selected neural networks.
The transfer learning method was used for training. A pre-trained neural network model is taken, which has multiple layers and recognizes a large number of classes. Its last layers, which are fully connected and depend on the number of classes, are cut out and replaced with a new fully connected layer, which is trained to detect fire. Two types of neural networks were trained, the architectures mentioned above: SSD MobileNet and Mask R-CNN.
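When labelled images are rotated, translated, or scaled, the fire coordinates have to be transformed together with the pixels. The source does not show its augmentation script, so the following is only a minimal sketch under stated assumptions: boxes are axis-aligned tuples (xmin, ymin, xmax, ymax) in pixels, rotation is limited to 90° clockwise for simplicity, and all function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def scale_box(box: Box, factor: float) -> Box:
    """Box after the image is resized by `factor` about the top-left corner."""
    xmin, ymin, xmax, ymax = box
    return (xmin * factor, ymin * factor, xmax * factor, ymax * factor)


def translate_box(box: Box, dx: float, dy: float, width: float, height: float) -> Box:
    """Box after shifting the image content by (dx, dy), clipped to the frame."""
    xmin, ymin, xmax, ymax = box

    def clip(value: float, upper: float) -> float:
        return min(max(value, 0.0), upper)

    return (clip(xmin + dx, width), clip(ymin + dy, height),
            clip(xmax + dx, width), clip(ymax + dy, height))


def rotate90_cw_box(box: Box, height: float) -> Box:
    """Box after rotating an image of the given height by 90 degrees clockwise.

    A point (x, y) moves to (height - y, x); the rotated image is `height` wide.
    """
    xmin, ymin, xmax, ymax = box
    return (height - ymax, xmin, height - ymin, xmax)


fire = (10.0, 20.0, 30.0, 60.0)  # hypothetical label in a 100 x 80 image
print(scale_box(fire, 0.5))
print(translate_box(fire, 80.0, 0.0, 100.0, 80.0))
print(rotate90_cw_box(fire, 80.0))
```

A brightness change, by contrast, leaves the coordinates untouched, so the brightness-augmented copies can reuse the labels of their source images.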
During training, all images must be scaled to the size of the input layer of the neural network. In the developed system this size is 300 × 300 for MobileNet and 800 × 600 for Mask R-CNN. The training itself was performed on a personal computer with an NVIDIA GeForce GTX 1080 Ti video card using the TensorFlow library. The trained neural networks were tested on a set of 1000 images of fires in different positions and under different overall illumination. To verify that training was successful, it is necessary to examine how the neural network behaves when the image contains objects resembling fire. Figure 3 shows an image of a fire in a closed room that also contains objects with a color similar to fire. The image shows that the recognition is successful.
Fig. 3. Detection of fire indoors.
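Because each camera frame is rescaled to the network input size, the coordinates returned by the network refer to the rescaled image and have to be mapped back before they are drawn on the 1280 × 720 video stream. The sketch below illustrates this step; it assumes a plain resize without padding or cropping, which the source does not state, and the function name is hypothetical.

```python
def to_frame_coords(box, net_size, frame_size=(1280, 720)):
    """Map (xmin, ymin, xmax, ymax) from network-input pixels to camera pixels.

    Assumes the frame was resized directly to `net_size` (no letterboxing).
    """
    net_w, net_h = net_size
    frame_w, frame_h = frame_size
    xmin, ymin, xmax, ymax = box
    return (xmin * frame_w / net_w, ymin * frame_h / net_h,
            xmax * frame_w / net_w, ymax * frame_h / net_h)


# A hypothetical detection in the 300 x 300 SSD MobileNet input
box_in_frame = to_frame_coords((75, 150, 150, 225), net_size=(300, 300))
print(box_in_frame)
```

Note that the horizontal and vertical scale factors differ, since a 16:9 frame is squeezed into a square or 4:3 input.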
Another important aspect of successful fire detection is the ability to detect it in daylight. Figure 4 shows an image of a burning roof, and the recognition is also successful.
Fig. 4. Detection of fire on a burning building.
Since two types of neural networks, MobileNet and Mask R-CNN, are used for fire detection, differences in the way they work can be observed. The trained models are downloaded to the Jetson Nano board. The algorithm implemented on the robotized platform is the following:
• Initialization of the resources of the Jetson Nano;
• Reading of an image from the USB camera;
• Scaling of the image to the size expected by SSD MobileNet or Mask R-CNN, respectively;
• The neural network processes the current image and returns the coordinates of the detected fire;
• Depending on the coordinates of the fire, the robotized platform is rotated so that the fire is situated in the center of the images sent by the USB camera;
• The robot is moved towards the fire source;
• The robot decides when to stop in front of the fire source based on the video image, information from an ultrasonic sensor, and readings from two temperature sensors;
• The robot switches on a signal that simulates the control signal for activation of the acoustic fire extinguisher.
The method is described and the experimental results are credible. The recognition speed and accuracy of the two networks when running on the Jetson Nano are shown in Table 1.
Table 1. Recognition speed and accuracy of SSD MobileNet vs Mask R-CNN.
                            SSD MobileNet   Mask R-CNN
Speed of recognition [ms]   103             308
Accuracy [%]                79.4            96.1
The data show that Mask R-CNN has higher recognition accuracy, at the expense of a longer recognition time. Some of the test images were taken by the onboard USB camera, which makes it possible to evaluate how the system behaves at the resolution of its own camera. The conclusion is that the USB camera provides a video signal of good quality and that real fire sources are successfully detected. Figure 5 shows an image from the robot camera in which the actual source of the fire is recognized using a MobileNet network.
Fig. 5. Recognition of a fire source by a mobile robot.
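The trade-off in Table 1 is easier to judge when the recognition times are converted into frame rates. The short calculation below does this; it assumes that recognition is the only per-frame cost, which ignores camera capture and drawing, so the real throughput will be somewhat lower.

```python
latency_ms = {"SSD MobileNet": 103, "Mask R-CNN": 308}      # from Table 1
accuracy_pct = {"SSD MobileNet": 79.4, "Mask R-CNN": 96.1}  # from Table 1

for name, ms in latency_ms.items():
    fps = 1000.0 / ms
    print(f"{name}: {fps:.1f} frames/s at {accuracy_pct[name]} % accuracy")

slowdown = latency_ms["Mask R-CNN"] / latency_ms["SSD MobileNet"]
gain = accuracy_pct["Mask R-CNN"] - accuracy_pct["SSD MobileNet"]
print(f"Mask R-CNN is {slowdown:.1f}x slower and {gain:.1f} points more accurate")
```

Under this assumption SSD MobileNet processes roughly 9.7 frames per second and Mask R-CNN roughly 3.2, so the gain of 16.7 percentage points in accuracy costs about a threefold drop in frame rate.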
The developed mobile robot control software can also distinguish real fire from static fire images displayed in front of the camera. The software further includes an algorithm that makes a rough estimate of the distance to the fire source. The developed electronic board for robot control uses two temperature sensors, as well as a flame sensor, to protect the robotic platform (and in the future the autonomous acoustic fire extinguisher) from being positioned too close to the fire.
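The centering and stopping steps of the control algorithm described earlier can be illustrated with a small decision sketch. It is not the authors' firmware: the function names, the dead band, and both thresholds are hypothetical, and the stop rule uses only the ultrasonic and temperature readings, whereas the robot also takes the video image into account.

```python
def steering_command(box, frame_width=1280, dead_band=0.1):
    """Return 'left', 'right' or 'forward' from the position of the fire box.

    `box` is (xmin, ymin, xmax, ymax) in camera pixels. `dead_band` is the
    tolerated offset of the box centre from the image centre, expressed as a
    fraction of the frame width (hypothetical value).
    """
    centre_x = (box[0] + box[2]) / 2.0
    offset = (centre_x - frame_width / 2.0) / frame_width
    if offset < -dead_band:
        return "left"
    if offset > dead_band:
        return "right"
    return "forward"


def should_stop(distance_cm, temps_c, min_distance_cm=50.0, max_temp_c=60.0):
    """Stop when the ultrasonic range or either temperature reading is critical.

    Both thresholds are placeholders, not values from the source.
    """
    return distance_cm <= min_distance_cm or max(temps_c) >= max_temp_c


print(steering_command((100, 200, 300, 400)))   # fire in the left part of the frame
print(should_stop(120.0, (25.0, 70.0)))         # far away, but one sensor is hot
```

Keeping the two decisions separate mirrors the split in the hardware: the detection result comes from the Jetson Nano, while the range and temperature readings come from the ESP32 control board.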
4 Conclusion
A deep learning system for an acoustic fire extinguisher was described and evaluated. While acoustic technology shows promise, new research is needed to improve the ability of acoustic waves to extinguish flames effectively (currently the range of this technology is about 2 m), as well as to prepare, test, and, if necessary, improve flame detection algorithms (the second issue was the subject of this article). In this paper, a mobile platform was used to detect flames with deep neural networks. The results obtained are encouraging and show that deep neural networks can be used effectively for fire detection. All algorithms and know-how gained during the research can be applied in the autonomous fire extinguisher. In practice, the research continues with the use of high-power acoustic waves without exceeding safe sound pressure levels (110 dB). The need to keep sound pressure at levels that are safe for humans imposes some limitations. In addition, the extinguishing capabilities of acoustic waves need to be analyzed for different substances and fuels, depending on the specified wave parameters. On the other hand, if both techniques (acoustic extinguishing and artificial intelligence) are combined, it becomes possible to extinguish flames without human intervention. The advantage is that there is no time delay between the detection of the flame and the start of extinguishing. This is a new approach to fire protection that motivates the authors of this article to collaborate with other researchers in acoustic fire protection (especially from Poland, in view of the first high-power acoustic fire extinguisher). A projected direction of future research is also to analyze the possibilities of extinguishing flames with acoustic waves taking into account a multipoint distribution of sound sources, so that the acoustic stream is directed as precisely as possible at the source of the flames [21, 23, 46].
In this analyzed process, DNN as well as other machine learning approaches can be implemented. Moreover, the authors, on the basis of achieved results, conclude that autonomous acoustic fire extinguishers can be realized using low-cost hardware platforms and implementing algorithms for artificial intelligence based on deep neural networks.
References
1. Toulouse, T., Rossi, L., Akhloufi, M., Çelik, T., Maldague, X.: Benchmarking of wildland fire colour segmentation algorithms. IET Image Proc. 9(12), 1064–1072 (2015)
2. Šerić, L., Stipaničev, D., Krstinić, D.: ML/AI in intelligent forest fire observer network. In: 3rd International Conference on Management of Manufacturing Systems. EAI, Dubrovnik (2018)
3. Šerić, L., Stipaničev, D., Štula, M.: Observer network and forest fire detection. Information Fusion 12(3), 160–175 (2011)
4. Chitade, A.Z., Katiyar, S.K.: Colour based image segmentation using k-means clustering. Int. J. Eng. Sci. Technol. 2(10), 5319–5325 (2010)
5. San-Miguel-Ayanz, J., Ravail, N.: Active fire detection for fire emergency management: potential and limitations for the operational use of remote sensing. Nat. Hazards 35, 361–376 (2005)
6. Wilk-Jakubowski, J.: Information systems engineering using VSAT networks. Yugosl. J. Oper. Res. 31(3), 409–428 (2020)
7. Szegedy, Ch., Toshev, A., Erhan, D.: Deep neural networks for object detection. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 2553–2561. Curran Associates Inc., New York (2013)
8. Chen, T., Wu, P., Chiou, Y.: An early fire-detection method based on image processing. In: Proceedings of International Conference on Image Processing, pp. 1707–1710. IEEE Press, Singapore (2004)
9. Foley, D., O’Reilly, R.: An evaluation of convolutional neural network models for object detection in images on low-end devices. In: Proceedings for the 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, pp. 350–361. Trinity College Dublin, Dublin (2018)
10. Janků, P., Komínková Oplatková, Z., Dulík, T.: Fire detection in video stream by using simple artificial neural network. Mendel 24(2), 55–60 (2018)
11. Kurup, A.R.: Vision based fire flame detection system using optical flow features and artificial neural network. Int. J. Sci. Res. 3(10), 2161–2168 (2014)
12. Zhang, X.: Simple understanding of mask RCNN. Medium (2018), https://medium.com/@alittlepain833/simple-understanding-of-mask-rcnn-134b5b330e95
13. Rossi, L., Akhloufi, M., Tison, Y.: On the use of stereovision to develop a novel instrumentation system to extract geometric fire fronts characteristics. Fire Saf. J. 46(1–2), 9–20 (2011)
14. Li, Z., Mihaylova, L.S., Isupova, O., Rossi, L.: Autonomous flame detection in videos with a Dirichlet process Gaussian mixture color model. IEEE Trans. Industr. Inf. 14(3), 1146–1154 (2018)
15. Çelik, T.: Fast and efficient method for fire detection using image processing. Electronics and Telecommunications Research Institute Journal 32(6), 881–890 (2010)
16. Toulouse, T., Rossi, L., Campana, A., Çelik, T., Akhloufi, M.: Computer vision for wildfire research: an evolving image dataset for processing and analysis. Fire Saf. J. 92, 188–194 (2017)
17. Marbach, G., Loepfe, M., Brupbacher, T.: An image processing technique for fire detection in video images. Fire Saf. J. 41(4), 285–289 (2006)
18. Horng, W.-B., Peng, J.-W., Chen, C.-Y.: A new image-based real-time flame detection method using color analysis. In: IEEE International Conference on Networking, Sensing and Control, pp. 100–105. IEEE Press, Tucson (2005)
19. Friedman, A.N., Stoliarov, S.I.: Acoustic extinction of laminar line-flames. Fire Saf. J. 93, 102–113 (2017)
20. Węsierski, T., Wilczkowski, S., Radomiak, H.: Wygaszanie procesu spalania przy pomocy fal akustycznych. Bezpieczeństwo i Technika Pożarnicza 30(2), 59–64 (2013)
21. Wilk-Jakubowski, J.: Analysis of flame suppression capabilities using low-frequency acoustic waves and frequency sweeping techniques. Symmetry 13(7), 1299 (2021)
22. Stawczyk, P., Wilk-Jakubowski, J.: Non-invasive attempts to extinguish flames with the use of high-power acoustic extinguisher. Open Eng. 11(1), 349–355 (2021)
23. Niegodajew, P., Gruszka, K., Gnatowska, R., Šofer, M.: Application of acoustic oscillations in flame extinction in a presence of obstacle. In: XXIII Fluid Mechanics Conference. IOP, Zawiercie (2018)
24. Niegodajew, P., et al.: Application of acoustic oscillations in quenching of gas burner flame. Combust. Flame 194, 245–249 (2018)
25. Radomiak, H., Mazur, M., Zajemska, M., Musiał, D.: Gaszenie płomienia dyfuzyjnego przy pomocy fal akustycznych. Bezpieczeństwo i Technika Pożarnicza 40(4), 29–38 (2015)
26. Methods and systems for disrupting phenomena with waves, by: Tran, V., Robertson, S. (Nov. 24, 2015). Patent US, no application: W02016/086068
27. Fire extinguishing appliance and appended supplementary appliances, by: Davis, Ch.B. (Apr. 13, 1987). Patent US, no application 07/040393
28. Remote lighted wick extinguisher, by: Thigpen, H.D. (Oct. 29, 1997). Patent US, no application: 08/960,372
29. Sposób gaszenia płomieni falami akustycznymi (The method of extinguishing flames with acoustic waves, in Polish), by: Wilczkowski, S., Szecówka, L., Radomiak, H., Moszoro, K. (Dec. 18, 1995). Patent PL, PAT.177792, no application: P.311909
30. Urządzenie do gaszenia płomieni falami akustycznymi (System for suppressing flames by acoustic waves, in Polish), by: Wilczkowski, S., Szecówka, L., Radomiak, H., Moszoro, K. (Dec. 18, 1995). Patent PL, PAT.177478, no application: P.311910
31. Urządzenie do gaszenia płomieni falami akustycznymi (System for suppressing flames by acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Feb. 13, 2018). Small patent PL, RWU.070441, no application: W.127019
32. Urządzenie do gaszenia płomieni falami akustycznymi (Device for flames suppression with acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Nov. 30, 2018). Patent PL, PAT.233026, no application: P.428002
33. Urządzenie do gaszenia płomieni falami akustycznymi (Device for flames suppression with acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Nov. 30, 2018). Patent PL, PAT.233025, no application: P.427999
34. Urządzenie do gaszenia płomieni falami akustycznymi (Device for flames suppression with acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Jan. 18, 2019). Patent PL, PAT.234266, no application: P.428615
35. Harabin, R., Wilk-Jakubowski, G., Ivanov, S.: Robotics in crisis management: a review of the literature. Technol. Soc. 2021 (under review). University of Social Sciences in Łódź & Varna University of Management, Łódź-Varna (2020)
36. Marek, M.: Wykorzystanie ekonometrycznego modelu klasycznej funkcji regresji liniowej do przeprowadzenia analiz ilościowych w naukach ekonomicznych. Rola informatyki w naukach ekonomicznych i społecznych. Innowacje i implikacje interdyscyplinarne. Wydawnictwo Wyższej Szkoły Handlowej im. B. Markowskiego w Kielcach, Kielce (2013)
37. Wilk-Jakubowski, J.: Predicting satellite system signal degradation due to rain in the frequency range of 1 to 25 GHz. Pol. J. Environ. Stud. 27(1), 391–396 (2018)
38. Wilk-Jakubowski, J.: Total signal degradation of Polish 26–50 GHz satellite systems due to Rain. Pol. J. Environ. Stud. 27(1), 397–402 (2018)
39. Intelligent Computing & Optimization, Conference proceedings 2018 (ICO 2018). https://www.springer.com/gp/book/9783030009786
40. Intelligent Computing and Optimization, Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019). https://www.springer.com/gp/book/9783030335847
41. Intelligent Computing and Optimization Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020). https://link.springer.com/book/10.1007/978-3-030-68154-8
42. Wilk-Jakubowski, J.: Overview of broadband information systems architecture for crisis management. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska 10(2), 20–23 (2020)
43. Wilk, J.Ł: Techniki cyfrowego rozpoznawania krawędzi obrazów. Wydawnictwo Stowarzyszenia Współpracy Polska-Wschód. Oddział Świętokrzyski, Kielce (2009)
44. Tempest, W.: Infrasound and Low Frequency Vibration. Academic Press Inc., London (1976)
45. Noga, A.: Przegląd obecnego stanu wiedzy z zakresu techniki infradźwiękowej i możliwości wykorzystania fal akustycznych do oczyszczania urządzeń energetycznych. Zeszyty Energetyczne 1, 225–234 (2014)
46. Yi, E.-Y., Bae, M.-J: A study on the directionality of sound fire extinguisher in electric fire. Convergence Research Letter of Multimedia Services Convergent with Art, Humanities and Sociology 3, 1449–1452 (2017)
Adaptive Harmony Search for Cost Optimization of Reinforced Concrete Columns
Aylin Ece Kayabekir1(&), Sinan Melih Nigdeli2, and Gebrail Bekdaş2
1 Department of Civil Engineering, Istanbul Gelisim University, 34310 Avcılar, Istanbul, Turkey [email protected]
2 Department of Civil Engineering, Istanbul University-Cerrahpaşa, 34320 Avcılar, Istanbul, Turkey {melihnig,bekdas}@iuc.edu.tr
Abstract. The performance of metaheuristic algorithms used in engineering optimization is evaluated via the robustness of the method. To develop better algorithms for specific problems, investigations have to continue by applying the methods to new problems. In the present study, adaptive harmony search (AHS), which automatically updates the algorithm parameters, is presented for the optimum cost design of reinforced concrete (RC) columns. The results were compared with those of the approach that keeps the parameters constant throughout the optimization process. The results showed that AHS is not greatly affected by the choice of initial algorithm parameters.
Keywords: Optimization · Reinforced concrete · Metaheuristics · Harmony search
1 Introduction
Metaheuristics are algorithms that solve challenging problems through a process that applies a set of tasks in order. Heuristics date back to the first people, since they rely on the human mind [1], and algorithms started to be generated by formulating processes as metaphors. One of the oldest algorithms, tabu search, drew on the human mind [2]; later, evolution [3–5], several processes [6–10], swarm intelligence [11–13], and nature [14–18] were used as inspiration. These algorithms are very helpful for problems that cannot be solved or can only be solved via numerical iterations [19–21]. Civil engineering, and especially structural engineering, has many problems of this kind, and metaheuristics are a popular tool in optimization and analysis problems [22]. Most studies using metaheuristics in structural engineering address optimization problems, including truss structures [23–26], structural control tuning [27–35], and the optimum design of reinforced concrete (RC) members [36–56], but metaheuristics can also be used in the structural analysis of complex and non-linear systems [57–64].
In the present study, adaptive harmony search is presented to optimize the dimension and reinforcement variables of RC columns so as to minimize the total material cost. The adaptive version of the algorithm was compared with the classical one to show the advantage of the adaptive one in the choice of algorithm parameters. The reason for using an adaptive version of the algorithm is to avoid the parameter setting process, and the results showed that the adaptive algorithm is not dramatically affected by the parameters, whereas the classical form of the algorithm is.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 35–44, 2022. https://doi.org/10.1007/978-3-030-93247-3_4
2 Methodology
In this section, the optimization process based on the Adaptive Harmony Search (AHS) metaheuristic algorithm for cost minimization of reinforced concrete (RC) columns is introduced. The loading and reinforcement conditions of the RC column under the axial force (Nz) and bending moment (My) can be seen in Fig. 1.
[Figure: RC column cross-section with axes x and y, loaded by Nz and My]
Fig. 1. RC column under uniaxial bending moment
In the analysis of RC members, several assumptions are made. The assumptions made in this optimization study are as follows:
1- Plane sections normal to the axis are assumed to remain plane after deformation due to bending. Consequently, the strain at any point in the cross-section is proportional to its distance from the neutral axis.
2- For the ductile design of columns, the axial force must be limited to the values proposed in the design regulations.
3- The stress-strain relationship of the concrete is assumed to be parabolic and is then idealized as an equivalent rectangular stress block.
4- The tensile strength of the concrete is very low and is ignored in design calculations.
5- Buckling and second-order effects are not considered for the column. The investigation is not suitable for slender columns.
6- The column is subjected to a bending moment about a single axis only.
The Harmony Search (HS) algorithm, inspired by a musician’s process of searching for the most appropriate combination of notes (harmony), is a metaheuristic algorithm developed by Geem et al. [8]. The optimization process via HS can be summarized in five steps.
Step 1: The design constants (Table 1), the ranges of the design variables (Table 2), the algorithm-specific parameters, namely the harmony memory size (HMS), in other words the population number (pn), the initial harmony memory considering rate (HMCRin), and the initial pitch adjusting rate (PARin), and the stopping criterion of the optimization (the maximum number of iterations, MI) are defined.
Step 2: An initial harmony memory (HM) matrix is generated as in Eq. (1). This matrix consists of pn candidate solution sets in total, each of which includes a value of every design variable (Xi, i = 1–N). The problem has three design variables, as seen in Table 2.
\[
HM = \begin{bmatrix}
X_{1,1} & X_{1,2} & \cdots & X_{1,pn} \\
X_{2,1} & X_{2,2} & \cdots & X_{2,pn} \\
\vdots & \vdots & \ddots & \vdots \\
X_{N-1,1} & X_{N-1,2} & \cdots & X_{N-1,pn} \\
X_{N,1} & X_{N,2} & \cdots & X_{N,pn}
\end{bmatrix} \qquad (1)
\]
Initial values of the design variables are randomly generated between the maximum Xi(max) and minimum Xi(min) limits defined in Step 1 (Eq. 2).
\[
X_i = X_{i(\min)} + rand \cdot \left( X_{i(\max)} - X_{i(\min)} \right) \qquad (2)
\]
In Eq. (2), rand is a random number between 0 and 1.
Step 3: The objective function of the problem is calculated for each solution set, and the design constraints (Table 3, calculated according to ACI 318: Building Code Requirements for reinforced concrete [65]) are checked. A penalty value is assigned as the objective function value of solutions that do not satisfy the design constraints. All objective function values are then stored in a vector. The objective function (OF) given in Eq. (3) is the total material cost, and the optimization process searches for a solution set that minimizes this function.
\[
OF = C_c V_c + C_s W_s \qquad (3)
\]
In Eq. (3), Cc and Cs are the cost per unit volume of concrete and the cost per unit weight of reinforcing steel, respectively. The volume of the concrete and the weight of the reinforcing steel are denoted by Vc and Ws, respectively.
Step 4: A new solution set is generated. According to the HS algorithm rules, the new value of each design variable (Xi(new)) can be generated via one of the two equations given in Eqs. (4) and (5). The first equation (Eq. (4)) is similar to the one used to generate the initial solutions.
\[
X_{i(new)} = X_{i(\min)} + rand \cdot \left( X_{i(\max)} - X_{i(\min)} \right) \qquad (4)
\]
The second equation (Eq. (5)) randomly generates new values within the range obtained by multiplying the Pitch Adjusting Rate (PAR) by the difference between the limits of the design variable (Xi(max) – Xi(min)). This generation is done for a randomly selected solution set.
\[
X_{i(new)} = X_{i,k} + rand \cdot PAR \cdot \left( X_{i(\max)} - X_{i(\min)} \right) \qquad (5)
\]
In Eq. (5), Xi,k is the value of the design variable in the selected solution set. Which of these two equations is used is decided according to the Harmony Memory Considering Rate (HMCR). In AHS, the HMCR and PAR values are updated according to Eqs. (6) and (7) as a function of the current iteration number (IN).
\[
PAR = PAR_{in} \left( 1 - \frac{IN}{MI} \right) \qquad (6)
\]
\[
HMCR = HMCR_{in} \left( 1 - \frac{IN}{MI} \right) \qquad (7)
\]
Step 5: The new solution set is compared with the existing solutions. If the new solution is better, in terms of the objective function, than the worst solution in the existing solution matrix, the worst solution is replaced by the new solution. Otherwise, the solution matrix is not modified. The last two steps are repeated until the stopping criterion is satisfied.
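The five steps can be put together in a short sketch. It is an illustration under stated assumptions rather than the authors' implementation: the global generation of Eq. (4) is chosen with probability HMCR, which is how the remark that HMCR = 1 gives a random search is read here; the memory member k is drawn separately for each variable; values are clipped to the variable limits; and the function name and the toy objective are hypothetical.

```python
import random


def adaptive_harmony_search(objective, bounds, pn=30, hmcr_in=0.5, par_in=0.25,
                            max_iter=10000, seed=0):
    """Minimise `objective` over the box `bounds` = [(min, max), ...]."""
    rng = random.Random(seed)

    def random_value(i):
        lo, hi = bounds[i]
        return lo + rng.random() * (hi - lo)             # Eqs. (2) and (4)

    # Step 2: initial harmony memory, one candidate solution per entry
    memory = [[random_value(i) for i in range(len(bounds))] for _ in range(pn)]
    # Step 3: objective value of every stored solution
    costs = [objective(x) for x in memory]

    for it in range(1, max_iter + 1):
        par = par_in * (1 - it / max_iter)               # Eq. (6)
        hmcr = hmcr_in * (1 - it / max_iter)             # Eq. (7)

        # Step 4: build one new solution, variable by variable
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:                      # assumed selection rule
                value = random_value(i)                  # Eq. (4)
            else:
                k = rng.randrange(pn)                    # randomly selected solution
                value = memory[k][i] + rng.random() * par * (hi - lo)  # Eq. (5)
                value = min(max(value, lo), hi)          # clipping is an assumption
            new.append(value)

        # Step 5: replace the worst stored solution if the new one is better
        worst = max(range(pn), key=costs.__getitem__)
        new_cost = objective(new)
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost

    best = min(range(pn), key=costs.__getitem__)
    return memory[best], costs[best]


# Toy problem (not the RC column): squared distance to the point (1, 2)
solution, cost = adaptive_harmony_search(
    lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2, [(-5, 5), (-5, 5)], max_iter=2000)
print(solution, cost)
```

For the column problem, `objective` would return the material cost of Eq. (3) or the penalty value when a design constraint is violated, and `bounds` would hold the ranges of b, h, and the reinforcement ratio.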
3 Numerical Example
The numerical example data for the design constants and design variables are presented in Tables 1 and 2. The HS parameters HMS, HMCRin, and PARin are defined as 30, 0.5, and 0.25, respectively, for classical HS and for Case 1 of AHS. In the second case of AHS, HMCRin and PARin are both taken as 1 to verify that AHS is not strongly dependent on the specific parameters. In classical HS, the algorithm reduces to a random search method when both parameters are taken as 1, so this case is also presented as the random search (RS).
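To see why the second case does not reduce AHS to a random search, it helps to evaluate Eqs. (6) and (7) at a few iteration numbers. The worked values below use the initial values of the two cases and MI = 10000, the iteration limit of this example; the choice of the three sample iterations is arbitrary.

```latex
\begin{aligned}
\text{Case 1:}\quad & IN = 0:      && HMCR = 0.5\,(1 - 0) = 0.5,    && PAR = 0.25\,(1 - 0) = 0.25 \\
                    & IN = 5000:   && HMCR = 0.5\,(1 - 0.5) = 0.25, && PAR = 0.25\,(1 - 0.5) = 0.125 \\
                    & IN = 10000:  && HMCR = 0.5\,(1 - 1) = 0,      && PAR = 0.25\,(1 - 1) = 0 \\
\text{Case 2:}\quad & IN = 0:      && HMCR = 1\,(1 - 0) = 1,        && PAR = 1\,(1 - 0) = 1 \\
                    & IN = 5000:   && HMCR = 1\,(1 - 0.5) = 0.5,    && PAR = 1\,(1 - 0.5) = 0.5 \\
                    & IN = 10000:  && HMCR = 1\,(1 - 1) = 0,        && PAR = 1\,(1 - 1) = 0
\end{aligned}
```

In both cases the two rates fall linearly to zero, so the search starts broad and becomes increasingly local regardless of the initial values; only the starting point of the schedule differs.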
Table 1. Design constants of the optimization
Definition                                          Symbol   Value
Flexural moment                                     Mu       300 kNm
Axial force                                         Nu       2000 kN
Length of column                                    L        3 m
Strain corresponding ultimate stress of concrete    εc       0.003
Max. aggregate diameter                             Dmax     16 mm
Yield strength of steel                             fy       420 MPa
Compressive strength of concrete                    f'c      30 MPa
Elasticity modulus of steel                         Es       200000 MPa
Specific gravity of steel                           γs       7.86 t/m3
Cost of the concrete per m3                         Cc       40 $
Cost of the steel per ton                           Cs       700 $
Table 2. Design variables of the optimization
Definition            Symbol   Value
Breadth of column     b        250 mm < b < 400 mm
Height of column      h        300 mm < h < 500 mm
Reinforcement ratio   ρ        0.01 < ρ < 0.06
In the process, a maximum iteration number is used for the stopping criterion and this iteration number is taken as 10000 for the numerical example. 10 cycles of the optimization process were done for evaluation of the results. The optimum results are presented in Table 4 with minimum (OFmin), average (OFave), maximum (OFmax) and standard deviation (std) of these 10 cycles.
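The four summary statistics reported for the ten cycles can be computed as in the sketch below. The run values are made up for illustration, and the source does not say whether the standard deviation is the population or the sample form; the population form is used here as an assumption.

```python
import statistics

# Hypothetical best costs ($) of ten independent runs, for illustration only
run_costs = [64.9650, 64.9651, 64.9650, 64.9652, 64.9650,
             64.9651, 64.9650, 64.9650, 64.9653, 64.9650]

summary = {
    "OFmin": min(run_costs),
    "OFave": statistics.mean(run_costs),
    "OFmax": max(run_costs),
    "std": statistics.pstdev(run_costs),  # population form (assumption)
}
for name, value in summary.items():
    print(f"{name}: {value:.6g}")
```

A small std together with average and maximum values close to the minimum indicates that the algorithm reaches nearly the same design in every run.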
Table 3. Design constraints of the optimization
Definition                        Constraint
Maximum axial force (Nmax)        Nd ≤ Nmax = 0.5 f'c b h
Minimum steel area, Asmin         As ≥ Asmin = 0.01bh
Maximum steel area, Asmax         As ≤ Asmax = 0.06bh (seismic design)
Flexural strength capacity, Md    Md ≥ Mu
Axial force capacity, Nd          Nd ≥ Nu
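The constraint check and penalization of Step 3 can be sketched as follows. The constraints are read here as capacity at least equal to demand and steel area between its two limits, which is an interpretation of Table 3; the capacities Md and Nd are taken as inputs because the ACI 318 capacity calculation is outside this sketch, and the function names, the capacity values in the example, and the penalty value are hypothetical.

```python
def violated_constraints(b, h, rho, md, nd, mu=300e6, nu=2000e3, fc=30.0):
    """Names of the Table 3 constraints that a design violates.

    Units: mm, N, N*mm, MPa. `md` and `nd` are section capacities that must
    be computed elsewhere (not shown in this sketch).
    """
    area_steel = rho * b * h
    checks = {
        "Nd <= Nmax": nd <= 0.5 * fc * b * h,
        "As >= Asmin": area_steel >= 0.01 * b * h,
        "As <= Asmax": area_steel <= 0.06 * b * h,
        "Md >= Mu": md >= mu,
        "Nd >= Nu": nd >= nu,
    }
    return [name for name, ok in checks.items() if not ok]


def penalised(cost, violations, penalty=1e6):
    """Objective value with a fixed penalty for infeasible designs (placeholder value)."""
    return cost if not violations else penalty


# Hypothetical capacities for two candidate designs
print(violated_constraints(400, 500, 0.012409, md=310e6, nd=2100e3))
print(violated_constraints(400, 500, 0.005, md=250e6, nd=2100e3))
```

Because any violation returns the same large value, infeasible designs are always ranked below feasible ones when the worst solution in the harmony memory is replaced.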
Table 4. Optimum results
            RS          HS          AHS (Case 1)   AHS (Case 2)
b (mm)      399.862     400         400            400
h (mm)      499.7068    500         500            500
ρ           0.01267     0.012409    0.012409       0.012409
OFmin ($)   65.7658     64.9650     64.96500       64.9650
OFave ($)   67.2248     64.9663     64.96504       64.9653
OFmax ($)   68.4724     64.9683     64.96515       64.9662
std         7.93E-01    9.76E-04    4.58E-05       3.58E-04
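The reported optimum costs can be cross-checked against Eq. (3) with the constants of Table 1. The sketch below assumes that Vc is the gross volume b·h·L and that the steel weight is Ws = γs·ρ·b·h·L; these two expressions are not spelled out in the text, and the function name is hypothetical.

```python
def material_cost(b_mm, h_mm, rho, length_m=3.0, cc=40.0, cs=700.0, gamma_s=7.86):
    """Eq. (3): cost of concrete plus cost of reinforcing steel for one column, in $.

    Assumes Vc = b*h*L (gross section) and Ws = gamma_s * rho * b*h*L.
    """
    volume = (b_mm / 1000.0) * (h_mm / 1000.0) * length_m  # m^3
    return cc * volume + cs * gamma_s * rho * volume


print(round(material_cost(400, 500, 0.012409), 4))           # HS and AHS designs
print(round(material_cost(399.862, 499.7068, 0.01267), 4))   # RS design
```

Under these assumptions both values agree with the OFmin entries of the table to within a cent; the remaining difference comes from the rounding of the listed design variables.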
4 Conclusion
In the present study, the cost optimization of RC columns was investigated with AHS, and the results were compared with classical HS. Two cases of HS parameters were investigated. In the first case, which uses parameters tested as the best combination for the HS algorithm, HS finds the optimum result with only a slight difference from AHS, which uses these parameters as initial values. However, the std values over the 10 runs of the optimization process are very different for HS and AHS. In that case, AHS is more robust than HS, as can be seen from how close the average and maximum values of OF are to the best one. As is known, these algorithm parameters strongly affect the optimum design, including convergence and sensitivity. For that reason, a second case was examined, in which the HMCR and PAR values are taken as 1. This means that all candidate solutions are generated from the whole solution range, so the HS approach becomes a random search. In AHS, the initial values taken as 1 are updated according to the iteration number. This case is the worst choice for these parameters, and RS is not able to find the best solution reported in Case 1. The std value of RS is also very large compared to the others. AHS, in contrast, remains effective with these parameters: the best solution is found within 10 cycles, and the std value obtained is acceptable and smaller than that of HS in Case 1. In conclusion, parameter setting is very important for metaheuristic algorithms. It requires results to be validated for different parameters, but this need is much less pressing for adaptive algorithms, which actively change these parameters during the optimization process.
Acknowledgments. This study was funded by Scientific Research Projects Coordination Unit of Istanbul University-Cerrahpasa. Project number: FYO-2019–32735.
References
1. Sörensen, K., Sevaux, M., Glover, F.: A history of metaheuristics. Handbook of heuristics, pp. 1–18 (2018)
2. Glover, F.: Future paths for integer programming and links to artificial intelligence. Comput. Oper. Res. 13(5), 533–549 (1986)
3. Goldberg, D.E., Samtani, M.P.: Engineering optimization via genetic algorithm. In: Proceedings of Ninth Conference on Electronic Computation. ASCE, New York, NY, pp. 471–482 (1986)
4. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan (1975)
5. Storn, R., Price, K.: Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
6. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
7. Erol, O.K., Eksin, I.: A new optimization method: big bang–big crunch. Adv. Eng. Softw. 37(2), 106–111 (2006)
8. Geem, Z.W., Kim, J.H., Loganathan, G.V.: A new heuristic optimization algorithm: harmony search. Simulation 76, 60–68 (2001)
9. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput. Aided Des. 43(3), 303–315 (2011)
10. Rao, R.: Jaya: a simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int. J. Ind. Eng. Comput. 7(1), 19–34 (2016)
11. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks No. IV, November 27-December 1, pp. 1942–1948. Perth Australia (1995)
12. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. B 26, 29–41 (1996)
13. Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39(3), 459–471 (2007)
14. Yang, X.S., Deb, S.: Engineering optimisation by cuckoo search. Int. J. Math. Model. Numer. Optim. 1(4), 330–343 (2010)
15. Yang, X.S.: Firefly algorithm, stochastic test functions and design optimisation. Int. J. of Bio-Inspir. Com. 2(2), 78–84 (2010)
16. Yang, X. S.: A new metaheuristic bat-inspired algorithm. In: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), pp. 65–74. Springer, Berlin, Heidelberg (2010)
17. Yang, X.S.: Flower pollination algorithm for global optimization. In: International Conference on Unconventional Computing and Natural Computation, pp. 240–249. Springer, Berlin, Heidelberg (2012)
18. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014)
19. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization. In: Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2018 (ICO 2018). Springer (2018)
20. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization. In: Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019). Springer (2019)
21. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization. Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020). Springer (2020)
22. Toklu, Y.C., Bekdas, G., Nigdeli, S.M.: Metaheuristics for Structural Design and Analysis. John Wiley & Sons (2021)
23. Talatahari, S., Goodarzimehr, V.: A discrete hybrid teaching-learning-based optimization algorithm for optimization of space trusses, J. Struct. Eng. Geo-Techniques. 9(1) (2019)
24. Salar, M., Dizangian, B.: Sizing optimization of truss structures using ant lion optimizer. In: 2nd International Conference on Civil Engineering, Architecture and Urban Management in Iran. August-2019 Tehran University (2019)
25. Bekdaş, G., Yucel, M., Nigdeli, S.M.: Evaluation of metaheuristic-based methods for optimization of truss structures via various algorithms and lèvy flight modification. Buildings 11(2), 49 (2021)
26. Leung, A.Y.T., Zhang, H.: Particle swarm optimization of tuned mass dampers. Eng. Struct. 31(3), 715–728 (2009)
27. Bekdaş, G., Nigdeli, S.M.: Estimating optimum parameters of tuned mass dampers using harmony search. Eng. Struct. 33, 2716–2723 (2011)
28. Pourzeynali, S., Salimi, S., Kalesar, H.E.: Robust multi-objective optimization design of tmd control device to reduce tall building responses against earthquake excitations using genetic algorithms. Sci. Iran. 20(2), 207–221 (2013)
29. Arfiadi, Y.: Reducing response of structures by using optimum composite tuned mass dampers. Procedia Eng. 161, 67–72 (2016)
30. Farshidianfar, A., Soheili, S.: ABC optimization of tmd parameters for tall buildings with soil structure interaction. Interaction and Multiscale Mechanics 6(4), 339–356 (2013)
31. Bekdaş, G., Nigdeli, S.M., Yang, X.S.: A novel bat algorithm based optimum tuning of mass dampers for improving the seismic safety of structures. Eng. Struct. 159, 89–98 (2018)
32. Yucel, M., Bekdaş, G., Nigdeli, S.M., Sevgen, S.: Estimation of optimum tuned mass damper parameters via machine learning. J. Build. Eng. 26, 100847 (2019)
33. Ulusoy, S., Bekdas, G., Nigdeli, S.M.: Active structural control via metaheuristic algorithms considering soil-structure interaction. Struct. Eng. Mech. 75(2), 175–191 (2020)
34. Ulusoy, S., Nigdeli, S.M., Bekdaş, G.: Novel metaheuristic-based tuning of PID controllers for seismic structures and verification of robustness. J. Build. Eng. 33, 101647 (2021)
35. Ulusoy, S., Bekdaş, G., Nigdeli, S.M., Kim, S., Geem, Z.W.: Performance of optimum tuned PID controller with different feedback strategies on active-controlled structures. Appl. Sci. 11(4), 1682 (2021)
Adaptive Harmony Search for Cost Optimization
43
36. Coello, C.C., Hernandez, F.S., Farrera, F.A.: Optimal design of reinforced concrete beams using genetic algorithms. Expert Syst. Appl. 12, 101–108 (1997) 37. Govindaraj, V., Ramasamy, J.V.: Optimum detailed design of reinforced concrete continuous beams using genetic algorithms. Comput. Struct. 84, 34–48 (2005). https://doi. org/10.1016/j.compstruc.2005.09.001 38. Fedghouche, F., Tiliouine, B.: Minimum cost design of reinforced concrete T-beams at ultimate loads using Eurocode2. Eng. Struct. 42, 43–50 (2012). https://doi.org/10.1016/j. engstruct.2012.04.008 39. Leps, M., Sejnoha, M.: New approach to optimization of reinforced concrete beams. Comput. Struct. 81, 1957–1966 (2003). https://doi.org/10.1016/S0045-7949(03)00215-3 40. Akin, A., Saka, M.P.: Optimum detailed design of reinforced concrete continuous beams using the harmony search algorithm. In: The tenth international conference on computational structures technology, pp. 131 (2010) 41. Bekdaş, G., Nigdeli, S.M.: Cost optimization of t-shaped reinforced concrete beams under flexural effect according to ACI 318. In: 3rd European Conference of Civil Engineering (2012) 42. Bekdaş, G., Nigdeli, S.M.: Optimization of T-shaped RC flexural members for different compressive strengths of concrete. Int. J. Mech. 7, 109–119 (2013) 43. Bekdaş, G., Nigdeli, S.M., Yang, X.: Metaheuristic Optimization for the Design of Reinforced Concrete Beams under Flexure Moments 44. Bekdaş, G., Nigdeli, S.M.: Optimum design of reinforced concrete beams using teachinglearning-based optimization. In: 3rd International Conference on Optimization Techniques in Engineering (OTENG’15), p. 7–9 (2015) 45. Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M.: Optimum design of t-beams using jaya algorithm. In: 3rd International Conference on Engineering Technology and Innovation (ICETI), Belgrad, Serbia (2019) 46. Koumousis, V.K., Arsenis, S.J.: Genetic algorithms in optimal detailed design of reinforced concrete members. Comput-Aided Civ. Inf. 
13, 43–52 (1998) 47. Yepes, V., Martí, J.V., García-Segura, T.: Cost and CO2 emission optimization of precast– prestressed concrete U-beam road bridges by a hybrid glowworm swarm algorithm. Autom. Constr. 49, 123–134 (2015) 48. Rafiq, M.Y., Southcombe, C.: Genetic algorithms in optimal design and detailing of reinforced concrete biaxial columns supported by a declarative approach for capacity checking. Comput. Struct. 69, 443–457 (1998) 49. Gil-Martin, L.M., Hernandez-Montes, E., Aschheim, M.: Optimal reinforcement of RC columns for biaxial bending. Mater. Struct. 43, 1245–1256 (2010) 50. Camp, C.V., Pezeshk, S.,Hansson, H.H.: Flexural Design of reinforced concrete frames using a genetic algorithm. J. Struct. Eng.-ASCE. 129, 105–11 (2003) 51. Govindaraj, V., Ramasamy, J.V.: Optimum detailed design of reinforced concrete frames using genetic algorithms. Eng. Optimiz. 39(4), 471–494 (2007) 52. Ceranic, B., Fryer, C., Baines, R.W.: An application of simulated annealing to the optimum design of reinforced concrete retaining structures. Comput. Struct. 79, 1569–1581 (2001) 53. Camp, C.V., Akin, A.: Design of retaining walls using big bang–big crunch optimization. J Struct. Eng.-ASCE. 138(3), 438–448 (2012) 54. Kaveh, A., Abadi, A.S.M.: Harmony search based algorithms for the optimum cost design of reinforced concrete cantilever retaining walls. Int. J. Civ. Eng. 9(1), 1–8 (2011) 55. Talatahari, A., heikholeslami, R., Shadfaran, M., Pourbaba,M.: Optimum design of gravity retaining walls using charged system search algorithm. Math. Probl. Eng. Article ID 301628 (2012)
44
A. E. Kayabekir et al.
56. Sahab, M.G., Ashour, A.F., Toropov, V.V.: Cost optimisation of reinforced concrete flat slab buildings. Eng. Struct. 27, 313–322 (2005). https://doi.org/10.1016/j.engstruct.2004.10.002 57. Toklu, Y.C.: Nonlinear analysis of trusses through energy minimization. Comput. Struct. 82 (20–21), 1581–1589 (2004) 58. Nigdeli, S.M., Bekdaş, G., Toklu, Y.C.: Total potential energy minimization using metaheuristic algorithms for spatial cable systems with increasing second order effects. In: 12th International Congress on Mechanics (HSTAM2019), pp. 22–25 (2019) 59. Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Toklu, Y.C.: Advanced energy‐based analyses of trusses employing hybrid metaheuristics. Struct. Des. Tall and Spec. Build. 28(9), e1609 (2019) 60. Toklu, Y.C., et al.: Total potential optimization using metaheuristic algorithms for solving nonlinear plane strain systems. Appl. Sci. 11(7), 3220 (2021) 61. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential optimization using hybrid metaheuristics: a tunnel problem solved via plane stress members. In: Advances in Structural Engineering—Optimization, pp. 221–236. Springer, Cham (2021) 62. Toklu, Y.C., Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M., Yücel, M.: Analysis of plane-stress systems via total potential optimization method considering nonlinear behavior. J. Struct. Eng. 146(11), 04020249 (2020) 63. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential optimization using metaheuristics: analysis of cantilever beam via plane-stress members. In: International Conference on Harmony Search Algorithm, pp. 127–138. Springer, Singapore (2020) 64. Kayabekir, A.E., Toklu, Y.C., Bekdaş, G., Nigdeli, S.M., Yücel, M., Geem, Z.W.: A novel hybrid harmony search approach for the analysis of plane stress systems via total potential optimization. Appl. Sci. 10(7), 2301 (2020) 65. 
ACI Committee, American Concrete Institute, & International Organization for Standardization: Building code requirements for structural concrete (ACI 318-05) and commentary. American Concrete Institute (2008)
Efficient Traffic Signs Recognition Based on CNN Model for Self-Driving Cars
Said Gadri and Nour ElHouda Adouane
Laboratory of Informatics and Its Applications of M’sila (LIAM), Department of Computer Science, Faculty of Mathematics and Informatics, University Mohamed Boudiaf of M’sila, 28000 M’sila, Algeria
{said.kadri,nourelhouda.adouane}@univ-msila.dz
Abstract. Self-driving cars, or autonomous cars, provide many benefits for humanity, such as reducing deaths and injuries in road accidents, reducing air pollution, and increasing the quality of car control. For this purpose, cameras or sensors are placed on the car and an efficient control system must be set up. This system receives images from the different cameras and/or sensors in real time, especially those representing traffic signs, and processes them to allow highly autonomous control and driving of the car. Among the most promising algorithms used in this field are convolutional neural networks (CNNs). In the present work, we propose a CNN model composed of several convolutional layers, max-pooling layers, and fully connected layers. As programming tools, we used Python, TensorFlow, and Keras, which are currently among the most widely used in the field.
Keywords: Machine learning · Deep learning · Traffic signs recognition · Convolutional neural networks · Autonomous driving · Self-driving cars
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 45–54, 2022. https://doi.org/10.1007/978-3-030-93247-3_5

1 Introduction
One of the applications of Machine Learning (ML) and Deep Learning (DL) is autonomous driving, or self-driving cars. It is a new high technology that may drive the cars of the future. As a baseline algorithm, a CNN (Convolutional Neural Network) model is used to predict the control command from the video frames. One interesting task of this control system is to recognize the different traffic signs present on the road in order to guarantee safe driving [1]. For this purpose, a CNN model is trained on the pixels of processed images taken from cameras and sensors placed on the car. This kind of model has proved its performance in many other fields such as medical imaging, pattern recognition (text, speech, etc.), computer vision, and other interesting applications [2]. Several benefits can be achieved using this technology, notably the reduction of deaths and injuries in road accidents, the reduction of air pollution, and a better quality of car control. In short, the main objective is human safety. An automatic detection system helps the driver to recognize the different signs quickly and thus to avoid some risks, especially when the driver is in a bad mental state or drives in a crowded city or any other complex environment, which can cause the driver to overlook the messages sent by the traffic signs placed on the side of the road. The purpose of such a system is thus to report correct messages to the driver as soon as possible, which reduces the burden on the driver, increases driving safety, and decreases the risk of accidents.
The present paper is organized as follows: Sect. 1 is a short introduction presenting the area of our work and its advantages and benefits. Section 2 is a detailed overview of the related works in the same area. Section 3 describes our proposed model. Section 4 presents the experimental work done to validate the proposed model. Section 5 illustrates the results obtained when applying the model. Section 6 discusses these results. The last section summarizes the work and suggests some perspectives for future research.
2 Related Work
The idea of autonomous driving started at the end of the 1920s, but the first autonomous car appeared in the 1980s. Some promising projects were realized in this period, such as the autonomous car called NAVLAB in 1988 and its control system ALVINN [3, 4]. Among the most important tasks in the field of self-driving cars or autonomous vehicles is traffic signs recognition. For this purpose, several methods based on feature extraction have been developed, including the Scale-Invariant Feature Transform (SIFT) [5], the Histogram of Oriented Gradients (HOG) [6], and Speeded-Up Robust Features (SURF) [7]. The use of ANNs in autonomous driving is not new: in the ALVINN system, Pomerleau used a fully connected neural network with a single hidden layer of 29 neurons to predict steering commands for the vehicle. The rise of machine learning and DL, especially the well-known CNN models, helped to improve significantly the performance of traffic signs detection, and surprising results have been achieved [8]. In 2004, the agency DARPA seeded a project named DAVE, or DARPA Autonomous Vehicle (Net-Scale Technologies 2004), based on the use of a CNN model. Later, new methods based on CNNs were developed, such as the semantic segmentation-aware (SSA) method [9] and the DP-KELM method [10, 11]. More recently, the Nvidia team trained a large CNN to map images obtained from driving a real car to steering commands [12].
Today, CNN models have been developed and applied to many other interesting applications, notably AlexNet [13], VGG [14], GoogLeNet [15], ResNet [16], the R-CNN series [17, 18], YOLO [19], SSD [20], and R-FCN [21]. These models are widely used by researchers in different areas of object recognition and give excellent performance on most of the available datasets. This encourages researchers in the field of traffic signs recognition and self-driving cars to develop new, more accurate and better-performing models. For instance, the GoogLeNet structure, a multi-label CNN, has been used for road scene recognition [22]. Similarly, the ResNet architecture, based on a multi-level CNN, has been used to classify traffic signs [23]. The authors of [24] used a hybrid method that combines CNN and RNN networks to extract deep features based on object semantics and on global and contextual appearances, then used them for scene recognition. The authors of [25] built a system based on a feed-forward CNN model to detect objects. The authors of [26] developed an efficient system based on stacked encoder-decoder 2D CNNs to perform contextual aggregation and to predict a disparity map. The authors of [27] proposed a new model based on Generative Adversarial Networks (GANs), which uses source data, a prediction map, and a ground truth label as input for lane detection. The authors of [28] proposed a new method that combines a CNN and an RNN in cascade for feature learning and lane prediction. The authors of [29] proposed a contextual deconvolution model that combines two contextual modules, the channel module and the spatial module. The model also uses global and local features for semantic segmentation.
3 The Proposed Model
In the present work, we have developed an automatic system that detects and classifies given images representing traffic sign panels. For this purpose, we propose a CNN model composed of several convolutional layers, max-pooling layers, and a fully connected layer. As programming tools, we used Python, TensorFlow, and Keras, which are among the most widely used in the field. Figure 1 presents a detailed diagram of the proposed CNN model to improve the performance of the classification task.

Fig. 1. The architecture of the proposed CNN model: input image (28 × 28 × 1) → Conv1 (16 filters of size 3 × 3) → max-pooling (2 × 2) → Conv2 (32 filters of size 3 × 3) → max-pooling (2 × 2) → Conv3 (64 filters of size 3 × 3) → fully connected (flatten) layer → output (classes 0 to 61)
4 Experimental Work
4.1 Used Dataset
In our experiments, we used the Belgium traffic signs dataset, a collection of images of traffic signs whose text is usually written in French or Dutch, the two most widely spoken official languages in Belgium. The collection can be divided into six categories of traffic signs: warning signs, priority signs, prohibitory signs, mandatory signs, parking and standing on the road signs, and designatory signs. After downloading the Belgium traffic signs files (training and testing files), a look at the folder structure of the dataset shows that the training and the testing data folders each contain 62 subfolders, which represent the 62 types of traffic signs used for classification. All images are in the .ppm (Portable PixMap) format. The task is thus to classify a given image into one of 62 classes representing traffic sign panels. Figure 2 shows some traffic sign panels taken from the Belgium traffic signs dataset, and Fig. 3 gives the distribution of these panels by class (type/group).
Fig. 2. Examples of images in Belgium Traffic Signs Dataset
Fig. 3. Distribution of panels by labels in Belgium Traffic Signs Dataset
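The folder layout described in this section (one subfolder per class, images stored as .ppm files) can be turned into a labelled file list with the Python standard library alone. The following is a minimal sketch under stated assumptions, not the authors' loading code: the function names list_labeled_images and count_per_class are hypothetical, each subfolder name is assumed to be a numeric class label, and decoding and resizing the images to 28 × 28 is left out.

```python
from pathlib import Path


def list_labeled_images(root):
    """Return sorted (image_path, label) pairs from a one-folder-per-class tree."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        # Assumption: the folder name is the numeric class label, e.g. "00017" -> 17.
        label = int(class_dir.name)
        # Only .ppm images are kept; other files in the folder are ignored.
        for image_path in sorted(class_dir.glob("*.ppm")):
            pairs.append((image_path, label))
    return pairs


def count_per_class(pairs):
    """Count images per label, i.e. the data behind a class-distribution plot."""
    counts = {}
    for _, label in pairs:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling count_per_class(list_labeled_images(training_root)), where training_root is the path of the downloaded training folder, would give the per-class counts of the kind plotted in Fig. 3.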
4.2 Programming Tools
Python: Python is currently one of the most popular languages for scientific applications. Its high-level interactive nature and its rich collection of scientific libraries make it a good choice for algorithmic development and exploratory data analysis. It is increasingly used in academic establishments and in industry. Its ecosystem includes the well-known scikit-learn library, which integrates a large number of ML algorithms for supervised and unsupervised problems, such as decision trees, logistic regression, naïve Bayes, KNN, ANN, etc. This package makes ML accessible to non-specialists working on general-purpose problems.
TensorFlow: TensorFlow is a multipurpose open-source library for numerical computation using data flow graphs. It offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud. TensorFlow can be used from many programming languages, such as Python, C++, Java, Scala, and R, and runs on a variety of platforms including Unix, Windows, iOS, and Android. TensorFlow can also run on single machines (CPU, GPU, TPU) or on distributed clusters with hundreds of GPU cards.
Keras: Keras is the official high-level API of TensorFlow. Its main characteristics are the following: it is a minimalist, highly modular neural networks library written in Python; it is capable of running on top of either TensorFlow or Theano; it is widely adopted in industry and in the research community; it makes models easy to produce; it supports convolutional networks, recurrent networks, and combinations of the two; it supports arbitrary connectivity schemes (including multi-input and multi-output training); and it runs seamlessly on CPU and GPU.
4.3 Evaluation
To validate the different ML algorithms and obtain the best model, we used the cross-validation method, which consists in splitting the dataset into 10 parts, training on 9 and testing on 1, and repeating this for all combinations of train/test splits. For the CNN model, we used two measures: the loss value and the accuracy metric.
1. Accuracy metric: the number of correctly predicted instances divided by the total number of instances in the dataset, multiplied by 100 to give a percentage (e.g., 90% accurate).
2. Loss value: the quantity minimized when optimizing an ML algorithm or a DL model. It is calculated on the training and validation datasets and indicates how well the model is doing on each of them: it gives the sum of the errors made on the examples of the training or validation set.
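The 10-part splitting scheme and the accuracy metric described above can be sketched in a few lines of plain Python. This is an illustration only: the names k_fold_splits and accuracy_percent are hypothetical, the folds are taken in index order (in practice the samples would usually be shuffled first), and nothing here is taken from the authors' own evaluation code.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def accuracy_percent(y_true, y_pred):
    """Correctly predicted instances divided by all instances, times 100."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)
```

Each sample appears in exactly one test fold, so averaging accuracy_percent over the k folds uses every sample once for testing and k − 1 times for training.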
5 Illustration of the Obtained Results
To build an efficient predictive model and achieve a higher accuracy rate, we designed a CNN (Convolutional Neural Network) model composed of several layers, as presented in Sect. 3 and Fig. 1. The proposed model can also be described as follows:
• A first convolutional layer Conv1 made of 16 filters of size (3 × 3).
• A max-pooling layer of size (2 × 2), which reduces the dimensions (width, height) of the images produced by the previous layer after applying the filters of Conv1.
• A second convolutional layer Conv2 made of 32 filters of size (3 × 3).
• A max-pooling layer of size (2 × 2), which reduces the dimensions (width, height) of the images produced by the previous layer after applying the filters of Conv2.
• A third convolutional layer Conv3 made of 64 filters of size (3 × 3).
• A flatten layer, which transforms the output of the previous layer into a one-dimensional vector.
• A fully connected layer FC of size 100.
• An output layer represented by a reduced one-dimensional vector whose size is the number of traffic sign classes (62).
• A “ReLU” activation function is used in the convolutional and fully connected layers, and a “softmax” function is used in the output layer to normalize the obtained values.
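The way the layers listed above shrink a 28 × 28 input can be traced with simple arithmetic. The sketch below assumes unpadded convolutions with stride 1 and non-overlapping 2 × 2 pooling, which are the Keras defaults but are not stated explicitly in the text; the helper names conv_out, pool_out and trace_shapes are hypothetical.

```python
def conv_out(size, kernel=3):
    """Width/height after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Width/height after non-overlapping max-pooling (floor division)."""
    return size // pool


def trace_shapes(input_size=28):
    """Return the spatial size after each convolution and pooling stage."""
    stages = ["conv1", "pool1", "conv2", "pool2", "conv3"]
    sizes = {}
    size = input_size
    for name in stages:
        size = conv_out(size) if name.startswith("conv") else pool_out(size)
        sizes[name] = size
    return sizes


sizes = trace_shapes()
# Conv3 has 64 filters, so the flattened vector has size * size * 64 elements.
flatten_length = sizes["conv3"] ** 2 * 64
print(sizes, flatten_length)
```

Under these assumptions the spatial size goes 28 → 26 → 13 → 11 → 5 → 3 and the flattened vector has 576 elements, which agrees with the output shapes reported in Table 1.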
Table 1. Description of the proposed CNN model

Layer (type)                      Output shape          Nb. parameters
conv2d_1 (Conv2D)                 (None, 26, 26, 16)    160
max_pooling2d_1 (MaxPooling2D)    (None, 13, 13, 16)    0
conv2d_2 (Conv2D)                 (None, 11, 11, 32)    4640
max_pooling2d_2 (MaxPooling2D)    (None, 5, 5, 32)      0
conv2d_3 (Conv2D)                 (None, 3, 3, 64)      18496
flatten_1 (Flatten)               (None, 576)           0
dense_1 (Dense)                   (None, 62)            35774
Total parameters                                        59,070
Trainable parameters                                    59,070
Non-trainable parameters                                0
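The parameter counts in Table 1 can be cross-checked by hand. The sketch below assumes one bias per convolution filter and per dense unit (the Keras default); the helper names conv_params and dense_params are hypothetical. It also computes, for comparison, the total that a stack with an additional 100-unit fully connected layer would have under the same assumptions.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units


conv_counts = [
    conv_params(3, 1, 16),    # conv2d_1 on a 1-channel input
    conv_params(3, 16, 32),   # conv2d_2
    conv_params(3, 32, 64),   # conv2d_3
]
conv_total = sum(conv_counts)

# Flattened vector of 576 elements feeding the 62-unit output directly (Table 1).
table_total = conv_total + dense_params(576, 62)

# Same convolutions followed by a 100-unit hidden layer and the 62-unit output.
with_hidden_total = conv_total + dense_params(576, 100) + dense_params(100, 62)

print(conv_counts, table_total, with_hidden_total)
```

The first total reproduces the 59,070 parameters of Table 1, so the table corresponds to a flattened vector connected directly to the 62-unit output layer. A stack that also contains a 100-unit fully connected layer would have 87,258 parameters under the same assumptions.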
To validate our CNN model, we used two measures: the loss value and the accuracy metric. The code below, written with TensorFlow and Keras, builds our model (Table 1).

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    model = Sequential()
    # Conv1: 16 filters of size 3 x 3 applied to a 28 x 28 single-channel image
    model.add(Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Conv2: 32 filters of size 3 x 3
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Conv3: 64 filters of size 3 x 3
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(Dropout(0.25))
    # Flatten the feature maps, then classify into the 62 traffic sign classes
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(62, activation='softmax'))
Table 2 below summarizes the results obtained after applying the proposed CNN model.

Table 2. Loss value and accuracy value obtained when applying the proposed model

                Loss value    Accuracy value
Training set    0.0921        97.31%
Test set        0.0169        99.80%
Fig. 4. Training loss Vs Validation loss of the CNN model
Fig. 5. Training accuracy Vs Validation accuracy of the CNN model
6 Discussion
Table 2 presents the results obtained when applying the proposed CNN model to the training set and the test set. Two performance measures are considered: the loss value, which gives the sum of errors after training the model, and the accuracy value, which gives the rate of correct predictions. The loss value is very low and the accuracy is very high, and both depend on the size of the set used. We attribute the difference in accuracy between the training set and the test set to this dependence; in our case the two values are very close (97.31% and 99.80%, Table 2). In the same way, Fig. 4 shows the evolution of the training loss and the validation loss over the epochs. The loss begins very high for both the training and the test sets and ends very low as the number of epochs increases. Similarly, Fig. 5 plots the evolution of the training accuracy and the validation accuracy over the epochs. Contrary to the loss value, the accuracy starts very low and ends very high. This property is clearer with the training set because of its large size. Finally, we underline that the classification accuracy of our proposed model is very high (99.80%) compared to the other models cited in the literature section (Sect. 2). This helps to increase the quality of car control by recognizing all traffic signs placed on the road and, consequently, to guarantee more safety for humans and vehicles. We can extend this work to the detection of other objects such as pedestrians, animals, and other complex obstacles.
7 Conclusion and Future Suggestions
In recent years, traffic signs detection has relied essentially on ML approaches, which give high performance. More recently, important progress has been made in the ML area, especially with the emergence of a new subfield called deep learning. It is mainly based on the use of many neural networks of simple interconnected units to extract meaningful patterns from a large amount of data in order to solve complex problems such as medical image classification, fraud detection, character recognition, etc. Currently, we can use larger datasets to learn powerful models, and better techniques to avoid overfitting and underfitting. To date, the results obtained in this area of research are very surprising in different domains, with very high accuracy values that often exceed the threshold of 90%. For example, the accuracy rate on the digits set is over 97%. In the present paper, we performed a classification task on a traffic signs dataset. For this purpose, we built a CNN model. The achieved performance is very surprising. As perspectives of this promising work, we propose to improve these results by improving the architecture of the CNN model and changing some model parameters, such as the number of filters, the number of convolution and max-pooling layers, the size of each filter, the number of training epochs, and the size of the data batches. Another suggestion that seems important is to combine CNNs with recurrent networks (RNNs) and other types of ANN. We can also extend our model to detect other objects such as pedestrians, animals, and other complex obstacles.
References
1. Intelligent Computing and Optimization. In: Conference proceedings ICO 2018, Springer, Cham, ISBN: 978-3-030-00978-6. https://www.springer.com/gp/book/9783030009786
2. Intelligent Computing and Optimization. In: Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019), Springer International Publishing, ISBN: 978-3-030-33585-4. https://www.springer.com/gp/book/9783030335847
3. Thorpe, C., Hebert, M.H., Kanade, T., Shafer, S.A.: Vision and navigation for the Carnegie-Mellon Navlab. IEEE Trans. Pattern Anal. Mach. Intell. 10(3), 362–373 (1988)
4. Pomerleau, D.A.: ALVINN: an Autonomous Land Vehicle in a Neural Network. Technical Report, Carnegie Mellon University, Computer Science Department (1989)
5. Nassu, B.T., Ukai, M.: Automatic recognition of railway signs using SIFT features. In: Intelligent Vehicles Symposium, pp. 348–354 (2010)
6. Creusen, I.M., Wijnhoven, R.G.J., Herbschleb, E., de With, P.H.N.: Color exploitation in HOG-based traffic sign detection. In: IEEE International Conference on Image Processing, pp. 2669–2672 (2010)
7. Duan, J., Viktor, M.: Real time road edges detection and road signs recognition. In: International Conference on Control, Automation and Information Sciences, pp. 107–112 (2015)
8. Intelligent Computing and Optimization. In: Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020). https://link.springer.com/book/10.1007/978-3-030-68154-8
9. Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: IEEE International Conference on Computer Vision, pp. 1134–1142 (2015)
10. Zeng, X., Ouyang, W., Wang, X.: Multi-stage contextual deep learning for pedestrian detection. In: IEEE International Conference on Computer Vision, pp. 121–128 (2013)
11. Zeng, Y., Xu, X., Shen, D., Fang, Y., Xiao, Z.: Traffic sign recognition using kernel extreme learning machines with deep perceptual features. IEEE Trans. Intell. Transp. Syst. 18(6), 1647–1653 (2017)
12. Bojarski, M., et al.: End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016)
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
15. Szegedy, C., et al.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition, pp. 1–9 (2015)
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition, pp. 770–778 (2016)
17. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
18. Ren, S., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
19. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Computer Vision and Pattern Recognition, pp. 779–788 (2016)
20. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single Shot MultiBox Detector. Springer International Publishing (2016)
21. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: 30th Conf. Neural Info. Proc. Syst. (NIPS 2016), Barcelona, Spain (2016)
22. Chen, L., Zhan, W., Tian, W., He, Y., Zou, Q.: Deep integration: a multi-label architecture for road scene recognition. IEEE Trans. Image Process. 28, 4883–4898 (2019)
23. Zhang, L., Li, L., Pan, X., Cao, Z., Chen, Q., Yang, H.: Multi-level ensemble network for scene recognition. Multimed. Tools Appl. 78, 28209–28230 (2019). https://doi.org/10.1007/s11042-019-07933-2
24. Sun, N., Li, W., Liu, J., Han, G., Wu, C.: Fusing object semantics and deep appearance features for scene recognition. IEEE Trans. Circuits Syst. Video Technol. 29, 1715–1728 (2019)
25. Parmar, Y., Natarajan, S., Sobha, G.: Deep range – deep-learning-based object detection and ranging in autonomous driving. IET Intell. Transp. Syst. 13, 1256–1264 (2019)
26. Nguyen, T.P., Jeon, J.W.: Wide context learning network for stereo matching. Signal Process. Image Commun. 78, 263–273 (2019)
27. Zou, Q., Jiang, H., Dai, Q., Yue, Y., Chen, L., Wang, Q.: Robust lane detection from continuous driving scenes using deep neural networks. IEEE Trans. Veh. Technol. 69, 41–54 (2020)
28. Ghafoorian, M., Nugteren, C., Baka, N., Booij, O., Hofmann, M.: EL-GAN: embedding loss driven generative adversarial networks for lane detection. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 256–272. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_15
29. Fu, J., et al.: Contextual deconvolution network for semantic segmentation. Pattern Recognit. 101, 107152 (2020)
Optimisation and Prediction of Glucose Production from Oil Palm Trunk via Simultaneous Enzymatic Hydrolysis
Chan Mieow Kee1, Wang Chan Chin1, Tee Hoe Chun1, and Nurul Adela Bukhari2
1 Centre for Bioprocess Engineering, Faculty of Engineering and the Built Environment, SEGi University, Jalan Teknologi, Kota Damansara, 47810 Petaling Jaya, Selangor Darul Ehsan, Malaysia [email protected]
2 Energy and Environment Unit, Engineering and Processing Research Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Abstract. Malaysia is the second-largest palm oil producer in the world. Nevertheless, limited research has been reported on using oil palm trunk (OPT) for glucose production. The objective of this study is to optimise glucose production from OPT via a simultaneous enzymatic process. Response Surface Methodology (RSM) was adopted to optimise the mass of OPT, the stirring speed, and the hydrolysis time for glucose production. All three parameters were significant, with p < 0.001. A quadratic regression model described the experimental data well, with predicted R² = 0.9700 and adjusted R² = 0.8828. Meanwhile, an artificial neural network (ANN) predicted the data with correlation R = 0.9612 and mean square error = 0.0021. The highest concentration of glucose, 30.1 mmol/L, was produced by using 30 g of OPT at 225 rpm for 16 h at 60 °C. The predictions from RSM and ANN are comparable and highly accurate.
Keywords: Oil palm trunk · Starch · Optimisation · Simultaneous enzymatic process · Glucose
1 Introduction

The oil palm industry contributes a huge amount of lignocellulosic biomass in Malaysia, which is the second-largest palm oil producer in the world after Indonesia [1], with 19.47 million tonnes of palm oil produced in 2020 [2, 3]. Replanting and milling activities produce large amounts of biomass, including oil palm fronds, oil palm trunk (OPT), empty fruit bunches, mesocarp fibres and palm kernel shells. According to [4], 100 g of dried OPT contains 27 g of starch, which can be converted to glucose, a high-value product. The latest global market analysis released by the United States Department of Agriculture (USDA) indicated that sugar production is forecast to rise by 6 million tons to 186 million tons, in response to the high sugar demand in China and India [5].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 55–64, 2022. https://doi.org/10.1007/978-3-030-93247-3_6

There are three commonly used hydrolysis processes to produce glucose from starch: acid hydrolysis, sequential enzymatic hydrolysis and simultaneous enzymatic hydrolysis. Conventionally, glucose is produced by acid hydrolysis, which uses strong acids such as hydrochloric acid and sulphuric acid to break down the structure of the starch molecules into disaccharides or monosaccharides [6]. This method is simple; however, the formation of undesired byproducts, low yield and high process temperature are its disadvantages. Recently, acid hydrolysis has been replaced by enzymatic hydrolysis, which is environmentally friendly and forms no inhibitory byproducts owing to the specificity of the enzymes [7].

Starch hydrolysis to produce glucose via sequential enzymatic hydrolysis is performed in three steps: gelatinization, liquefaction and saccharification [8]. Gelatinization weakens the hydrogen bonds between the starch molecules to ease the downstream processes. After that, starch is degraded to disaccharides by liquefaction, followed by saccharification, which degrades the disaccharides to monosaccharides; these two steps use α-amylase and glucoamylase enzymes, respectively. The sequential enzymatic process is thus a two-stage process in which glucose is produced by liquefaction followed by saccharification. On the other hand, the simultaneous enzymatic process combines liquefaction and saccharification by mixing the two enzymes and adding them to the starch solution [9]. Simultaneous enzymatic hydrolysis is preferable because it requires a single reaction vessel and reduces residence time and capital cost. To date, studies have been done on producing glucose from cassava starch [10], native rye starch [11] and sago hampas [12] via the enzymatic process.
Nevertheless, although Malaysia has abundant oil palm biomass residues, little research has been reported on using OPT as the source of starch for glucose production. Process optimization is important in process improvement to ensure that the process design and cost remain competitive [13]. Prediction helps engineers to estimate the performance of a process without time-consuming and costly experimental analysis. Mathematical and statistical models such as Response Surface Methodology (RSM) and Artificial Neural Network (ANN) are adopted by researchers to develop prediction models that recognise the pattern of the data distribution and perform prediction without explicit, rule-based programming. The objective of this study is to determine the performance of simultaneous enzymatic hydrolysis in glucose production using α-amylase and glucoamylase. The impact of three factors, namely stirring speed, mass of OPT and hydrolysis time, was studied, and the optimum condition was identified. The collected data were analysed and served as the inputs to develop prediction models via the ANN and RSM approaches.
2 Methodology

2.1 Material
The oil palm trunk powder was kindly supplied by the Malaysian Palm Oil Board. Potassium iodide, iodine crystals, sodium alginate and calcium chloride were purchased from Merck Sdn Bhd, while chitosan powder was purchased from Thermo Fisher Scientific Company. α-Amylase and glucoamylase were purchased from Shaanxi Yuantai Biological Technology and Xi’an Lyphar Biotech Co., Ltd. All the chemicals were used as received.

2.2 Preparation of Immobilized Beads of Enzymes
About 0.2 g of α-amylase and 0.2 g of glucoamylase were immobilized in alginate beads by mixing the enzymes with 0.2 g of sodium alginate in 10 mL of reverse osmosis (RO) water. The mixture was added dropwise into 0.2 M calcium chloride solution, and the beads were left immersed in the solution for 2 h. The beads were then washed thoroughly with RO water and immersed in a mixture of chitosan and glacial acetic acid for 1 h. After the coating process, the beads were washed with RO water and stored at 4 °C for 24 h for complete solidification [14] before use.

2.3 Extraction of Starch and Enzymatic Hydrolysis
Starch was extracted by heating the appropriate amount of OPT (as shown in Table 1) in RO water at 100 °C for 20 min [4]. The mixture was sieved and cooled to room temperature before the immobilized enzymes were dispersed into the starch mixture. The presence of starch was determined by using the iodine test [15]. The hydrolysis experiment was carried out at 60 °C according to the conditions in Table 1. The concentration of glucose was measured by using a One Touch Select Simple Glucometer [16].

2.4 Experimental Design and Statistical Analysis by Response Surface Methodology (RSM)
Three important variables affecting the enzymatic hydrolysis process, namely stirring speed (A), ranging from 150 to 300 rpm, mass of OPT as substrate (B), ranging from 5 to 20 g, and hydrolysis time (C), ranging from 8 to 24 h, were optimized by using Design Expert 11 (Version 11.1.2.0). An RSM central composite design (CCD) was applied to determine the interaction between the three parameters in glucose production via simultaneous enzymatic hydrolysis of OPT. Twenty experiments, including six central points, were designed with the three parameters (A, B and C), and the glucose concentration was adopted as the response.
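As a rough illustration of how such a 20-run design can be laid out, the following Python sketch builds a rotatable central composite design for the three factors. The axial distance α = (2^k)^(1/4) ≈ 1.682 is the usual rotatable choice and is an assumption here: the text does not state which α was configured in Design Expert, although the axial levels listed in Table 1 (for example 98.87 and 351.13 rpm) agree with it. The function and variable names are hypothetical.

```python
import itertools

# Factor ranges from the text: (low, high) correspond to coded levels -1 and +1.
FACTORS = {
    "A_stirring_rpm": (150.0, 300.0),
    "B_opt_mass_g": (5.0, 20.0),
    "C_time_h": (8.0, 24.0),
}


def central_composite_design(factors, n_center=6):
    """Return the runs of a rotatable CCD as dicts of actual factor values."""
    names = list(factors)
    k = len(names)
    alpha = (2 ** k) ** 0.25  # assumed rotatable axial distance, about 1.682 for k = 3

    # 2^k factorial corner points at coded levels -1 / +1.
    coded = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    # 2k axial points: one factor at -alpha / +alpha, the others at the centre.
    for i in range(k):
        for level in (-alpha, alpha):
            point = [0.0] * k
            point[i] = level
            coded.append(point)
    # Replicated centre points.
    coded += [[0.0] * k for _ in range(n_center)]

    runs = []
    for point in coded:
        run = {}
        for name, c in zip(names, point):
            low, high = factors[name]
            centre, half_range = (low + high) / 2, (high - low) / 2
            run[name] = round(centre + c * half_range, 2)
        runs.append(run)
    return runs


design = central_composite_design(FACTORS)
for number, run in enumerate(design, start=1):
    print(number, run)
```

With these settings the lower axial level of factor B evaluates to a slightly negative mass, which is not physically meaningful; Table 1 lists 0 g for that run instead.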
Table 1. Experiment design and the respective response in terms of glucose concentration

Run   A: Stirring speed (rpm)   B: Mass of OPT (g)   C: Hydrolysis time (h)   Glucose concentration (mmol/L)
1     150                       5                    8                        2.1
2     300                       5                    8                        4.3
3     150                       20                   8                        9.8
4     300                       20                   8                        12.9
5     150                       5                    24                       3.7
6     300                       5                    24                       5.1
7     150                       20                   24                       12.4
8     300                       20                   24                       17.2
9     98.87                     12.5                 16                       5.1
10    351.13                    12.5                 16                       12.5
11    225                       0                    16                       0
12    225                       25.11                16                       19.8
13    225                       12.5                 2.55                     5.7
14    225                       12.5                 29.45                    10
15    225                       12.5                 16                       13.2
16    225                       12.5                 16                       13.6
17    225                       12.5                 16                       12.8
18    225                       12.5                 16                       13.6
19    225                       12.5                 16                       13.2
20    225                       12.5                 16                       13.2

2.5 Glucose Prediction by RSM and Artificial Neural Network (ANN)
It is desirable to work out a mathematical model that can predict the glucose concentration from the inputs stirring speed (factor A), mass of OPT (factor B) and hydrolysis time (factor C) shown in Table 1. However, examination of the correlation between the output response and the input factors in the experimental data set of Table 1 shows that the response tends to exhibit a complex nonlinear relationship with the three independent input variables, which makes it difficult to identify appropriate trend line equations using conventional spreadsheet tools. Thus, design of experiments (DOE) by RSM and curve fitting by ANN regression analysis are commonly used by researchers for mathematical modelling [17].
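To make the idea of a second-order response surface concrete, the sketch below fits a full quadratic model (intercept, linear, interaction and squared terms) to the 20 runs of Table 1 by ordinary least squares. It is a minimal illustration under stated assumptions, not the Design Expert procedure used in the study: the coding of the factors to centre 0 and factorial levels ±1, the hand-written solver and all names are choices made here for a dependency-free example.

```python
# Table 1 data: (stirring speed in rpm, OPT mass in g, time in h, glucose in mmol/L)
RUNS = [
    (150, 5, 8, 2.1), (300, 5, 8, 4.3), (150, 20, 8, 9.8), (300, 20, 8, 12.9),
    (150, 5, 24, 3.7), (300, 5, 24, 5.1), (150, 20, 24, 12.4), (300, 20, 24, 17.2),
    (98.87, 12.5, 16, 5.1), (351.13, 12.5, 16, 12.5), (225, 0, 16, 0.0),
    (225, 25.11, 16, 19.8), (225, 12.5, 2.55, 5.7), (225, 12.5, 29.45, 10.0),
    (225, 12.5, 16, 13.2), (225, 12.5, 16, 13.6), (225, 12.5, 16, 12.8),
    (225, 12.5, 16, 13.6), (225, 12.5, 16, 13.2), (225, 12.5, 16, 13.2),
]


def quadratic_terms(a, b, c):
    """Terms of the full second-order model in coded units (assumed coding)."""
    x1, x2, x3 = (a - 225) / 75, (b - 12.5) / 7.5, (c - 16) / 8
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x1, x2 * x2, x3 * x3]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


X = [quadratic_terms(a, b, c) for a, b, c, _ in RUNS]
y = [glucose for *_, glucose in RUNS]
p = len(X[0])

# Normal equations (X^T X) beta = X^T y for the least-squares coefficients.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
coef = solve(xtx, xty)

predicted = [sum(b * t for b, t in zip(coef, row)) for row in X]
mean_y = sum(y) / len(y)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
r_squared = 1 - ss_res / ss_tot

print("coefficients (coded units):", [round(b, 3) for b in coef])
print("R^2:", round(r_squared, 4))
```

Because the model is linear in its coefficients, the nonlinear dependence on the factors is handled entirely by the interaction and squared columns, which is why an ordinary least-squares solve is sufficient.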
3 Results and Discussion

3.1 Morphology of Immobilised Enzymes
Figure 1 presents the morphology of a pure alginate bead and of immobilised enzymes in an alginate bead at varied magnifications. Notably, the surface roughness of the alginate beads increased due to the presence of enzymes. This finding is consistent with the SEM images published by Kumar et al. [18], where a rough surface was observed on the xylanase-entrapped alginate bead. This rough surface also indicated that the enzymes were successfully immobilised on the alginate beads.

Fig. 1. Cross sectional morphology of (a) pure alginate bead and (b) immobilised enzymes in alginate bead at (i) 50x, (ii) 1000x and (iii) 10kx.

3.2 Hydrolysis Performance of Immobilised Enzymes
The results in Table 1 show that the highest amount of glucose was produced by the immobilised enzymes when 25.11 g of OPT was used as the substrate to produce starch, at a stirring speed of 225 rpm for 16 h. Since this mass of OPT exceeded the pre-defined range of 5–20 g, it is reasonable to deduce that the enzymatic activity was yet to be optimised. Additional experiments were conducted by increasing the mass of OPT up to 40 g. As the mass of OPT increased from 25.11 g to 30 g, the amount of glucose produced increased from 19.8 to 30.1 mmol/L, as presented in Table 2. However, when the mass of OPT was further increased to 40 g, only 22.8 mmol/L of glucose was produced. This could be due to substrate inhibition [19].

Table 2. Hydrolysis performance of immobilised enzyme at high OPT condition.

A: Stirring speed (rpm)   B: Mass of OPT (g)   C: Hydrolysis time (h)   Glucose concentration (mmol/L)
225                       25.11                16                       19.8
225                       30                   16                       30.1
225                       40                   16                       22.8
The results in Table 3 show that all three factors, namely stirring speed, mass of OPT and time, were significant, with p < 0.05 [20]. Additionally, the quadratic terms A², B² and C² also contributed significantly to the model, with p < 0.0001.
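The F-values behind these significance statements can be checked with simple arithmetic from the sums of squares and degrees of freedom reported in Table 3: each mean square is the sum of squares divided by its degrees of freedom, and each F-value is that mean square divided by the residual mean square. The short Python sketch below redoes this calculation and also derives R² and adjusted R² from the same table; the dictionary layout and variable names are illustrative only.

```python
# (sum of squares, degrees of freedom) copied from Table 3.
TERMS = {
    "Model": (525.48, 9),
    "A-stirring speed": (41.98, 1),
    "B-mass of OPT": (362.90, 1),
    "C-time": (20.01, 1),
    "AB": (2.31, 1),
    "AC": (0.1013, 1),
    "BC": (2.53, 1),
    "A^2": (37.34, 1),
    "B^2": (21.48, 1),
    "C^2": (54.55, 1),
}
SS_RESIDUAL, DF_RESIDUAL = 8.44, 10
SS_TOTAL, DF_TOTAL = 533.92, 19

# Residual mean square: the common denominator of every F-value.
ms_residual = SS_RESIDUAL / DF_RESIDUAL

# F = (SS_term / df_term) / MS_residual
f_values = {name: (ss / df) / ms_residual for name, (ss, df) in TERMS.items()}

# Goodness-of-fit statistics derived from the same sums of squares.
r_squared = 1 - SS_RESIDUAL / SS_TOTAL
adj_r_squared = 1 - ms_residual / (SS_TOTAL / DF_TOTAL)

for name, f in f_values.items():
    print(f"{name:18s} F = {f:8.2f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
```

Small differences from the tabulated F-values are expected because the table entries are rounded. The adjusted R² obtained this way is about 0.970.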
Table 3. ANOVA surface response analysis result

Source             Sum of squares   df   Mean square   F-value
Model              525.48           9    58.39         69.16
A-stirring speed   41.98            1    41.98         49.73
B-mass of OPT      362.90           1    362.90        429.84
C-time             20.01            1    20.01         23.70
AB                 2.31             1    2.31          2.74
AC                 0.1013           1    0.1013        0.1199
BC                 2.53             1    2.53          3.00
A²                 37.34            1    37.34         44.23
B²                 21.48            1    21.48         25.44
C²                 54.55            1    54.55         64.61
Residual           8.44             10   0.8443
Lack of fit        7.99             5    1.60          17.62
Pure error         0.4533           5    0.0907
Cor Total          533.92           19
Here, the pre-processing procedure of linear conversion from {min → max} to {0 → 1} for each parameter is preferred for ANN to enhance the performance of regression analysis when tanh and sigmoid activation functions are applied.

3.3.3 Comparison of Fitting Results

The predicted output glucose concentration (y) values from the RSM and trained ANN methods are plotted against the experimental data for comparison in Fig. 3. The RSM quadratic model fits the experimental results well, satisfying the pre-designed conditions of input factors and output responses at the 6 central points, whereas the ANN model obtained good accuracy in predicting outputs by regression analysis with a shallow 4-neuron model. The deviation from the experimental data, with a lower y value, found for data point 12 in the ANN model arises because this data point appears as noise located in the lower range cluster instead of the higher range cluster when the data coordinates are examined.
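The linear {min → max} to {0 → 1} conversion mentioned earlier in this section can be sketched in a few lines of Python. This is a generic min-max scaling example, assuming the minimum and maximum are taken per parameter over the experimental data; the helper names are hypothetical and the snippet is not the authors' pre-processing code.

```python
def min_max_scale(values):
    """Map values linearly so that min(values) -> 0 and max(values) -> 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot scale a constant column")
    return [(v - low) / (high - low) for v in values]


def min_max_restore(scaled, low, high):
    """Invert the scaling to recover values in the original units."""
    return [low + s * (high - low) for s in scaled]


# Distinct stirring-speed levels (rpm) that appear in Table 1.
stirring_speed = [150, 300, 98.87, 351.13, 225]
scaled = min_max_scale(stirring_speed)
restored = min_max_restore(scaled, min(stirring_speed), max(stirring_speed))

print([round(s, 4) for s in scaled])
print([round(r, 2) for r in restored])
```

Keeping inputs in the 0 to 1 range avoids driving tanh or sigmoid units into their flat, saturated regions, which is the usual motivation for this step.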
Fig. 3. Fitting results from RSM and ANN methods (y: glucose concentration): (a) curve fitting errors; (b) predicted vs. actual results.
4 Conclusion

The ANOVA analysis showed that stirring speed, mass of OPT and hydrolysis time had a significant impact on glucose production. The highest glucose concentration of 30.1 mmol/L was recorded when 30 g of OPT was used as substrate at a stirring speed of 225 rpm and a hydrolysis time of 16 h at 60 °C. The models developed with both the RSM and ANN approaches predicted the glucose production well, with R2 of 0.96 to 0.97.

Acknowledgement. The authors are grateful for the support by SEGi University and Malaysian Palm Oil Board (MPOB) for providing the OPT sample.
References

1. Shahbandeh, M.: Palm Oil Export Volume Worldwide 2020/21, by Country: Statista Dossier on the Palm Oil Industry in Malaysia. Statistical Report, Statista (2021)
2. Leslie, C.K.O., et al.: SureSawit™ true-to-type-a high throughput universal single nucleotide polymorphism panel for DNA fingerprinting, purity testing and original verification in oil palm. J. Oil Palm Res. 31, 561–571 (2019)
3. Parveez, G.K.A., et al.: Oil palm economic performance in Malaysia and R&D progress in 2020. J. Oil Palm Res. 33, 181–214 (2021)
4. Eom, I.Y., Yu, J.H., Jung, C.D., Hong, K.S.: Efficient ethanol production from dried oil palm trunk treated by hydrothermolysis and subsequent enzymatic hydrolysis. Biotechnol. Biofuels 83, 1–11 (2015)
5. USDA: Sugar Production Up Globally in 2021/22, Stable in the United States and Mexico: Sugar: World Market and Trade. Technical Report, USDA FDS (2021)
6. Azmi, A.S., Malek, M.I.A., Puad, N.I.M.: A review on acid and enzymatic hydrolyses of sago starch. Int. Food Res. J. 24, 265–273 (2017)
7. Wang, T., Lü, X.: Overcome saccharification barrier: advances in hydrolysis technology. In: Lü, X. (ed.) Advances in 2nd Generation of Bioethanol Production, pp. 137–159. Woodhead Publishing, England (2021)
8. Pervez, S., Aman, A., Iqbal, S., Siddiqui, N.N., Qader, S.A.U.: Saccharification and liquefaction of cassava starch: an alternative source for the production of bioethanol using amylolytic enzymes by double fermentation process. BMC Biotechnol. 14, 49 (2014)
9. Marulanda, V.A., Gutierrez, C.D.B., Alzate, C.A.C.: Thermochemical, biological, biochemical, and hybrid conversion methods of bio-derived molecules into renewable fuels. In: Hosseini, M. (ed.) Advanced Bioprocessing for Alternative Fuels, Biobased Chemicals, and Bioproducts: Technologies and Approaches for Scale-Up and Commercialization, pp. 59–81. Woodhead Publishing, England (2019)
10. Sumardiono, S., Budiarti, G., Kusmiyati: Conversion of cassava starch to produce glucose and fructose by enzymatic process using microwave heating. In: The 24th Regional Symposium on Chemical Engineering, 01024. MATEC Web Conf., France (2017)
11. Strąk-Graczyk, E., Balcerek, M.: Effect of pre-hydrolysis on simultaneous saccharification and fermentation of native rye starch. Food Bioprocess Technol. 13, 923–936 (2020). https://doi.org/10.1007/s11947-020-02434-9
12. Husin, H., Ibrahim, M.F., Bahrin, E.K., Abd-Aziz, S.: Simultaneous saccharification and fermentation of sago hampas into biobutanol by Clostridium acetobutylicum ATCC 824. Energy Sci. Eng. 7, 66–75 (2019)
13. Magnússon, A.F., Al, R., Sin, G.: Development and application of simulation-based methods for engineering optimization under uncertainty. Comput. Aided Chem. Eng. 48, 451–456 (2020)
14. Raghu, S., Pennathur, G.: Enhancing the stability of a carboxylesterase by entrapment in chitosan coated alginate beads. Turk. J. Biol. 42, 307–318 (2018)
15. Elzagheid, M.I.: Laboratory activities to introduce carbohydrates qualitative analysis to college students. World J. Chem. Educ. 6, 82–86 (2018)
16. Philis-Tsimikas, A., Chang, A., Miller, L.: Precision, accuracy, and user acceptance of the one touch select simple blood glucose monitoring system. J. Diabetes Sci. Technol. 5, 1602–1609 (2011)
17. Betiku, E., Okunsolawo, S.S., Ajala, S.O., Odedele, O.S.: Performance evaluation of artificial neural network coupled with genetic algorithm and response surface methodology in modeling and optimization of biodiesel production process parameters from Shea tree (Vitellaria paradoxa) nut butter. Renew. Energy 76, 408–417 (2015)
18. Kumar, S., Haq, I., Yadav, A., Prakash, J., Raj, A.: Immobilization and biochemical properties of purified xylanase from Bacillus amyloliquefaciens SK-3 and its application in kraft pulp biobleaching. J. Clin. Microbiol. 2, 26–34 (2016)
19. Aslanzadeh, S., Ishola, M.M., Richards, T., Taherzadeh, M.J.: An overview of existing individual unit operations. In: Qureshi, N., Hodge, D., Vertes, A. (eds.) Biorefineries: Integrated Biochemical Processes for Liquid Biofuels, pp. 3–36. Woodhead Publishing, England (2019)
20. Betiku, E., Akindolani, O.O., Ismaila, A.R.: Enzymatic hydrolysis optimization of sweet potato (Ipomoea batatas) peel using a statistical approach. Braz. J. Chem. Eng. 30, 467–476 (2013)
21. Li, L.J., Xia, W.J., Ma, G.P., Chen, Y.L., Ma, Y.Y.: A Study on the enzymatic properties and reuse of cellulase immobilized with carbon nanotubes and sodium alginate. AMB Express 9, 112 (2019)
Synthetic Data Augmentation of Cycling Sport Training Datasets

Iztok Fister, Grega Vrbančič, Vili Podgorelec, and Iztok Fister Jr.(✉)

University of Maribor, Koroška Ul. 43, 2000 Maribor, Slovenia
[email protected]

Abstract. Planning sport sessions automatically is becoming a very important aspect of improving an athlete’s fitness. So far, many Artificial Intelligence methods have been proposed for planning sport training sessions. These methods depend largely on test data, where Machine Learning models are built, yet evaluated later. However, one of the biggest concerns of Machine Learning is dealing with data that are not present in the training dataset, but are unavoidable for predicting further improvements. This also represents a bottleneck in the domain of Sport Training, where algorithms can hardly predict future training sessions other than those compiled from the attributes of features present in the training dataset. Usually, this results in an under-trained trainee. In this paper we look at this problem, and propose a novel method for synthetic data augmentation applied to the original dataset.

Keywords: Synthetic data augmentation · Cycling sport training · TRIMP

1 Introduction
The performance of Machine Learning (ML) approaches, methods and techniques is largely dependent on the characteristics of the target dataset, which consists of numerous training samples. Datasets that contain a higher number of samples, which are also more diverse, in general offer more expressive power to the learning algorithms applied in various ML models, e.g. classification and regression. When the datasets are not diverse enough, or the samples in the datasets do not represent all possible features of the real world, algorithms may suffer in their ability to deal with their tasks and, consequently, cannot deliver the optimal outcome for a particular classification/regression problem. That behavior can also be pointed out as a drawback of ML [14].

The lack of diverse datasets and the small number of samples in datasets in the field of Data Science is, nowadays, commonly addressed using various data augmentation methods. These methods are able to increase the amount of data by either adding slightly modified copies of already existing data, or creating new synthetic data from existing ones. Besides the primary goal, the data augmentation techniques also attempt to increase the generalization abilities of ML models by reducing overfitting and expanding the decision boundary of the model [12,21]. There are many techniques for increasing the generalization of ML models, such as dropout [23], batch normalization [11] or transfer learning [19,25]. However, the data augmentation techniques address the mentioned problem at its root, which is the training dataset. While established practices of data augmentation exist for certain ML problems, such as image classification, this is not the case when dealing with time-series datasets or sports datasets, which are commonly in structured form, with features extracted from time-series datasets. With this specific characteristic of sports datasets in mind, the common existing approaches to data augmentation would not result in the expected performance improvement of the predictive ML model. Therefore, the need arises for a domain-specific method for synthetic data augmentation.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 65–74, 2022. https://doi.org/10.1007/978-3-030-93247-3_7

Sport is becoming a very interesting venue for ML researchers, where they are confronted with building ML models based on past training data in order to plan future sport training sessions for a particular person. Several solutions exist in the literature [6,20,22]. Artificial Sport Trainer (AST) [5] is an example of a more complex solution for assisting athletes with automatic planning of sport training sessions. Besides the planning, AST is also able to cover the other phases of sport training, i.e. realization, control, and evaluation. The main weakness of the AST is that it is unable to identify training sessions with intensities beyond those present in an archive.

This paper introduces a synthetic data augmentation method for generating uncommon sport training sessions. This means that the method is appropriate for prescribing training sessions which ensure that athletes in training will improve their fitness further. The method consists of several steps, in which features need to be identified first. Then, the interdependence among these is captured using the new Training Stress Measure (TSM).
This metric serves for recognizing the most intensive training sessions, which serve as a basis for generating the uncommon training sessions using synthetic data augmentation. Finally, the set of uncommon training sessions enriches the existing archive. The proposed method was applied on an archive of sport training sessions generated for an amateur cyclist by the SportyDataGen generator [7]. The results showed that the method can also be used for enriching the archive of sport training sessions in practice.

The contributions of this paper are:
– to propose a method to augment the training dataset of existing sport activities with synthetic data augmentation,
– to evaluate the method on data suitable for an amateur cyclist.
2 Problem Statement

Two subjects are important for understanding the sections that follow:
– data augmentation,
– basics of cycling training.
The former highlights the principles of data augmentation, with emphasis on synthetic data augmentation, while the latter reveals the basics of cycling sport training.

2.1 Data Augmentation
In general, the term data augmentation covers various strategies, approaches and techniques that try to increase the size of the training dataset artificially, or to make the utilized dataset more diverse in order to better represent the real-life distribution of the training data. For tackling image classification tasks, data augmentation is an already established practice, especially when utilizing deep Convolutional Neural Networks (CNN) [17]. The practice dates back to 2012, when one of the first well-known CNN architectures, AlexNet [16], was presented; it utilized cropping, mirroring and color augmentation of the training dataset in order to improve the generalization capabilities of the trained ML model and to reduce the problem of overfitting [12]. The approaches and techniques for image augmentation can be divided into two categories: classical image data augmentation, also known as basic data augmentation, and deep learning data augmentation [15]. The first group consists primarily of techniques that manipulate the existing training dataset with geometric transformations and photometric shifting. The geometric transformation techniques include flipping images horizontally or vertically, rotating, shearing, cropping and translating the images, while the photometric techniques cover color space shifting, adding image filters and introducing noise to the images [15]. The more advanced approach to image data augmentation utilizes deep learning. Such approaches can be split into three groups: generative adversarial networks [10], neural style transfer [13] and meta metric learning [8]. The last group includes techniques that use a neural network to optimize neural networks. Some representatives of this kind of data augmentation are neural augmentation [24], auto augmentation [4] and smart augmentation [18].
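The classical (basic) image augmentation techniques listed above are simple enough to sketch without any imaging library. The following Python example treats a grayscale image as a list of pixel rows and implements horizontal and vertical flipping, cropping and additive Gaussian noise. It is only an illustration of the idea: the function names, the 0 to 255 pixel range, the noise level and the flip probabilities are assumptions made here, not settings taken from the cited works.

```python
import random


def flip_horizontal(image):
    """Mirror every row (left-right flip)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top-bottom flip)."""
    return [row[:] for row in image[::-1]]


def crop(image, top, left, height, width):
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def add_noise(image, sigma, rng):
    """Add Gaussian noise to every pixel and clip the result to 0..255."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def augment(image, rng, sigma=5.0):
    """Apply a random combination of geometric and photometric changes."""
    out = image
    if rng.random() < 0.5:  # assumed flip probability
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    return add_noise(out, sigma, rng)


rng = random.Random(0)  # fixed seed so that repeated runs give the same variants
image = [[0, 64, 128], [32, 96, 160], [64, 128, 255]]
augmented = [augment(image, rng) for _ in range(4)]
for variant in augmented:
    print(variant)
```

Each call to `augment` yields a slightly different copy of the same image, which is how these techniques enlarge a dataset without collecting new samples.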
While the approaches to image data augmentation are well explored and are, in general, part of a standard procedure, this is not yet the case for the augmentation of time-series datasets. However, research in this field has been gaining new momentum in recent years. Similar to the image data augmentation techniques, some of the common approaches to the augmentation of time-series datasets are based on random transformations of the training data, such as adding noise, scaling or cropping. However, the problem with such approaches is that there is a wide variety of time-series, each with different features, so not every transformation is suitable for every dataset. Based on those specifics of time-series datasets, alternative approaches were developed, such as the synthesis of time-series. The group of synthesis-based approaches includes a number of different methods and techniques, such as pattern mixing, generative models and decomposition methods [12]. While the presented approaches are proven to work well on specific target tasks, sports data are somewhat specific in terms of data augmentation. Since such datasets are, in general, commonly derived from the recorded training sessions of different athletes, which are in fact time-series datasets, one could try
to utilize the known techniques for time-series dataset augmentation and extract the features afterwards. However, such an approach would not be able to provide data inferred from training sessions beyond those present in the original dataset. In other words, with such an approach we could not augment the original dataset so as to obtain features for more uncommon sport training sessions, but could only obtain slightly more diverse variations of the training sessions already present.

2.2 Basics of Cycling Training
The goal of sport training is to achieve a competitive performance for an athlete, which is typically tested in competitions. The process is directed by several principles, of which the two most important are the principle of progressive load and the principle of reversibility [9]. According to the first, only a steadily increasing amount of training can lead to an increase in an athlete’s fitness, while, according to the second, fitness is lost through the athlete’s inactivity, rest or recovery. The athlete’s body senses the increased amount of training as physical stress. When this stress is not too high, the athlete experiences fatigue after hard training. This can be overcome by introducing a rest phase into the process or by performing less intensive workouts. On the other hand, if the stress is too high, we can talk about overtraining. Consequently, this demands a recovery phase, in which the athlete usually needs the help of medical services to recover. The amount of training, typically prescribed by a sport trainer, is determined by the training load. The training load specifies the volume of training and the intensity at which it must be realized, where the volume is simply the product of duration and frequency. With these concepts in mind, the training plan can be expressed mathematically as a triple:

Training plan = ⟨Duration, Intensity, Frequency⟩.   (1)
Duration denotes the amount of training. Typically, cycling training is specified by a duration in minutes. Intensity can be determined by various measurements, but the following two are the most widely accepted in cycling: Heart Rate (HR) and Power (PWR). Although PWR provides accurate intensity measurements [1], this study focuses on HR due to a lack of experimental data including PWR. Based on the so-called Functional Threshold Heart Rate (FTHR), which is specific to each individual athlete, HR intensity zones are defined that help trainers to simplify the planning process. Let us mention that the FTHR is an approximation of the maximum HR (max HR) denoting the Anaerobic Threshold (AnT), at which lactate starts to accumulate in the blood due to insufficient oxygen intake [3]. The frequency answers the question of how many times the training is performed during the same period. An example of the HR intensity zones suitable for an athlete with FTHR = 180 (a 40-year-old athlete) is presented in Table 1, from which it can be seen that there are seven HR zones with their corresponding names. The column
Table 1. HR zones suitable for a 40-year-old athlete with FTHR = 180.

HR zone   Name              % FTHR   FTHR
Zone-1    Active recovery
92% to > 99%. After that, the testing accuracy retains this constant value up to fold k = 10. In the case of the training loss, the loss decreases gradually from fold k = 1 to k = 4, from a value < 0.00025 to almost zero. After fold k = 4, the model shows superior performance in the multi-classification task.
Table 4. Confusion matrices of three tumor classes.

Tumor type   Samples   TP     TN     FP   FN
Meningioma   708       698    2353   10   3
Pituitary    930       927    2129   3    5
Glioma       1426      1423   1630   3    8

Table 5. Performance evaluation metrics of the proposed model (classification report).

Tumor class   Precision (%)   Recall (%)   F1-Score (%)
Meningioma    99.46           97.68        98.08
Pituitary     99.57           99.59        99.57
Glioma        98.44           99.79        99.61
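For readers who want to see how per-class metrics follow from confusion counts, the sketch below applies the standard definitions of precision, recall, F1-score and accuracy to the TP, TN, FP and FN values printed in Table 4. It is a generic illustration with hypothetical names; how the figures in Table 5 were aggregated (for example over folds) is not stated in this excerpt, so the numbers produced here are not guaranteed to coincide with that table.

```python
# Per-class counts as printed in Table 4.
COUNTS = {
    "Meningioma": {"tp": 698, "tn": 2353, "fp": 10, "fn": 3},
    "Pituitary": {"tp": 927, "tn": 2129, "fp": 3, "fn": 5},
    "Glioma": {"tp": 1423, "tn": 1630, "fp": 3, "fn": 8},
}


def classification_metrics(tp, tn, fp, fn):
    """Standard one-vs-rest metrics computed from confusion counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


report = {name: classification_metrics(**c) for name, c in COUNTS.items()}
for name, metrics in report.items():
    line = ", ".join(f"{key} = {100 * value:.2f}%" for key, value in metrics.items())
    print(f"{name}: {line}")
```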
Fig. 5. Accuracy and loss plot in training and testing stage with the variation of number of folds for the proposed hybrid pooling classifier.
Fig. 6. Training time and testing accuracy plot of single and hybrid pooling based classifier.
The time requirement during the training stage is also investigated for the proposed classifier using either single or hybrid pooling methods. Figure 6 shows that the hybrid pooling based classifier requires an additional 0.19 min of training time compared with the max pooling based classifier and 0.22 min compared with the average pooling based classifier. Despite the additional training time, the classification accuracy of the proposed hybrid pooling model is higher by 0.56% and 0.49% than that of the max and average pooling based models, respectively.
4 Conclusion

In this paper, a simple brain tumor classification network is developed that utilizes a hybrid pooling layer to reduce the dimensionality of the feature maps without discarding the pixels in the tumor region. To retain the tumor features without any distortion, the pixel-wise summation of the max-pooled and average-pooled images is performed. The experimental results show that the proposed hybrid pooling based CNN classifier outperformed the single pooling based models. In particular, mixing the max-pooled and average-pooled features with probability b = 0.7 to construct the hybrid pooled features provides superior classification accuracy on the brain tumor dataset.
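One possible reading of the hybrid pooling operation is sketched below in plain Python: each 2 × 2 window is reduced by both max pooling and average pooling, and the two pooled maps are combined pixel-wise. The weighted form b·max + (1 − b)·average is an assumption made for this illustration; the text describes b as a mixing probability and does not give the exact formula, so the layer in the paper may instead select between the two pooled values stochastically. All function names are hypothetical.

```python
def pool2d(feature_map, reducer, size=2):
    """Apply `reducer` to every non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reducer(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


def hybrid_pool(feature_map, b=0.7, size=2):
    """Pixel-wise combination of max-pooled and average-pooled maps.

    Assumption: b weights the max-pooled map and (1 - b) the average-pooled map.
    """
    max_pooled = pool2d(feature_map, max, size)
    avg_pooled = pool2d(feature_map, mean, size)
    return [
        [b * m + (1 - b) * a for m, a in zip(max_row, avg_row)]
        for max_row, avg_row in zip(max_pooled, avg_pooled)
    ]


feature_map = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
]
print(hybrid_pool(feature_map))
```

In this form, b = 1 reduces to pure max pooling and b = 0 to pure average pooling, so intermediate values keep the strongest activation while still reflecting the rest of the window.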
Importance of Fuzzy Logic in Traffic and Transportation Engineering

Aditya Singh
School of Civil Engineering, Lovely Professional University, Phagwara, India

Abstract. This paper discusses how traffic problems can be minimized with the help of fuzzy logic, which is a part of artificial intelligence. It highlights the issues with the conventional traffic system practiced in an Indian state. It describes the benefits of using advanced traffic systems that include fuzzy logic, along with the advantages and disadvantages of fuzzy logic. It also highlights the present scenario of accidents and fatalities in an Indian state that follows the old conventional system of traffic.

Keywords: Flexible · Inaccurate · Fuzzification · Gaussian · Defuzzification
1 Introduction

The word fuzzy is used when things are ambiguous or unclear. In the physical world, people often face situations that cannot be classified definitively as true or false. In those cases fuzzy logic is beneficial because it offers flexible reasoning for such problems, allowing uncertain as well as inaccurate information to be taken into consideration. In Boolean systems, only the completely true and completely false conditions are considered, and they are represented by the values 1 and 0 respectively [1]. Fuzzy systems, in contrast, are not limited to the completely true and completely false conditions: they also provide logic for intermediate states, which can be partly true as well as partly false. An algorithm based on fuzzy logic helps solve different problems by considering all the data accessible and then taking the most appropriate decision based on the obtained input. The fuzzy logic approach basically imitates the way a person makes a decision after considering all the options available between completely true and completely false [2].

1.1 What is the Need to Use Fuzzy Logic?

Fuzzy logic is a part of artificial intelligence, which is a popular technology in today's world. The technology used in the conventional traffic system is already outdated. The conventional traffic system is a very basic form of traffic system, and with the increase in road traffic, number of automobiles, road accidents and road fatalities, it is unable to cope; in the future it will be even more difficult for it to handle them. Hence, the better and improved approach is to use fuzzy logic and automation in the traffic system of a place to cope with the increasing traffic problems. Some major features of fuzzy logic are listed below [2]:

• Fuzzy logic is a flexible machine learning technique that is easy to apply.
• It is the most appropriate technique for reasoning in uncertain and inaccurate situations.
• It helps imitate the logic of a person's thought process.
• It allows nonlinear functions of arbitrary complexity to be built.
• It may have two values that represent two possible solutions.
• It views inference as a process of propagating elastic constraints.
• It should be built with the assistance of specialists who can guide a person throughout the process.

With the incorporation of fuzzy logic and the modernization of the conventional traffic system, the number of traffic accidents and road fatalities in a place is expected to fall considerably.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 87–96, 2022. https://doi.org/10.1007/978-3-030-93247-3_10

1.2 Objectives
The major objectives of the study are as follows:

• To understand the traffic conditions in a place.
• To observe the present traffic accidents and road fatalities.
• To modernize the conventional system of traffic in a place.
• To reduce the traffic accidents and road fatalities in a given place.
• To improve the traffic system and make people secure on roads.
2 Motivation and Reason for Choosing Goa as the Study Area

The author chose the state of Goa as the study area because of the high fatality rates in the state, which were also reported by Gururaj in 2005, placing the state in the top 3 for the maximum rate of road fatalities per one hundred thousand people [3]. The state has 224 km of national highways (NHs), 232 km of state highways (SHs) and 815 km of district highways (DHs), and an area of over 3700 km², making it well connected by road. It is bordered by the Western Ghats, which make the terrain rolling in nature, and by the Arabian Sea. Only parts of the state have plain terrain. Being a tourist paradise, it is visited by large numbers of domestic and international tourists. Its current population is around 1.5 million people, with almost the same number of vehicles [4]. Two-wheelers make up 69% of the total vehicle population and cars over 20% [5]. Because the roads and highways are of good quality, people often drive rashly. Because the state is tropical and receives around 300 cm of average annual rainfall, many accidents occur due to skidding of vehicles during the rainy season [4]. At many intersections, there are either no traffic lights or only the conventional system of traffic is practiced. According to the author, this can be solved to a certain extent by incorporating fuzzy systems and automation in the traffic system of the state.
3 History of Fuzzy Logic

People started working on the concept of fuzzy logic from the third decade of the 20th century, and in 1965 the term fuzzy logic was used for the first time by Lotfi Zadeh, a professor at UC Berkeley in California. Zadeh found that the existing computer logic was incapable of handling data related to subjective or indistinct human thoughts. Fuzzy logic is still implemented in numerous areas, including artificial intelligence and control theory. It was created to allow the computer to distinguish between data that are partly true and partly false, somewhat comparable to the reasoning process of a person, for example some brightness, a little dark and so on [2].
4 Literature Review

Atakan et al. (2021) used fuzzy logic to create strategies for controlling the timing of traffic signals. They found that their proposed model is better and more effective than conventional traffic signals [6]. Tohan and Wong (2021) used fuzzy logic to quantify traffic congestion in a given place. They found that their method is more effective than the existing methods [7]. Kheder and Rukaibi (2021) used fuzzy logic to improve the safety of pedestrians along with walkability and traffic flow. Their proposed model was found to be effective and practical [8]. Wu et al. (2020) used fuzzy logic to create a dynamic decision-making system that supports an intelligent navigation strategy within inland traffic separation schemes. Their proposed method helped in improving navigation systems [9]. Tomar et al. (2018) combined fuzzy logic with logistic regression to manage traffic in a given place. Their proposed model was found to be more efficient than the existing ones [10]. Rout et al. (2020) used fuzzy logic and the Internet of Things to route emergency vehicles in smart cities. Their proposed model was able to save time and provide the most efficient route for such vehicles [11]. Jovanovic et al. (2021) used fuzzy logic to control an oversaturated diverging diamond interchange together with a ramp metering system. They found that their model was more practical for solving the existing problem [12]. Alemneh et al. (2020) used fuzzy logic to improve the safety of pedestrians on roads. They claimed that their proposed method is effective in ensuring pedestrian safety and better than the existing system [13].

Kheder and Almutairi (2021) used an adaptive neuro-fuzzy inference system to model traffic noise under Arabian conditions, with the aim of reducing noise pollution. They stated that their proposed traffic noise model is better than other such models and the most suitable one for Arabian conditions [14]. Komsiyah and Desvania (2021) used a fuzzy inference system to analyse traffic signals and create a simulation for a three-way intersection. They found that their proposed model performs much better than the existing conventional one [15]. Abbasi et al. (2021) created a vehicle weighting model, FWDP, with the assistance of fuzzy logic, which prioritizes data in vehicular ad hoc networks. They stated that their proposed model is more useful and efficient in solving the existing problem [16].
5 Main Focus of the Paper Along with Issues and Problems

The paper mainly focuses on incorporating fuzzy logic, which is a part of artificial intelligence, in the traffic system of a place in order to modernize it. It highlights the need to implement fuzzy systems and to automate the conventional traffic systems in the state of Goa. The major issue observed by the author in the state of Goa was rash driving on roads and highways, mostly because people take undue advantage of the good quality roads in the state. Since the state also attracts a large number of tourists of various types, it is necessary to monitor and control their actions. The paper's aim is to improve the system of traffic in the state of Goa and to reduce the number of road accidents and fatalities on roads and highways to a negligible value.

6 Architecture of the System of Fuzzy Logic

The architecture of fuzzy logic consists of four major parts, which are the following [2]:

• Rule Base
• Fuzzification
• Inference Engine
• Defuzzification
Rule Base. The rule base consists of all the rules, expressed as if-then conditions and supplied by specialists, that govern the decision-making system on the basis of linguistic information. The latest advancements in the theory of fuzzy logic offer numerous effective techniques for designing and tuning fuzzy controllers. With these improvements and advances, the number of fuzzy rules is reduced considerably.

Fuzzification. Fuzzification transforms the inputs, which are crisp numbers, into fuzzy sets. The crisp inputs are mainly the values measured by sensors, such as pressure, temperature and rpm, which are handed over to the control system for processing.

Inference Engine. The inference engine determines how well the present fuzzy input matches every single rule, and then decides which rules to fire for the given input field. The fired rules are then combined to create the control action.

Defuzzification. Defuzzification transforms the fuzzy sets obtained from the inference engine into crisp values. It is the opposite of the fuzzification process. Many defuzzification techniques exist, and the most appropriate one is chosen for a particular expert system to decrease the error rate.
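Because several defuzzification techniques exist, it can help to see how two of them behave on the same fuzzy output. The following is a minimal pure-Python sketch that compares centroid and mean-of-maxima defuzzification. The 0 to 100 output scale, the two triangular output sets and the firing strengths 0.3 and 0.8 are made-up values for illustration and do not come from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


# Sampled universe of discourse for the output (hypothetical 0..100 scale).
ys = [i / 10.0 for i in range(1001)]

# Two output sets, each clipped at the (made-up) firing strength of its rule,
# then combined with max, as an inference engine might do.
aggregated = [max(min(tri(y, 0, 25, 50), 0.3),
                  min(tri(y, 50, 75, 100), 0.8)) for y in ys]


def centroid(ys, mu):
    """Centre of gravity of the aggregated fuzzy set."""
    return sum(y * m for y, m in zip(ys, mu)) / sum(mu)


def mean_of_maxima(ys, mu, tol=1e-9):
    """Average of the points where the membership is highest."""
    peak = max(mu)
    best = [y for y, m in zip(ys, mu) if peak - m <= tol]
    return sum(best) / len(best)


crisp_centroid = centroid(ys, aggregated)
crisp_mom = mean_of_maxima(ys, aggregated)
```

In this toy case the centroid (about 57.7) is pulled towards the weakly fired low set, whereas the mean of maxima (75) ignores it, which is one reason the choice of technique depends on the application.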
7 Methodology

The crisp input values, namely the distance, the crowd and the time of driving, are assigned to fuzzy sets and sent for fuzzification. The fuzzified inputs are then passed to the rule base and inference engine, which also draw on discussions with traffic personnel and on the experience of drivers. The result, expressed through output membership functions (MFs) for speed and brake alerts, is sent for defuzzification, which converts it into crisp output values [17].
[Figure: flow chart with the stages Input → Fuzzification → Rule Base and Inference Engine → Defuzzification → Output]

Fig. 1. Flow Chart of Fuzzy Logic
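The flow from crisp input to crisp output described in this section can be sketched end to end. The following is a minimal pure-Python illustration with two of the inputs named in the text (distance and crowd) and one output (speed). All ranges, membership shapes, the two rules and the brake-alert cut-off are hypothetical choices made for illustration and are not the rule base of [17].

```python
def tri(x, a, b, c):
    """Triangular membership; a == b or b == c turns that side into a shoulder."""
    left = 1.0 if a == b else (x - a) / (b - a)
    right = 1.0 if b == c else (c - x) / (c - b)
    return max(0.0, min(left, right, 1.0))


def recommend_speed(distance_m, crowd):
    """Return (speed in km/h, brake alert flag) for crisp inputs.

    distance_m: gap to the vehicle ahead, 0..100 m (assumed range)
    crowd: crowd level on a 0..1 scale (assumed range)
    """
    # 1) Fuzzification of the crisp inputs.
    near, far = tri(distance_m, 0, 0, 50), tri(distance_m, 20, 100, 100)
    light, heavy = tri(crowd, 0, 0, 0.6), tri(crowd, 0.4, 1, 1)

    # 2) Rule base and inference (max for OR, min for AND).
    slow_strength = max(near, heavy)   # IF near OR heavy THEN speed is slow
    fast_strength = min(far, light)    # IF far AND light THEN speed is fast

    # 3) Clip each output set at its rule strength and aggregate with max.
    speeds = [s * 0.5 for s in range(161)]   # 0..80 km/h in 0.5 steps
    mu = [max(min(tri(s, 0, 0, 40), slow_strength),
              min(tri(s, 40, 80, 80), fast_strength)) for s in speeds]

    # 4) Centroid defuzzification back to a crisp speed.
    total = sum(mu)
    speed = sum(s * m for s, m in zip(speeds, mu)) / total if total else 0.0
    brake_alert = slow_strength > 0.5        # hypothetical alert rule
    return speed, brake_alert


open_road = recommend_speed(90, 0.1)   # large gap, light crowd
busy_road = recommend_speed(10, 0.9)   # small gap, heavy crowd
```

A large gap with a light crowd yields a speed in the upper half of the range without an alert, while a small gap with a heavy crowd yields a low speed together with a brake alert.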
8 Membership Function in Fuzzy Logic

A membership function is a graph that describes how every point in the input space is mapped to a membership value in the range 0–1. The input space is generally called the universal set (u), which consists of all the possible elements of concern in a specific application [1]. The major kinds of fuzzifiers are mentioned below:

• Singleton
• Gaussian
• Triangular or Trapezoidal
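The shapes listed above can be written down directly. The following is a minimal pure-Python sketch using the usual textbook definitions of these membership functions; the function and parameter names are illustrative and are not taken from the paper.

```python
import math


def singleton(x, x0):
    """Membership 1 only at the single point x0."""
    return 1.0 if x == x0 else 0.0


def gaussian(x, c, sigma):
    """Bell curve centred at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def triangular(x, a, b, c):
    """Rises from a to the peak at b, then falls to c (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoidal(x, a, b, c, d):
    """Rises from a to b, stays at 1 until c, then falls to d (a < b <= c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


# Every function maps a point of the input space to a value in [0, 1].
samples = [triangular(x, 0, 10, 20) for x in (0, 5, 10, 15, 20)]
```

The Gaussian shape is smooth everywhere, whereas the triangular and trapezoidal shapes have corners but are cheaper to evaluate.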
9 Fuzzy Control

The term fuzzy control can be described by the following points [1] (Tables 1 and 2):

• It is a method of incorporating a person's way of thinking into a control system.
• It can imitate the logical thought process of human beings, meaning the way a person draws conclusions from what is known.
• It is not designed to deliver reasoning that is accurate, but reasoning that is satisfactory.
• Situations that involve uncertainty about numerous things can be handled with the assistance of fuzzy logic.

Table 1. Comparison of the efficiency of two major traffic data collection methods [5].

Traffic data collection method   Minimum value (%)   Maximum value (%)
Traditional method               70                  95
AI method                        95                  100

Table 2. Increase in the population of vehicles over the years [5].

Year   Vehicle population in Goa (lakhs)
2014   10.09
2018   14.10
10 Applications of the System of Fuzzy Logic

The major applications of fuzzy logic are discussed below [1]:

• It is used in aerospace engineering to control the altitude of satellites and spacecraft.
• Large multinational companies use fuzzy logic for individual evaluation of their businesses and in their decision-support systems.
• It is used to control and handle traffic and the speed of vehicles in motorized transportation systems.
• It is used with neural networks, since it can imitate the way a human makes decisions, but at a faster rate. This is accomplished by aggregating the data and transforming them into more meaningful data by forming partly true and partly false conditions as fuzzy sets.
• It is widely used in current control systems such as expert systems.
• It is used in natural language processing as well as in numerous intensive AI applications.
• In the chemical industry, it is used in chemical distillation, drying and pH control.
11 Result and Discussion

This section presents graphs that compare the traditional system of traffic data collection with the artificial intelligence system of traffic data collection. They show the latter to be superior in the field of traffic and transportation engineering, implying that techniques of artificial intelligence should be used to improve the traffic system. The graphs also highlight the drastic increase in the number of vehicles over the years, which makes the conventional traffic system more and more outdated with time. Hence, there is a need for fuzzy systems and other AI techniques.
[Figure: bar chart titled "Efficiency of data collection methods", showing the minimum and maximum efficiency (in %) of the traditional method (70 and 95) and the AI method (95 and 100)]

Fig. 2. The above graph is plotted on the data provided by Goodvision.
Figure 2 shows that the traditional system of traffic data collection is less efficient, and it can be said that the traditional system of traffic will become more and more outdated in the future. It is also evident that the efficiency of data collection with the assistance of artificial intelligence is significantly higher, which implies that automation and numerous methods of AI, including fuzzy logic, can be used in the state of Goa.
[Figure: bar chart titled "Vehicle Population in Goa" (in lakhs), showing 10.09 for the year 2014 and 14.1 for the year 2018]

Fig. 3. The above graph is plotted by the data provided by the Goa government traffic department.
Figure 3 shows that the number of vehicles in the state of Goa increased from 10.09 lakhs in 2014 to 14.1 lakhs in 2018, which suggests that this trend will continue in the future. Hence, fuzzy systems and the automation of traditional traffic systems in the state will help in controlling and managing the traffic.
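The growth visible in Fig. 3 can be quantified with simple arithmetic on the two values reported in Table 2. The short Python sketch below computes the total and the compound annual growth; the projection function is only a hypothetical extrapolation that assumes the same annual rate continues, which two data points cannot confirm.

```python
# Vehicle population in Goa from Table 2, in lakhs (1 lakh = 100,000).
v_2014, v_2018 = 10.09, 14.10
years = 2018 - 2014

# Fractional increase over the whole period.
total_growth = (v_2018 - v_2014) / v_2014

# Compound annual growth rate implied by the two data points.
annual_growth = (v_2018 / v_2014) ** (1.0 / years) - 1.0


def projected(year):
    """Hypothetical extrapolation: assumes the same annual rate simply continues."""
    return v_2018 * (1.0 + annual_growth) ** (year - 2018)
```

Under these figures the vehicle population grew by roughly 40% in four years, which corresponds to a little under 9% per year.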
12 Major Advantages of Fuzzy Logic

The major advantages of a fuzzy logic system are highlighted below:

• It can work with all kinds of input, including inaccurate, unclear or noisy input information.
• Since the algorithm can be defined with a small amount of data, only a small amount of memory is needed.
• It is built on set theory, which is a mathematical concept, and its reasoning is not at all complex or difficult.
• Its design is very simple and easy to understand.
• It delivers effective solutions to difficult problems in all areas, as it is similar to how a person reasons and takes decisions, but at a faster rate.
• It can help in managing and controlling large volumes of traffic.
• It can assist in making new traffic policies in order to reduce accidents.
• It is broadly used for practical as well as commercial purposes in different fields of study.
• Its reasoning is inaccurate but satisfactory.
• When used with artificial intelligence, it assists humans in controlling machines as well as products meant for consumers.
13 Major Disadvantages of Fuzzy Logic

Some major disadvantages of a fuzzy logic system are mentioned below:

• It is ambiguous, as different researchers use dissimilar methods to solve a given problem. Hence, there is no systematic way to solve a given problem.
• Since it works on both inaccurate and accurate data, it generally compromises accuracy.
• Evidence of its features is hard to obtain in many situations, because a scientific explanation is not always available.
14 Conclusion

This paper focuses only on the use of fuzzy systems and on why they are needed to control and manage the traffic system in the state of Goa. Other techniques of artificial intelligence also need to be applied to the conventional traffic system in the state of Goa, but they are currently outside the scope of this paper. Fuzzy logic is a part of artificial intelligence that should be applied in the state of Goa. Figure 1 describes the basic working of fuzzy logic, and Fig. 2 shows that AI methods of traffic data collection are much more efficient than the traditional one. As observed in Fig. 3, the number of automobiles has increased rapidly over the years, and this trend is expected to continue. Controlling traffic in the state of Goa will therefore become a bigger problem in the future; hence fuzzy systems and other techniques of artificial intelligence need to be implemented in the system of traffic. This will help in reducing road accidents and fatalities to a bare minimum and in controlling and managing the road traffic. It will also help in improving traffic safety and the traffic system, along with making people secure on roads and highways in the state of Goa.

There are some major disadvantages of fuzzy logic which cannot be ignored, but with enough precautions they can be minimized. In cases where fuzzy logic is not required, it should be avoided. For the time being, fuzzy logic can be used in the traffic system of the state of Goa; with more advances in technology, it might be possible to implement better techniques in the system of traffic in the state of Goa and to modify it. Research is still going on in the field of artificial intelligence and its various techniques. It is a time-consuming process, but with future research the system of traffic can be expected to improve significantly.
References

1. GeeksforGeeks. https://www.geeksforgeeks.org/fuzzy-logic-introduction/amp/
2. Guru99. https://www.guru99.com/what-is-fuzzy-logic.html
3. Traffic Collisions in India. https://en.m.wikipedia.org/wiki/Traffic_collisions_in_India
4. Goa. https://en.m.wikipedia.org/wiki/Goa
5. Analysis of M.V. Accidents in Goa. https://www.goatransport.gov.in/roadsafety (2018)
6. Tunc, I., Yesilyurt, A.Y., Soylmez, M.T.: Different fuzzy logic control strategies for traffic signal timing control with state inputs. IFAC-PapersOnLine 54(2), 265–270 (2021)
7. Tohan, T.D., Wong, Y.D.: Fuzzy logic-based methodology for quantification of traffic congestion. Physica A 570, 125784 (2021)
8. Kheder, S., Al Rukaibi, F.: Enhancing pedestrian safety, walkability and traffic flow with fuzzy logic. Sci Total Environ. 701, 134454 (2020)
9. Wu, B., Cheng, T., Yip, T.L., Wang, Y.: Fuzzy logic based dynamic decision-making system for intelligent navigation strategy within inland traffic separation schemes. Ocean Eng. 197, 106909 (2020)
10. Tomar, S., Singh, M., Sharma, G., Arya, K.V.: Traffic management using logistic regression with fuzzy logic. Procedia Comput. Sci. 132, 451–460 (2018)
11. Rout, R.R., Vemireddy, S., Raul, S.K., Somayajulu, D.V.L.N.: Fuzzy logic-based emergency vehicle routing: an IoT system development for smart city applications. Comput. Electr. Eng. 88, 106839 (2020)
12. Jovanovic, A., Kukik, K., Stevanovic, A.: A fuzzy logic simulation model for controlling an oversaturated diverge diamond interchange and ramp metering system. Math. Comput. Simul. 182, 165–181 (2021)
13. Alemneh, E., Senouchi, S.-M., Messous, M.-A.: An energy-efficient adaptive beaconing rate management for pedestrian safety: a fuzzy logic-based approach. Pervasive Mobile Comput. 69, 101285 (2020)
14. AlKheder, S., Almutairi, R.: Roadway traffic noise modelling in the hot hyper-arid Arabian Gulf region using adaptive neuro-fuzzy interference system. Transport. Res. Part D: Transport Environ. 97, 102917 (2021)
15. Komsiyah, S., Desvania, E.: Traffic lights analysis and simulation using fuzzy inference system of Mamdani on three-signaled intersections. Procedia Comput. Sci. 179, 268–280 (2021)
16. Abbasi, F., Zarei, M., Rahmani, A.M.: FWDP: a fuzzy logic-based vehicle weighting model for data prioritization in vehicular ad hoc networks. Veh. Commun. 100413 (2021)
17. Gupta, R., Chaudhari, O.K.: Application of fuzzy logic in prevention of road accidents using multi criteria decision alert. Curr. J. Appl. Sci. Technol. 39(36), 51–61 (2020)
18. GoodVision. https://www.walterpmoore.com/traffic-studies
A Fuzzy Based Clustering Approach to Prolong the Network Lifetime in Wireless Sensor Networks

Enaam A. Al-Hussain and Ghaida A. Al-Suhail
Department of Computer Engineering, University of Basrah, Basrah, Iraq

Abstract. The selection of cluster heads (CHs) in wireless sensor networks (WSNs) is still a crucial issue to reduce the consumed energy in each node and increase the network lifetime. Therefore, in this paper an energy-efficient modified LEACH protocol based on the fuzzy logic controller (FLC) is suggested to find the optimal number of CHs. The fuzzy chance is combined with the probability of CH selection in LEACH to produce a new selection criterion. The FLC system depends on two inputs of the residual energy of each node and the node distance from the base station (sink node). Accordingly, the modified clustering protocol can improve the network lifetime, decrease the consumed energy, and send more information than the original LEACH protocol. The proposed scheme is implemented using the Castalia simulator integrated with OMNET++, and the simulation results indicate that the suggested modified LEACH protocol achieves better energy consumption and network lifetime than utilizing the traditional LEACH.

Keywords: Cluster head · FIS · LEACH · OMNET++ · Wireless sensor networks · Network lifetime · Castalia
1 Introduction

Nowadays, the tremendous advancement of sensor equipment technology contributes to huge implementation capabilities in many fields, such as underwater monitoring, health monitoring, smart infrastructure monitoring, multimedia surveillance, the Internet of Things (IoT), and other fields of use. In these applications, sensor devices are often distributed randomly over a targeted area whose conditions can change dynamically. Information may be sensed, processed, and sent by such nodes to adjacent nodes and to the base station (BS). However, these sensors have many restricted features, such as limited memory, low computing and processing capability, and, most importantly, low power. As sensor nodes have limited resources, clustering is favored as an energy-efficient technique in WSNs. When networking is restricted to a few nodes, the strategy can conserve network energy. It effectively extends the network's lifespan by minimizing the consumed energy through multi-hop transmission and data aggregation [1–5].

In particular, the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol is one of the most well-known protocols [6]; it depends on adaptive clustering to make efficient use of energy. It can be considered a benchmark clustering routing protocol in WSNs and MANETs, in which the sensor nodes (SNs) in the network field are separated into clusters. Each cluster has one sensor node referred to as the leader node (or CH), which is selected randomly. However, although LEACH conserves the energy of sensor nodes, its energy efficiency still suffers from random and faster power drainage, particularly where the unequal distribution of nodes leaves fewer nodes per cluster, and from the time limit imposed by the use of the TDMA MAC protocol [6–8]. To avoid the random selection of the CHs, to find the optimum number of selected CHs, and to handle the complexity of the relation between the network lifetime and the other parameters of the sensor nodes, many approaches have been developed, such as (i) the Fuzzy Inference System (FIS) [9], (ii) the Adaptive Neural-Network Fuzzy-System [10], and (iii) metaheuristic intelligent algorithms like the swarm algorithms ABC and ACO, and flower pollination [11–13].

Therefore, a new modified LEACH protocol based on the fuzzy logic controller is presented in this paper, which efficiently improves the network lifespan and decreases the number of dead nodes during its rounds. The modified LEACH protocol aims to select the CHs based on the Type-1 Fuzzy Inference System (T1-FIS). The CHs are chosen by considering two parameters, (i) the residual energy (REN) and (ii) the node's distance from the base station (DBS), based on a threshold value. The rest of this paper is structured as follows. The related works are addressed in Sect. 2. In Sect. 3 the modified LEACH protocol is described in detail. Section 4 displays and discusses the simulation results. Finally, the conclusion is drawn in Sect. 5.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 97–107, 2022. https://doi.org/10.1007/978-3-030-93247-3_11
2 Related Works

The LEACH protocol is a routing protocol that uses a clustering technique to create random, adaptive, and self-configured clusters. In LEACH, all sensor nodes are grouped into clusters, each of which has a leader node (or CH) that handles the TDMA schedule and sends the aggregated information to the BS. Since only the CH sends data to the BS, the network's energy consumption is significantly reduced. In each round of the LEACH protocol, the CH is elected at random, and the probability of being CH is proportional to the number of nodes. After several rounds, a low-energy and a high-energy sensor node have the same chance of becoming CH, which contributes to an energy imbalance among CHs across the whole structure; consequently, the lifetime of the network is reduced [1, 2]. To improve the LEACH protocol, and because of the complexity of describing the relation between the network lifetime and the other parameters of the nodes, a FIS is one of the most well-known intelligent schemes that can be used to solve this problem. The reasons are that it does not need precise system information and that it is classified as a powerful tool among Artificial Intelligence (AI) methods, able to build efficient solutions by combining many input parameters and then providing the desired cost criterion.

Much research has been devoted to clustering parameters based on fuzzy rules to determine and choose network CHs. For instance, Abidi et al. [1] introduced a fuzzy CH selection algorithm based on the LEACH protocol that uses three input parameters to select the CH: remaining energy, neighbours alive, and distance from the BS. Ayati et al. [2] designed a three-level clustering method with numerous inputs at each level of clustering: remaining energy and centrality, transmission quality and distance from the BS, overall delay, and denial-of-service (DoS) attacks. Al Kashoash et al. [3] also proposed a FIS for CH selection based on two inputs, the residual energy and the Received Signal Strength (RSSI) value, to increase the network lifetime and the packet transmission rate. The proposed algorithm demonstrated a significant increase in network lifetime compared with the LEACH protocol. The authors in [4–6] propose efficient solutions for balancing the energy depletion of the SNs and extending the life of the WSN. These methods make use of three fuzzy variables: node residual energy, distance to the BS, and distance to the CH. The simulation results demonstrate that, when compared with the LEACH protocol, the proposed algorithm significantly improves the energy efficiency and network lifetime of WSNs. Additionally, Lee et al. [7] suggested a clustering scheme for mobile sensor nodes that uses three inputs: residual energy, movement speed, and pause time.

The issue of energy consumption in WSNs has remained a focus of research in recent years, and many current studies continue to work in this direction by developing efficient methods for increasing the network's efficiency. For example, the authors in [8–10] employ the Fuzzy Clustering Algorithm to improve network reliability and lifespan. Many intelligent methods have also been suggested in terms of the Adaptive Neural-Network Fuzzy-System and metaheuristic intelligent algorithms like the swarm algorithms ABC and ACO, and flower pollination [11–14]. Additionally, Balaji et al. [15] recommended a multi-hop data packet exchange, with the data packets eventually being sent to the BS. Packets are transmitted from the source sensor to the BS via the CH using type-1 fuzzy logic with three parameters, which identifies nodes that have a high degree of confidence and are close to the BS. On the other hand, in [16], the selection of the fuzzy logic cluster head is based on three inputs: remaining energy, node density, and distance to the BS (sink).

Because WSNs suffer from a number of issues related to energy consumption and network scalability as a result of their complexity and nonlinear behavior, some recent research has focused on automating the construction and optimization of the rule base table in a FIS. For example, Tran et al. [17] improve energy efficiency in large-scale sensor networks by using energy-based multihop clustering in conjunction with the Dijkstra algorithm to determine the shortest path. Meanwhile, Fanian et al. [18] propose a Fuzzy Multi-hop Cluster based routing Protocol and an intelligent scheme called the Shuffled Frog Leaping Algorithm for improving the rule base table in a FIS. Additionally, the authors in [19, 20] suggest that an energy-efficiency and reliability-based cluster head selection scheme would be an ideal way to improve the overall performance of WSNs.
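The random CH election of the original LEACH protocol, which the works reviewed in this section try to improve, can be sketched in a few lines. The following pure-Python illustration uses the threshold T(n) = p / (1 − p · (r mod 1/p)) from the original LEACH literature; the formula is not stated in this text, and the node count, the value p = 0.05 and the function names are assumptions made for illustration. The fuzzy modification proposed in this paper is not shown.

```python
import random


def leach_threshold(p, r):
    """LEACH threshold T(n) for a node that is still eligible.

    p: desired fraction of cluster heads, r: current round number.
    """
    epoch = round(1.0 / p)              # rounds per epoch, e.g. 20 for p = 0.05
    return p / (1.0 - p * (r % epoch))


def elect_cluster_heads(eligible, p, r, rng):
    """Each eligible node draws a uniform number and becomes CH if it is below T(n)."""
    t = leach_threshold(p, r)
    return {node for node in sorted(eligible) if rng.random() < t}


rng = random.Random(0)                  # fixed seed so the run is repeatable
p = 0.05
all_nodes = set(range(100))             # hypothetical node ids
eligible = set(all_nodes)               # nodes that have not yet served as CH
served = set()
for r in range(round(1.0 / p)):         # one epoch of 1/p rounds
    heads = elect_cluster_heads(eligible, p, r, rng)
    served |= heads
    eligible -= heads                   # a node serves as CH at most once per epoch
```

The threshold rises from p in the first round of an epoch to 1 in the last one, so every node serves as CH once per epoch regardless of its residual energy, which is the imbalance that the fuzzy schemes above address.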
3 Modified LEACH Protocol Design

In this section, a cluster head (CH) selection approach using a Fuzzy Inference System is proposed to improve network lifetime and reduce the energy consumption of the LEACH protocol in WSNs. The section is organized as follows: (i) the network model assumptions are stated first, (ii) the design of the fuzzy logic controller is then presented, and (iii) finally the operation of the suggested protocol is given.

3.1 Network Model
The assumptions that describe the network model of the proposed Fuzzy LEACH protocol are as follows:

1. N sensor nodes are uniformly distributed over an M × M area of interest, and all the nodes and the BS are stationary (non-mobile).
2. All SNs have the capability to sense, aggregate, and forward the data to the BS (which acts as the sink node).
3. The nodes are non-rechargeable and are homogeneous in terms of initial energy.
4. The sink node (BS) is situated at the center of the network field. The communication links to the other nodes are assumed to be symmetrical, so that the data rate and energy consumption of any two nodes are symmetrical in terms of packet transmission.
5. The nodes operate in power control mode, based on the receiving distance from the SN.
6. At the sink node (BS), the selected CH nodes would not be selected again in any new round of selection.
7. In each round of the Set-Up phase, the cluster heads are still selected randomly, but with extra fuzzy logic criteria to enhance the CH selection process of the LEACH protocol.

3.2 Design of Fuzzy Logic Controller (FLC)
There are commonly four key steps in the FLC [14]:

1. Fuzzification: Convert the crisp value of each input variable to fuzzy values by means of membership functions (MFs). Triangular, trapezoidal, and Gaussian MFs are the best-known types; to avoid discontinuities in the input domain, the Gaussian membership function is used, defined as in Eq. 1:

$$f(x; \sigma, c) = \exp\left(-\frac{(x - c)^2}{2\sigma^2}\right) \tag{1}$$

where c is the mean and σ is the standard deviation.
2. Rule evaluation: Apply the output of step 1 to the fuzzy rules to evaluate the fuzzy output. A typical rule of the Mamdani fuzzy model, used here because of its widespread acceptance, has the form Rn: if x1 is X1 and x2 is X2 then Y is y.
3. Aggregation: Integrate the outputs of all rules into a single fuzzy set. Several aggregation methods can be used, such as (i) max (maximum), (ii) sum (sum of the rules' output sets), and (iii) probor (probabilistic OR).
4. Defuzzification: Transform the fuzzy set values to a single crisp number. Many defuzzification methods exist; the weighted average method and the centroid method (sometimes called center of area) are the most popular.

Figure 1 shows the fuzzy inference system for selecting CHs. In this model, the Fuzzy Logic Controller (FLC) takes two parameters, the Residual Energy and the node distance to the Base Station, and produces one output parameter, the fuzzy chance. Later, to improve the cluster head selection, the fuzzy chance and the LEACH probability criterion can be combined to produce a new chance for finding these CHs.
Fig. 1. The fuzzy inference system for the CHs selection
In our proposal, we used the Mamdani approach as the FIS because of its simplicity. The role of the Fuzzy Logic Controller (FLC) is to measure the probability of CH selection based on two input descriptors: (i) the Residual Energy (REN) and (ii) the distance between each node and the Base Station (DBS). Thus, the controller is designed with two inputs (REN and DBS) and one output (chance). The linguistic variables and the fuzzy logic rule base are shown in Tables 1 and 2, respectively.
Table 1. Inputs/output variables.

| Parameter | Linguistic variable |
|---|---|
| Residual energy (REN) | Low, Medium, High |
| Distance to BS (DBS) | Close, Average, Far |
| Chance | VL, L, M, H, VH |

Table 2. Fuzzy rules.

| Residual energy (REN) | Distance to BS (DBS) | Chance |
|---|---|---|
| Low | Close | M |
| Low | Average | L |
| Low | Far | VL |
| Medium | Close | H |
| Medium | Average | M |
| Medium | Far | L |
| High | Close | VH |
| High | Average | H |
| High | Far | M |
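As an illustration of how the four FLC steps and the rule base of Table 2 fit together, a minimal Python sketch is given below. It is not the Xfuzzy implementation used in this work: the normalisation of REN, DBS and chance to [0, 1], the MF centres, and the spreads `SIGMA_IN` and `SIGMA_OUT` are assumed values chosen only for illustration, and min is assumed for the rule AND, with max aggregation and centroid defuzzification.

```python
import math

def gauss(x, c, sigma):
    """Gaussian membership function of Eq. 1."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Assumed MF centres on normalised [0, 1] universes (illustrative values only).
SIGMA_IN, SIGMA_OUT = 0.2, 0.1
REN = {"Low": 0.0, "Medium": 0.5, "High": 1.0}
DBS = {"Close": 0.0, "Average": 0.5, "Far": 1.0}
CHANCE = {"VL": 0.0, "L": 0.25, "M": 0.5, "H": 0.75, "VH": 1.0}

# Rule base copied from Table 2: (REN, DBS) -> chance.
RULES = {
    ("Low", "Close"): "M", ("Low", "Average"): "L", ("Low", "Far"): "VL",
    ("Medium", "Close"): "H", ("Medium", "Average"): "M", ("Medium", "Far"): "L",
    ("High", "Close"): "VH", ("High", "Average"): "H", ("High", "Far"): "M",
}

def fuzzy_chance(ren, dbs, steps=200):
    """Mamdani inference for normalised inputs ren and dbs in [0, 1]."""
    # Steps 1-2: fuzzify the inputs and compute each rule's firing strength (min as AND).
    strength = {}
    for (r, d), out in RULES.items():
        w = min(gauss(ren, REN[r], SIGMA_IN), gauss(dbs, DBS[d], SIGMA_IN))
        strength[out] = max(strength.get(out, 0.0), w)
    # Steps 3-4: clip each output MF, aggregate with max, and take the centroid.
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max(min(w, gauss(y, CHANCE[o], SIGMA_OUT)) for o, w in strength.items())
        num += y * mu
        den += mu
    return num / den
```

With these assumed MFs, a node with full energy close to the BS, `fuzzy_chance(1.0, 0.0)`, receives a larger chance than a depleted node far from the BS, `fuzzy_chance(0.0, 1.0)`, which is the ordering Table 2 encodes.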
In the FLC system, Gaussian membership functions are employed rather than triangular or trapezoidal ones to represent the linguistic variables, in order to avoid the discontinuities that arise if the input MFs do not cover each input domain completely. Figure 2 depicts the membership functions of REN, DBS, and the output chance, respectively.
Fig. 2. Fuzzy inference system of proposed algorithm (a) MFs of input variable REN, (b) MFs of input variable DBS, and (c) MFs of output variable chance.
In addition, the fuzzy inference system (FIS) is simulated using the Xfuzzy tool, a development environment that combines several tools for designing a fuzzy system and tuning its parameters. It can also generate C++ code for the system, which can be integrated with the Castalia simulator as in Fig. 3, which shows the simple architecture of the Modified LEACH protocol using the Xfuzzy tools and the Castalia simulator integrated with OMNET++. The graphical user interface of Xfuzzy shows the specifications by means of drop-down structures, so that the complete system or any rule base can be selected as the active specification. Development proceeds in a few stages:

(i) Description stage: select a preliminary description of the system by using two tools (xfedit and xfpkg) that assist in the description of fuzzy systems.
(ii) Verification stage: study the behavior of the fuzzy system under development by examining the values of the various internal variables for input values in a given range.
(iii) Tuning stage: adjust the different MFs.
(iv) Synthesis stage: generate a system representation that can be used externally, for example with xfcpp, which is used to develop a C++ description.
Fig. 3. Simple architecture for the modified LEACH protocol using Xfuzzy tools and the OMNET++/Castalia simulator.
3.3 Operation of the Modified LEACH Protocol

In this subsection, we use the proposed CH selection algorithm based on a T1-FIS, which depends on the Residual Energy (REN) and the Distance from the BS (DBS) of each node. The algorithm operates in rounds, following the LEACH protocol, according to specific and defined criteria. Each round therefore starts with the CH selection steps of the proposed fuzzy-based algorithm.
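For orientation, the round-based election of standard LEACH can be sketched as follows. The threshold T(n) = p / (1 − p (r mod 1/p)) is the usual LEACH rule; scaling it by a node's fuzzy chance is only an assumed way of combining the two criteria, not a rule stated in this section, and the names `leach_threshold`, `elect`, `eligible` and `chance` are hypothetical. The value p = 0.05 corresponds to 5 clusters among 100 nodes.

```python
import random

def leach_threshold(p, r):
    """Standard LEACH threshold T(n) in round r for a node that is still eligible."""
    period = round(1 / p)              # number of rounds in one election cycle
    return p / (1 - p * (r % period))

def elect(nodes, p, r, rng):
    """Return the ids of nodes that become CH in round r.

    Each node is a dict with keys 'id', 'eligible' (not yet CH in this cycle)
    and 'chance' (fuzzy output in [0, 1]); the product T(n) * chance is an
    assumed combination rule used only for illustration.
    """
    t = leach_threshold(p, r)
    return [n["id"] for n in nodes
            if n["eligible"] and rng.random() < t * n["chance"]]

# Example: three hypothetical nodes in round 0 with p = 0.05.
nodes = [
    {"id": 1, "eligible": True, "chance": 0.9},
    {"id": 2, "eligible": True, "chance": 0.2},
    {"id": 3, "eligible": False, "chance": 0.8},   # already served as CH
]
heads = elect(nodes, p=0.05, r=0, rng=random.Random(0))
```

The threshold grows from p in the first round of a cycle to 1 in the last, so every eligible node with a non-zero chance is eventually considered.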
4 Simulation Results

This section discusses the performance assessment of the Modified LEACH of Sect. 3, using the Castalia simulator and OMNET++. The proposed algorithm is examined on a simulated network of 100 SNs spread uniformly over an area of 100 × 100 m². The base station is located at (50, 50). The initial energy of all nodes is set to 3 J. The obtained results are compared with the original LEACH protocol. The network parameters used in the simulation are listed in Table 3.

Table 3. Simulation parameters.

| Parameters | Value |
|---|---|
| Network size | 100 × 100 m² |
| No. of nodes | 100 |
| No. of clusters | 5 |
| Location of BS | (50, 50) m |
| Node distribution | Random |
| Energy model | Battery |
| Initial energy | 3 J |
| Simulation time | 300 s |
| Round time | 20 s |
| Packet header size | 25 Bytes |
| Data packet size | 2000 Bytes |
| Bandwidth | 1 Mbps |
Figure 4 illustrates First Node Dead (FND), Half Node Dead (HND), and Last Node Dead (LND). The results demonstrate that the Modified LEACH outperforms the original LEACH protocol by about 50.94%, 14.1667%, and 13.259% in terms of FND, HND, and LND, respectively. Meanwhile, Figs. 5 and 6 present the energy consumption of the nodes and the total number of alive nodes, measured against the number of network communication rounds, to evaluate the proposed protocol. The results show that, over the first 100 s, the Modified LEACH improves on the traditional LEACH by about 13.69% in consumed energy and 15.29% in the number of alive nodes per round.
Fig. 4. FND, HND, and LND.
Fig. 5. Total energy consumption.
Fig. 6. Total no. of alive nodes.
5 Conclusions

Due to the growth and expansion of WSN applications, particularly in the last few years, it has become important to find effective solutions to WSN challenges. Energy saving is one of the primary challenges confronting these networks. In this paper, a CH selection scheme is proposed using a type-1 fuzzy logic system (T1-FIS). It depends on the Residual Energy and the Node Distance from the BS in order to maximize the network lifetime and decrease the energy consumption per sensor node. The modified LEACH protocol is developed and simulated in Castalia (v3.2) and OMNET++ (v4.6). The results showed that the modified protocol can successfully decrease the energy consumption and increase the network lifespan compared to the original LEACH and other existing fuzzy-based LEACH proposals. For future work, the proposed scheme can involve different parameters in the design of the fuzzy inference system, such as centrality, SNR, RSSI, and packet size. Moreover, an artificial intelligence (AI) algorithm such as GA, PSO, ABC, or FPA could also be used to optimize the QoS performance of the LEACH protocol.
References

1. Abidi, W., Ezzedine, T.: Fuzzy cluster head election algorithm based on LEACH protocol for wireless sensor networks. In: 13th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 993–997. IEEE (2017)
2. Ayati, M., Ghayyoumi, M.H., Keshavarz-Mohammadiyan, A.: A fuzzy three-level clustering method for lifetime improvement of wireless sensor networks. Ann. Telecommun. 73(7–8), 535–546 (2018). https://doi.org/10.1007/s12243-018-0631-x
3. Al-Kashoash, H.A., Rahman, Z.A.S., Alhamdawee, E.: Energy and RSSI based fuzzy inference system for cluster head selection in wireless sensor networks. In: Proceedings of the International Conference on Information and Communication Technology, pp. 102–105 (2019)
4. Abbas, S.H., Khanjar, I.M.: Fuzzy logic approach for cluster-head election in wireless sensor network. Int. J. Eng. Res. Adv. Technol. 5, 14–25 (2019)
5. Mahboub, A., Arioua, M., Barkouk, H., El Assari, Y., El Oualkadi, A.: An energy-efficient clustering protocol using fuzzy logic and network segmentation for heterogeneous WSN. Int. J. Electr. Comput. Eng. 9, 4192 (2019)
6. Kwon, O.S., Jung, K.D., Lee, J.Y.: WSN protocol based on LEACH protocol using fuzzy. Int. J. Appl. Eng. Res. 12, 10013–10018 (2017)
7. Lee, J.S., Teng, C.L.: An enhanced hierarchical clustering approach for mobile sensor networks using fuzzy inference systems. IEEE Internet Things J. 4, 1095–1103 (2017)
8. Phoemphon, S., So-In, C., Aimtongkham, P., Nguyen, T.G.: An energy-efficient fuzzy-based scheme for unequal multihop clustering in wireless sensor networks. J. Ambient. Intell. Humaniz. Comput. 12(1), 873–895 (2020). https://doi.org/10.1007/s12652-020-02090-z
9. Al-Husain, E.A., Al-Suhail, G.A.: E-FLEACH: an improved fuzzy based clustering protocol for wireless sensor network. Iraqi J. Electri. Electron. Eng. 17, 190–197 (2021)
10. Lata, S., Mehfuz, S., Urooj, S., Alrowais, F.: Fuzzy clustering algorithm for enhancing reliability and network lifetime of wireless sensor networks. IEEE Access 8, 66013–66024 (2020)
11. Thangaramya, K., Kulothungan, K., Logambigai, R., Selvi, M., Ganapathy, S., Kannan, A.: Energy aware cluster and neuro-fuzzy based routing algorithm for wireless sensor networks in IoT. Comput. Netw. 151, 211–223 (2019)
12. Sharma, N., Gupta, V.: Meta-heuristic based optimization of WSNs energy and lifetime-a survey. In: 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 369–374. IEEE (2020)
13. Yuvaraj, D., Sivaram, M., Mohamed Uvaze Ahamed, A., Nageswari, S.: An efficient lion optimization based cluster formation and energy management in WSN based IoT. In: Vasant, P., Zelinka, I., Weber, G.W. (eds.) ICO 2019. AISC, vol. 1072, pp. 591–607. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_58
14. Devika, G., Ramesh, D., Karegowda, A.G.: Swarm intelligence-based energy-efficient clustering algorithms for WSN: overview of algorithms, analysis, and applications. Swarm Int. Optim. Algorithms Appl., 207–261 (2020)
15. Balaji, S., Julie, E.G., Robinson, Y.H.: Development of fuzzy based energy efficient cluster routing protocol to increase the lifetime of wireless sensor networks. Mob. Netw. Appl. 24, 394–406 (2019)
16. Rajput, A., Kumaravelu, V.B.: FCM clustering and FLS based CH selection to enhance sustainability of wireless sensor networks for environmental monitoring applications. J. Ambient. Intell. Humaniz. Comput. 12(1), 1139–1159 (2020). https://doi.org/10.1007/s12652-020-02159-9
17. Tran, T.N., Van Nguyen, T., Bao, V.N.Q., An, B.: An energy efficiency cluster-based multihop routing protocol in wireless sensor networks. In: International Conference on Advanced Technologies for Communications (ATC), pp. 349–353. IEEE (2018)
18. Fanian, F., Rafsanjani, M.K.: A new fuzzy multi-hop clustering protocol with automatic rule tuning for wireless sensor networks. Appl. Soft Comput. 89, 106115 (2020)
19. Murugaanandam, S., Ganapathy, V.: Reliability-based cluster head selection methodology using fuzzy logic for performance improvement in WSNs. IEEE Access 7, 87357–87368 (2019)
20. Van, N.T., Huynh, T.T., An, B.: An energy efficient protocol based on fuzzy logic to extend network lifetime and increase transmission efficiency in wireless sensor networks. J. Intell. Fuzzy Syst. 35, 5845–5852 (2018)
Visual Expression Analysis from Face Images Using Morphological Processing

Md. Habibur Rahman1(✉), Israt Jahan1, and Yeasmin Ara Akter2

1 Department of Computer Science and Engineering, East Delta University, Chattogram, Bangladesh
2 School of Science, Engineering and Technology, East Delta University, Chattogram, Bangladesh
[email protected]
Abstract. Visual Expression Analysis is an active field of computer vision in which the emotion of a given scene is analyzed as anger, disgust, fear, surprise, sadness, contempt, happiness and many more. Human facial expression can be used as an effective tool in this research arena. Detecting emotion first requires identifying whether there is a face in the image. Most existing systems were prepared with grayscale images, but this manuscript proposes using MTCNN, a face detector that recognizes faces from RGB images. The methodology uses RGB color images from a customized dataset collected from Flickr. The face is considered as the region of interest (ROI) of any given image. The ROI is further converted into binary images after evaluating combinations of morphological operations (erosion, dilation, opening and closing), from which the best morphological technique, i.e., subtracting the eroded images from the gray images, is selected for retrieving facial features. After extracting the features, Random Forest, Logistic Regression, SVM, xGBoost, GBM and CNN classifiers were implemented to find the best classifier. Based on the performance analysis, CNN is the best model, with 99.71% train accuracy and 98.01% test accuracy, for classifying four facial expressions: ‘anger’, ‘happiness’, ‘sadness’ and ‘surprise’.

Keywords: Computer vision · Morphological operations · MTCNN · Image processing · CNN
1 Introduction

The human brain is a complicated system that learns by remembering patterns in given data. It is possible to develop a technical approach that can learn patterns and participate in decision making like humans. The input data can be of various types, for example image data or textual data. A system developed to understand video or image data and to participate in decision making based on it is said to perform computer vision [1]. Visual expression analysis is an application of computer vision that extracts information from an image to determine the emotion, since emotion is an essential tool for analyzing a person's feelings. In this regard, human facial expression can be used as a pivotal point to detect emotion. The changes in facial patterns during each expression give us a message about a person's mood. Eyes, lips, and nose hold most of the data that helps to recognize the human sentimental condition. Hence, this paper focuses on some morphological operations to extract information from the organs mentioned above to analyze facial expression.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 108–119, 2022. https://doi.org/10.1007/978-3-030-93247-3_12
2 Related Works

Expressions can be easily determined from speech and face movement. However, researchers are attracted to differentiating expressions from images because of the availability of images and video. Byungsung Lee et al. introduced a real-time method that incorporates facial expression from video images into a well-being life care system providing an emotional care service [2]. They used PCA and template matching algorithms for face detection. To detect face candidates, an HT skin colour model, mean filtering, and morphological operations were applied. S. L. Happy et al. offered a supervised real-time classification algorithm that operates on grayscale frontal face images [3]. They employed a Haar classifier for face detection, LBPH for feature extraction and PCA for classification. The proposed system fails to detect emotions from rotated and occluded images. A hybrid feature extraction and facial expression method was proposed by Maliha Asad et al. [4], where features are extracted by applying PCA, Fisher LDA and HOG individually, and the recognition is done by an SVC classifier. The extended CK+ dataset was used to train 7 emotions, of which only 5 were tested due to the poor detection rate. Jayalekshmi J and Tessy Mathew trained their proposed system on the JAFFE dataset [5]. Their system used Zernike moments, Local Binary Pattern (LBP) and Discrete Cosine Transform (DCT) as the feature family. Advait Apte et al. applied morphological operations, mean and standard deviation on the Cohn-Kanade AU-coded dataset for pre-processing to extract features. They obtained 92% accuracy by applying a scaled conjugate gradient backpropagation neural network [6]. Allen Joseph and P. Geetha proposed a facial expression system based on facial geometry and trained it with the KDEF dataset [7]. They used the discrete wavelet transform and fuzzy combination to enhance images, modified mouthmap and eyemap algorithms to extract the mouth region, and neural networks to classify facial expressions. The Viola-Jones algorithm is used as the face detector in [3–7]. Guan-Chun Luh et al. recently utilized the Yolo, YoloV2 and YoloV3 deep neural networks to study facial expression-based emotion recognition on the JAFFE, RaFD and CK+ datasets [8]. G. G. Lakshmi Priya and L. B. Krithika introduced a GFE-HCA approach for facial expression recognition, where the MMI dataset is used with five emotions [9]. They also detected faces using the Viola-Jones algorithm, but extracted only edge-based and shape-based features, which were then sent to a self-organized map based NN classifier for emotion classification. Based on the state of the art, it is clear that most of the research depends on grayscale images only. The Viola-Jones algorithm is commonly used as the face detector and fails to detect faces in images rotated by more than 120°. In contrast, to detect faces from RGB images, we follow a different approach in which we apply MTCNN, Haar cascade, morphological operations, several machine learning algorithms and a CNN classifier for emotion classification.
3 Dataset

Nowadays images are not laborious to obtain, and image datasets are consequently widely available. This research work intends to interpret facial expression by detecting the human face in input images. However, most of the existing image datasets contain already cropped facial images. Therefore, a custom dataset was assembled by selecting the necessary images from the album “The face we make” available on Flickr [10]. Our dataset consists of a total of 804 RGB images, of which 160 images are for “Anger”, 216 images for “Happiness”, 204 images for “Sadness”, and 224 images for “Surprise”. Fig. 1 shows a sample of our collected data.
Fig. 1. Samples of the dataset
4 Methodology

Our proposed system consists of two phases. The first phase uses the Haar cascade and MTCNN algorithms to detect faces, and the second phase performs multiple classifications to recognize expressions. Combining these two phases, the overall system design is illustrated in Fig. 2:
Fig. 2. Overview of working procedure of proposed system
4.1 Face Detection
In this research, the Haar cascade and Multi-task Cascaded Convolutional Neural Network (MTCNN) algorithms are used for detecting human faces in the images. Haar cascade is a face detection algorithm proposed by Viola and Jones, which they trained on 4960 images of human faces [11]. This algorithm requires frontal human faces and works best on grayscale images. To make the dataset compatible with it, the images need to be converted into grayscale images. The detection rate of the Haar cascade algorithm on this dataset is 95.27%. Another approach for face detection is MTCNN, a deep learning approach described by Kaipeng Zhang et al. in 2016 [12]. Unlike the Haar cascade algorithm, MTCNN can detect a face in RGB images. On this dataset, the detection rate of MTCNN is almost 100%. In this paper, MTCNN is proposed as the face detector for our system since it has a higher detection rate than the Viola-Jones algorithm. The facial expression analysis depends on the face detection result, because the localized faces are used in the second phase of the work for further classification. The output images of the Viola-Jones algorithm and MTCNN are shown in Fig. 3.
Fig. 3. a) Original image b) Detected face using Viola-Jones algorithm c) Detected face using MTCNN.
4.2 Pre-processing

Pre-processing prepares the raw data for further processing. Among the numerous pre-processing methods, we employed the following techniques:

4.2.1 Cropping and Augmentation

As pre-processing, we first cropped the image according to the bounding box (generated as in Sect. 4.1) to obtain the region of interest (ROI). Thereafter, two augmentation techniques, “Horizontal flip” and “Rotation”, were used to increase the number of data samples by modifying the data [13]. Additionally, images were resized to 120 × 120 × 1 because the proposed system requires a fixed width and height for each image. Fig. 4 shows a sample of the ROI and the output of the augmentation techniques.
Fig. 4. a) Region of Interest or ROI b) Flipped image of ROI c) Rotated image of ROI.
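The cropping and horizontal-flip steps of Sect. 4.2.1 can be sketched in a few lines of plain Python. This is only an illustration on nested lists: the function names and the (x, y, width, height) layout of the bounding box are assumptions, and a real pipeline would normally use an image library for cropping, rotation and resizing.

```python
def crop_roi(image, box):
    """Crop a row-major image (list of rows) to box = (x, y, width, height)."""
    x, y, w, h = box
    # Clamp the corner in case the detector places it outside the image.
    x, y = max(0, x), max(0, y)
    return [row[x:x + w] for row in image[y:y + h]]

def horizontal_flip(image):
    """Mirror the image left to right, one of the two augmentations used."""
    return [row[::-1] for row in image]

# Toy 4 x 4 "image" whose pixel values are just their index.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
roi = crop_roi(image, (1, 1, 2, 2))      # [[5, 6], [9, 10]]
flipped = horizontal_flip(roi)           # [[6, 5], [10, 9]]
```

Flipping doubles the number of samples without changing the expression label, which is why it is a common augmentation for face images.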
4.2.2 Morphological Processing

Morphological operations are mathematical operations performed on an image to extract features that carry the most distinctive facial emotion information. Combinations of four morphological operations (erosion, dilation, opening and closing) are tested in this paper to select the best technique for facial feature extraction. Erosion enlarges dark regions and shrinks bright regions, whereas dilation enlarges bright regions and shrinks dark regions. Opening is defined as an erosion followed by a dilation, and closing is defined as a dilation followed by an erosion. These operations only work on grayscale or binary images. Hence, the system performs RGB to grayscale conversion on the ROI. Eyes, lips and nose are the key elements for dissecting each expression. Erosion followed by a subtraction extracts features from these areas: erosion removes pixels from these areas, and subtracting the eroded image from the grayscale image brings back the detached pixels.

$$\text{Output Image} = \text{Subtracted Image} = \text{Gray Image} - \text{Eroded Image} \tag{1}$$

A few more combinations that can extract facial features were also tested:

$$\text{Output Image} = E = \text{Erosion} + \text{Opening} \tag{2}$$

$$D = \text{Dilation} + \text{Closing} \tag{3}$$

$$\text{Output Image} = \text{Gray Image} + (E - D) \tag{4}$$

$$\text{Output Image} = \text{Gray Image} + E \tag{5}$$

$$\text{Output Image} = \text{Gray Image} - (E + D) \tag{6}$$

$$\text{Output Image} = \text{Gray Image} - (E - D) \tag{7}$$
The output images are the final images for each technique that are going to participate in classification. The images are converted into image vectors using image to feature vector method. Thus, the system extracts feature from eyes, lips and noses to train classifiers. The output images of the experimented combinations of morphological operations are shown below (Fig. 5):
Fig. 5. a) Eroded image of ROI. Output image of- b) Subtracted image. c) Erosion + Opening. d) Gray image + E – D. e) Gray image + E. f) Gray image – E + D.
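To make Eq. 1 concrete, the sketch below implements grayscale erosion with a flat square structuring element and subtracts the result from the original image. The 3 × 3 element size and the function names are assumptions made for illustration; the text does not state which structuring element was used, and a practical implementation would rely on an image-processing library.

```python
def erode(gray, k=3):
    """Grayscale erosion with a flat k x k element: each pixel becomes its local minimum."""
    h, w, r = len(gray), len(gray[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Neighbourhood is truncated at the image border.
            window = [gray[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(min(window))
        out.append(row)
    return out

def subtract(a, b):
    """Pixel-wise difference a - b of two equally sized images."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def erosion_residue(gray, k=3):
    """Eq. 1: gray image minus its eroded version."""
    return subtract(gray, erode(gray, k))

# A bright 3 x 3 block on a dark background: erosion keeps only its centre,
# so the residue retains the rim of the block.
img = [[0, 0, 0, 0, 0],
       [0, 9, 9, 9, 0],
       [0, 9, 9, 9, 0],
       [0, 9, 9, 9, 0],
       [0, 0, 0, 0, 0]]
res = erosion_residue(img)
```

Because erosion never increases a pixel value, the residue is non-negative and is non-zero only where the intensity changes within the neighbourhood, which is why it highlights the outlines of eyes, lips and nose.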
4.3 Classical Machine Learning Approaches

Machine learning classification algorithms generate predictions from the training data using computational statistics and mathematical optimization. In this paper, the Random Forest classifier, Logistic Regression classifier, Support Vector Machine classifier and boosting classifiers such as XGBoost and Gradient Boosting Machine are used for classification. These methods are trained with the collected dataset. Hyper-parameter tuning is the process of finding the best parameters for the model architecture to get the ideal model for a system [14]. The result analysis focuses on the accuracy and overfitting observed throughout the hyper-parameter tuning process.

4.4 Deep Learning (CNN) Approach
A Convolutional Neural Network (CNN) is a deep neural network usually applied to interpret visual imagery. Deep learning models are constructed using layers. The layers between the input and output layers are called hidden layers and are connected sequentially. They contain neurons, weights and biases that help generate feedback and update the features. Our CNN applies max-pooling in the hidden layers, together with a stride value and padding, to an input image and extracts features. The shape of an output layer is computed as

$$\text{Output shape} = \left(\frac{m - f + 1}{S} + 1\right) \times \left(\frac{n - f + 1}{S} + 1\right) \tag{8}$$

Here, m × n is the input size, f × f is the kernel size and S is the stride value. All layers are summed up, and the total features are the trainable parameters of the model. The model learns from the features using an optimizer. In the final layer, the activation function classifies the images by retrieving the highest probability from the probability distribution over the classes [15]. In the proposed CNN model, “adam” is used as the optimizer with a learning rate, and the softmax function is used as the activation function for classification. Based on the classification result, the model improves the result using backpropagation to reduce the error and repeats the whole process over epochs. The combination of Convolutional Neural Network (CNN) layers that works best for our system is illustrated in Fig. 6.
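As a quick way to check layer sizes, the helper below computes the spatial output size of a convolution or pooling layer along one dimension. It uses the widely used form floor((m + 2p − f) / S) + 1 with explicit padding p, which is an assumption on our part and is not the exact expression of Eq. 8; the 3 × 3 kernel and 2 × 2 pooling window in the example are likewise illustrative, while the 120-pixel input comes from Sect. 4.2.1.

```python
def conv_output_size(m, f, stride=1, padding=0):
    """Output length along one dimension for input m, kernel f, stride and padding."""
    return (m + 2 * padding - f) // stride + 1

# 120 x 120 input, 3 x 3 kernel, no padding ("valid"): 118 x 118 feature map.
valid = conv_output_size(120, 3)

# Padding of 1 keeps the size unchanged ("same" for a 3 x 3 kernel).
same = conv_output_size(120, 3, padding=1)

# 2 x 2 max-pooling with stride 2 halves each dimension.
pooled = conv_output_size(same, 2, stride=2)
```

Applying the helper once per spatial dimension gives the full output shape of a layer.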
5 Result Analysis

One of the major focuses of this research is implementing and comparing the outcomes of classical machine learning and deep learning strategies. Hence, we present the result analysis in two ways.
Fig. 6. CNN layers of sequential model
Firstly, the performance of the machine learning classifiers is analyzed through experiments that show the impact of the augmentation techniques and of the combinations of morphological operations (discussed in Sect. 4.2.2) on the classifiers. For the augmentation techniques, based on the first morphological operation (Eq. 1), the investigation is performed in three ways: without applying augmentation, applying both flip and rotation, and applying only the flip method. Every experiment is analyzed through the hyper-parameter tuning phase to see the changes in the classifiers' results. Table 1 depicts the result (accuracy) of “Without applying augmentation” and “Applying both augmentation techniques”.

Table 1. Experimental result of ‘Without applying augmentation’ and ‘Applying both augmentation technique’.

| Machine learning classifiers | Pre-tuning: without applying augmentation technique (%) | Pre-tuning: applying both flip and rotation techniques (%) | Post-tuning: without applying augmentation technique (%) | Post-tuning: applying both flip and rotation techniques (%) |
|---|---|---|---|---|
| Logistic regression | 90.54 | 89.39 | 93.03 | 89.72 |
| SVM | 89.55 | 87.89 | 89.55 | 87.89 |
| Random Forest | 92.54 | 90.38 | 91.54 | 92.21 |
| xgBoost | 88.56 | 81.43 | 91.54 | 91.04 |
| Gradient Boosting | 90.05 | 91.54 | 91.54 | 90.88 |
Table 1 indicates that the accuracy of all classifiers except the Gradient Boosting machine is higher for the experiment done without applying any augmentation technique. In both cases, the training accuracy is almost 100%, which means that the classifiers overfit. To overcome this issue, a third experiment was carried out by applying flip only. Table 2 shows the results obtained before and after the hyper-parameter tuning process. The hyper-parameter tuning of the classifiers is discussed in Sect. 4.3. The results of this test show reduced overfitting. The Logistic Regression classifier gives the highest accuracy with the lowest overfitting rate.
Table 2. Experimental result of “applying flip augmentation technique” for pre- and post-hyper-parameter tuning.

| Classifiers | Accuracy (%) Pre | Accuracy (%) Post | Precision (%) Pre | Precision (%) Post | Recall (%) Pre | Recall (%) Post | F1-score (%) Pre | F1-score (%) Post |
|---|---|---|---|---|---|---|---|---|
| Logistic regression | 95.52 | 97.51 | 95.5 | 97.81 | 95.5 | 97.47 | 95.5 | 97.62 |
| SVM | 96.52 | 96.51 | 96.5 | 96.51 | 96.5 | 96.38 | 96.5 | 96.54 |
| Random Forest | 95.27 | 96.01 | 95.3 | 95.87 | 95.3 | 95.78 | 95.3 | 96.02 |
| Gradient Boost | 94.28 | 94.78 | 94.3 | 94.66 | 94.3 | 94.51 | 94.3 | 94.88 |
| xgBoost | 86.57 | 95.77 | 87.0 | 95.75 | 86.6 | 95.56 | 86.4 | 95.81 |
Table 3. Experimental results of morphological techniques for pre- and post-hyper-parameter tuning.

| Classifiers | Erosion + Opening (%) Pre | Erosion + Opening (%) Post | Gray image + E - D (%) Pre | Gray image + E - D (%) Post | Gray image - E + D (%) Pre | Gray image - E + D (%) Post | Gray image + E (%) Pre | Gray image + E (%) Post |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | 95.27 | 93.53 | 90.80 | 90.30 | 90.05 | 89.55 | 91.29 | 91.04 |
| SVM | 93.53 | 93.53 | 88.81 | 88.81 | 90.05 | 90.05 | 90.55 | 90.55 |
| Random Forest | 95.02 | 94.03 | 91.04 | 92.79 | 90.55 | 90.05 | 90.30 | 91.79 |
| Gradient Boost | 94.78 | 95.52 | 91.54 | 92.29 | 90.05 | 91.54 | 90.55 | 90.05 |
| xgBoost | 87.31 | 95.77 | 84.83 | 97.01 | 82.34 | 92.54 | 86.57 | 90.80 |
Table 3 shows the results of the other morphological techniques with the flip augmentation technique applied. Although the motive of this experiment was to improve the accuracy, the results did not improve.

The second way of evaluating our system is the analysis of the CNN model. It was initially constructed with five convolution layers, with batch size = 128, epoch = 100 and dense = 512. The accuracy of this combination is 95.21%, but it results in underfitting, which cannot be accepted; the higher dense value also makes the model more complex. To overcome underfitting and increase performance, the batch size was reduced to 32 and the dense value to 128, and the number of epochs was increased to 200. The accuracy for this configuration is 95%, but it results in overfitting. The convolution layers were then reduced from 5 to 4, and a dropout layer with a value of 0.4 was added, which gives 98.01% accuracy; however, this model is not stable. To fix this problem, a global average pooling layer was added along with a max-pooling layer to simplify the input for the fully connected layer. This model also reaches 98.01% accuracy, with both precision and recall at 98%. The output image of the first morphological operation (Eq. 1) is used to extract features for the above CNN experiments. Furthermore, the accuracies obtained with the other morphological techniques of Sect. 4.2.2, namely Eqs. 2, 4, 6 and 7, are 93%, 98%, 90% and 98%, respectively. The model loss is calculated using the categorical cross-entropy function. The training and validation losses of the CNN model without global average pooling are 0.0577 and 0.1455, whereas with global average pooling they are 0.0109 and 0.0716. In both cases the model accuracy is the same, but the model with global average pooling is more stable, as depicted in Fig. 7.
Visual Expression Analysis from Face Images
Fig. 7. Model loss graph – a) without global average pooling b) with global average pooling. Model accuracy graph – c) without global average pooling d) with global average pooling.
The confusion matrix gives an overall idea of the performance of a classifier. It also shows how many samples are misclassified during testing. In the confusion matrices, the emotion classes are encoded as “angry” = 0, “happy” = 1, “sad” = 2 and “surprise” = 3. The confusion matrices of all classifiers, including the CNN, are illustrated in Fig. 8.
Fig. 8. Confusion matrix of a) Logistic Regression b) SVM c) Random Forest d) xGBoost e) Gradient boosting machine f) CNN
Md. H. Rahman et al.
Based on the result analysis, the CNN model has a training accuracy of 99.71% and a test accuracy of 98.01%; it also has the highest precision and recall values. In this paper, the CNN is proposed as the best classifier, since it has the lowest model loss and the lowest overfitting rate of all the classifiers compared.
6 Visualization
We visualized the output of every step of this system, such as ROI extraction, erosion and subtraction, by developing a graphical user interface with the PyQt5 framework. The interface predicts the emotion from an input image and can also compare the predicted results of the different classifiers. Figure 9 shows the interface of the proposed system.
Fig. 9. Graphical user interface of the proposed system
7 Conclusion and Future Work
In this paper, we attempted to predict emotion from images of facial expressions. After collecting and annotating the data, we employed MTCNN for face detection, since it performs better than the Viola-Jones algorithm and has an almost 100% detection rate. Before classification, a combination of several morphological methods was applied to extract useful features from the face. Among the machine learning algorithms, logistic regression gave the best performance, with 97.51% accuracy. However, the CNN is the best classifier overall, with 98.01% accuracy and the lowest overfitting, which is why it is proposed as the most capable classifier for our design. The proposed model is limited to detecting four expressions only. Moreover, a particular expression can show several types of signs, all of which should be detected. Our further study on this topic will therefore address these issues, analyse other deep learning models, and improve our model so that it can detect multiple human faces and their emotions.
Detection of Invertebrate Virus Carriers Using Deep Learning Networks to Prevent Emerging Pandemic-Prone Disease in Tropical Regions
Daeniel Song Tze Hai, J. Joshua Thomas, Justtina Anantha Jothi, and Rasslenda-Rass Rasalingam
Department of Computing, School of Engineering, Computing, and Built Environment, UOW Malaysia KDU Penang University College, 10400 Penang, Malaysia
{jjoshua,justtina,rasslenda}@kdupg.edu.my
Abstract. Insects are a class of invertebrate organisms. They are a highly successful group that includes bees, butterflies, cockroaches, flies, mosquitoes, and ants. Mosquito-borne diseases are those spread by the bite of an infected mosquito; Zika, West Nile virus, chikungunya, dengue, and malaria are all spread to people by mosquitoes. The purpose of this work is to show that computer vision, using a Convolutional Neural Network (CNN), can classify the mosquito species that spread dengue (an emerging pandemic-prone disease). The work aims to help non-specialists identify three types of mosquito species through a simple web interface backed by deep learning models. A CNN has been implemented to extract features from mosquito images and identify mosquito species such as Aedes, Anopheles, and Culex. A total of 2,111 mosquito images were collected and used to train the CNN for mosquito species classification. The experimental results show that deep learning is able to identify the mosquito species. A series of experiments was conducted with data augmentation, regularization techniques, and the stride and filter settings of the convolutional layers to improve the performance and prediction results of the model.
Keywords: Dengue · Deep learning · Convolutional neural network · Data augmentation · Mosquito
1 Introduction
According to The Star Online (2020), the world is changing rapidly, and the last decade alone has seen a vast array of innovations that take almost everything to the next stage. For example, virtual assistants such as Google Assistant, Alexa, and Siri are transforming houses into smart homes. In 2019, Gartner surveyed companies that were working with AI or ML. The results show that 59% of respondents had an average of four AI/ML projects, which indicates that AI/ML is a technology trend in 2020 and 2021.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 120–131, 2022. https://doi.org/10.1007/978-3-030-93247-3_13
In the past decade, mosquitoes have been recognized as the main vectors of disease-causing infectious agents that have an enormous influence on global health [2]. More than half of the world's human population is at risk of exposure to mosquito-borne diseases, and more than 1 billion cases of these infections are reported annually. Mosquito-borne diseases are a major threat in South-East Asia, where dengue outbreaks have a significant effect on human health [5]. Besides dengue, malaria and West Nile virus are common mosquito-transmitted diseases. Mosquito species such as Aedes, Anopheles, and Culex spread these diseases. This project focuses on deep learning: the TensorFlow and Keras deep learning libraries [6] are adopted for developing and evaluating the model, and a Convolutional Neural Network (CNN) [10] is used as the deep learning algorithm to classify mosquito species such as Aedes, Anopheles, and Culex.
In this article, we show that a convolutional neural network combined with computer vision can analyse image inputs and classify dengue mosquitoes. Section 2 explains wing geometric morphometrics, CNNs, CNN layers, and their parameters. Section 3 introduces the dataset and the overall structure of the convolutional neural network used for this project. Section 4 explains the stages of the algorithm and discusses their implementation, including the pooling layer, dropout layer, fully connected layer, callbacks, and model saving. Section 5 discusses the experimental results for different data sizes, including accuracy, early stopping, training loss, and the prediction results with a confusion matrix. Section 6 concludes the work.
2 Related Work
2.1 Wing Geometric Morphometrics
Wing geometric morphometrics [7] has been used to identify mosquito genera such as Aedes, Anopheles, and Culex. In the data collection process, the right wing of every adult female mosquito is removed. The right wings are then photographed with a digital camera at 40x magnification, and eighteen landmarks are digitized on each image using software [7]. In the wing geometric morphometric process, canonical variate analysis (CVA) is used to explore the degree of wing-shape dissimilarity between species in a morphospace and to calculate the Mahalanobis distance. Next, a cross-validated reclassification test is carried out for each species, and a Neighbor-Joining tree is built to show the patterns of species segregation [7]. According to [7], mosquito wing geometric morphometrics is a proven technique for mosquito identification that is cheap and easy to use, and it can classify epidemiologically relevant vector mosquito species whose identification has proven troublesome using other techniques. The three mosquito genera were classified correctly with an accuracy of 90%, but this method is costly and requires expert knowledge and skills.
2.2 Convolutional Neural Networks
A simple Convolutional Neural Network (CNN) architecture is made up of three types of layers: convolution layers, pooling layers (also known as subsampling layers), and fully connected layers. A simple CNN architecture is built by stacking these layers, as shown in Fig. 1. This section discusses the layers of a CNN.
Fig. 1. Basic convolutional neural network layers (unknown)
When an image enters the network, what the computer sees is a matrix of pixel values. The matrix depends on the resolution and dimensions of the image; for example, in a 28 × 28 × 3 matrix, 28 × 28 is the image width and height, while 3 represents the RGB channels [9].
2.3 Convolutional Layers
The convolutional layer plays an essential role in how a convolutional neural network works. It is made up of many feature maps, each obtained by convolving the input signal with a convolution kernel. For a single-channel two-dimensional (2-D) image, every convolution kernel is a weight matrix, for example a 3 × 3 or 5 × 5 matrix. The convolution operation uses convolution kernels to process inputs of variable size and to extract various input features in the convolution layer [10]. The convolutional layer is characterized primarily by weight sharing and sparse interaction. The weight-sharing mechanism reduces the number of network training parameters [10], which can effectively avoid the overfitting caused by a large number of parameters and enhance the operating performance of the network [3, 9, 11]. Sparse interaction not only reduces the storage requirements of the model but also requires less computation to obtain the output, thereby enhancing the model's performance [3]. The complexity of the model can be reduced considerably by optimizing the output of the convolutional layer through three hyperparameters: depth, stride, and padding.
Depth: The depth of the output volume generated by the convolutional layer can be set manually through the number of neurons in the layer that connect to the same region of the input. Reducing the depth can considerably decrease the total number of neurons in the network, but it can also substantially decrease the model's pattern recognition capabilities.
Stride: Stride is one of the hyperparameters that can reduce the number of convolutional layer parameters. It specifies the number of pixels by which the filter shifts over the input matrix, and the default value is 1. The stride can be increased, but this results in a smaller feature map because potential locations are skipped over [1, 3]. For instance, for a 7 × 7 input and a 3 × 3 filter without padding, a stride of 1 moves the filter one pixel at a time and produces a 5 × 5 feature map. If the stride is 2, the filter moves two pixels at a time and produces a 3 × 3 feature map, which reduces both the overlap and the output size.
Padding: There are two types of padding, “valid” and “same”. “Valid” means that the convolutional layer does not pad at all and does not maintain the input size. “Same” means that padding is added around the input so that the output has the same size as the input. “Same” padding, also known as zero-padding, can prevent the loss of information that might occur at the image boundary, as information is only captured where the filter passes [1].
Max Pooling: Max pooling takes the maximum value from each region of the feature map and provides better efficiency and performance with simple linear classifiers and sparse coding. The statistical properties of max pooling make it well suited to sparse representations. However, its primary weakness is that the pooling operator extracts only the maximum value and neglects all other values, which may lead to unacceptable outcomes because of the loss of information [3].
2.4 Fully Connected Layers
The feature map of the image is extracted by a sequence of convolutional and pooling layers, and the feature-map neurons are then passed to a fully connected layer [9]. Fully connected means that every neuron in one layer is connected to every neuron in the next layer, as shown in Fig. 1. Thus, the convolutional and pooling layers can be considered a feature extractor, while the fully connected layer can be regarded as a classifier [8]. A fully connected layer typically determines which categories most closely match the high-level features and how strongly they are weighted. The likelihood of each category is obtained by computing the dot product between the weights and the outputs of the preceding layer. The fully connected layer processes its input and outputs an N-dimensional vector, where N is the number of categories and each element is the probability of one category [8].
Dropout is a technique that helps the CNN model overcome overfitting. Dropout drops units from the input and hidden layers according to a given probability. Max-pooling dropout maintains the behaviour of the max-pooling layer while allowing other feature values to influence the output of the pooling layer.
Data augmentation can increase the size of the training data by a factor of 1 to 10 and helps prevent the model from overfitting. Examples of data augmentation are random cropping, random rotation, and flipping.
Batch normalization, like dropout, is used to prevent the model from overfitting. In addition, batch normalization allows a higher learning rate to be used.
Transfer learning is essential when the training data is limited, as it enables the model to perform well and helps prevent overfitting.
Callback functions are a set of functions used to inspect the model and find its best parameters during the training phase. Examples are EarlyStopping, LearningRateScheduler, ModelCheckpoint, and TensorBoard.
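The fully connected computation described at the start of this section (a dot product with the weights, followed by an N-dimensional vector of class probabilities) can be sketched in a few lines. This is only an illustration under stated assumptions: the feature vector, weights, and biases below are hypothetical values, and softmax is assumed as the function that turns the scores into probabilities.

```python
import math

CLASSES = ["Aedes", "Anopheles", "Culex"]


def dense(inputs, weights, biases):
    """One fully connected layer: a dot product plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical extracted features and layer parameters (illustrative only).
features = [0.5, -1.0, 2.0]
weights = [[0.2, 0.1, 0.4], [0.3, -0.2, 0.1], [-0.1, 0.5, 0.3]]
biases = [0.0, 0.1, -0.1]

probabilities = softmax(dense(features, weights, biases))
predicted = CLASSES[probabilities.index(max(probabilities))]
print(predicted, probabilities)
```

Each row of `weights` belongs to one output category, so N rows give an N-dimensional output vector.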
3 Methodology
The dataset is split into three parts: training, validation, and testing. First, the dataset is used to train the CNN model and to validate it after training. If the training and validation performance does not meet the requirement, the CNN must be modified. If the performance meets the requirement, the model is used to perform image classification of mosquito species. If the predicted result does not meet the requirement, the CNN is modified again until the model meets all the requirements. Figure 2 shows the overall convolutional neural network structure of this project. The CNN structure is made up of three convolutional layers, two max-pooling layers, three dropout layers, one batch normalization layer, and one fully connected layer. The optimizer is Adam.
Dataset: A total of 2,111 mosquito images were collected through VectorBase [4].
A. Building the CNN Architecture
It is essential to design the Convolutional Neural Network (CNN) carefully and to fine-tune the hyper-parameters to obtain the best model, because the hyper-parameter values decide the behaviour of the training process and how the model learns from the training data. For example, the input dimensions of each layer decrease as the convolution and pooling operations proceed. Therefore, hyper-parameter values such as filter size, stride, and padding affect these dimensions and may leave the model with a negative dimension, which causes execution to stop when the code is run. It is therefore essential to design the CNN architecture properly before proceeding to the implementation phase. Below is the calculation of the activation shape and number of parameters of the CNN model after each convolution and pooling operation.
The activation size is calculated by multiplying all values of the activation shape. In Fig. 2, the activation shape for the convolution and pooling operations is calculated using the formulas in Eq. (1) and Eq. (2), and the number of parameters of a layer is calculated using Eq. (3). For the first stage, the dimensions of the input image are 28 × 28 (width and height), and the number of color channels is 1 (grayscale). Thus, the activation shape of the input image is (28, 28, 1) and the activation size is 28 * 28 * 1 = 784. For the third stage, the convolutional layer has the same number of filters, kernel size, stride, and padding as the layer above. As this layer uses zero-padding, the input dimensions remain the same. Thus, the activation size is 28 * 28 * 32 = 25,088 and the number of parameters is 32 * (32 * 3 * 3) + 32 = 9,248. For the fourth stage, the max-pooling layer has a 2 × 2 kernel, a stride of 2, and “valid” padding, so Eq. (1) is applied. The output dimension is ceil((28 - 2 + 1)/2) = ceil(13.5) = 14. Therefore, the activation shape is (14, 14, 32) and the activation size is 14 * 14 * 32 = 6,272. This process continues until the flatten layer. The flatten layer connects the feature maps to the dense layer, so its activation shape is (3,136, 1) and its activation size is 3,136 * 1 = 3,136. The next layer is the dense layer, which has an activation shape of (256, 1) and an activation size of 256 * 1 = 256. Its number of parameters is (3,136 * 256) + 256 = 803,072. The activation shape of the batch normalization layer is (256, 1), the activation size is 256 * 1 = 256, and the number of parameters is 4 * 256 = 1,024. The last layer uses a softmax activation function to perform multi-class classification.
Fig. 2. Overall structure of the convolutional neural network
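\[ \left( \left\lceil \frac{N - f + 1}{s} \right\rceil,\ \left\lceil \frac{N - f + 1}{s} \right\rceil,\ \text{Number of filters} \right) \tag{1} \]
\[ \left( N,\ N,\ \text{Number of filters} \right) \tag{2} \]
\[ \left( \text{Previous input layer} \times \left( \text{Filters} \times \text{Kernel size} \right) \right) + \text{Number of biases} \tag{3} \]
Here N is the input width or height, f is the kernel size, and s is the stride. Eq. (1) applies to “valid” padding and Eq. (2) to “same” padding with a stride of 1.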
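The shape and parameter arithmetic of Eqs. (1) to (3) can be checked with a short script. This is a sketch under stated assumptions rather than the authors' code: the helper names are hypothetical, the "same" case uses the general form ceil(N / s), which reduces to Eq. (2) for a stride of 1, and the third convolution is assumed to have 64 filters because the text reports a (7, 7, 64) shape before the flatten layer.

```python
import math


def output_dim(n, f, s, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / s)            # Eq. (2) when s == 1
    if padding == "valid":
        return math.ceil((n - f + 1) / s)  # Eq. (1)
    raise ValueError("padding must be 'same' or 'valid'")


def conv_params(in_channels, filters, kernel):
    """Eq. (3): weights of all kernels plus one bias per filter."""
    return filters * (in_channels * kernel * kernel) + filters


def dense_params(inputs, units):
    """Weights plus one bias per dense unit."""
    return inputs * units + units


size, channels = 28, 1                      # grayscale 28 x 28 input
size = output_dim(size, 3, 1, "same")       # conv 1, 32 filters
p_conv1 = conv_params(channels, 32, 3)
size = output_dim(size, 3, 1, "same")       # conv 2, 32 filters
p_conv2 = conv_params(32, 32, 3)
pool1 = output_dim(size, 2, 2, "valid")     # max pooling 2 x 2, stride 2
size = output_dim(pool1, 3, 1, "same")      # conv 3, 64 filters (assumed)
pool2 = output_dim(size, 2, 2, "valid")     # max pooling 2 x 2, stride 2
flat = pool2 * pool2 * 64                   # flatten layer
p_dense = dense_params(flat, 256)
print(pool1, pool2, flat, p_conv2, p_dense)
```

The printed values reproduce the figures quoted in the text: 14 and 7 for the pooled sizes, 3,136 for the flattened vector, 9,248 parameters for the second convolution, and 803,072 for the dense layer.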
4 Implementation
A. Build the Convolutional Neural Network
In the data pre-processing step, the mosquito images are resized to 28 × 28 and converted to grayscale, so the input dimensions of the convolutional neural network must match the dimensions of the processed image data. The first convolutional layer receives the input training data and performs the convolution operation. This layer has 32 filters and a 3 × 3 kernel, and its padding is set to zero-padding, which maintains the dimensions of the output. After the convolution operation, a ReLU activation function replaces all negative activations with 0; activation functions give the model its non-linear properties. The output of this layer therefore consists of 32 feature maps of size 28 by 28 (28 × 28 × 32). The first convolutional layer detects low-level features such as curves and edges. However, the objective of the model is to classify mosquito species by recognizing mosquito patterns, and one convolutional layer is not enough for this. Therefore, more convolutional layers are added so that the network can recognize high-level features. For the pooling layer, max pooling is adopted, with a filter size of 2 × 2 and a stride of 2.
Dropout Layer: Dropout is a regularization technique that prevents overfitting and gives significant improvements to the neural network. The dropout layer temporarily drops activation units from the network by setting them to 0. In this project, dropout is applied to both the input units and the hidden units, with a dropout rate of 0.25 for the input units and 0.5 for the hidden units.
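The ReLU, max-pooling, and dropout steps described above can be sketched in plain Python as follows. This is a simplified illustration, not the Keras implementation: the function names and the 4 × 4 feature map are hypothetical, and the dropout shown is the common "inverted" variant, which rescales the surviving units by 1 / (1 - rate) during training.

```python
import random


def relu(feature_map):
    """Replace every negative activation with 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 (assumes even height and width)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled


def dropout(values, rate, rng):
    """Inverted dropout: zero a unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


# Hypothetical 4 x 4 feature map (illustrative values only).
fmap = [
    [1.0, -2.0, 3.0, 0.0],
    [-1.0, 5.0, -6.0, 2.0],
    [0.0, -3.0, 4.0, -4.0],
    [7.0, -8.0, -9.0, 1.0],
]
pooled = max_pool_2x2(relu(fmap))
dropped = dropout([1.0] * 8, rate=0.5, rng=random.Random(0))
print(pooled, dropped)
```

Pooling halves each spatial dimension, which is why the 28 × 28 maps become 14 × 14 and then 7 × 7 in the architecture of Fig. 2.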
Flatten Layer: The flatten layer reshapes multi-dimensional tensors into a 1-D tensor. For example, if the activation shape of the previous layer is (7, 7, 64), the flatten layer reshapes it into a 1-D tensor of length 7 * 7 * 64 = 3,136. This 1-D tensor can then be used as the input of the dense (fully connected) layer.
Batch Normalization Layer: Batch normalization is a regularization technique for preventing overfitting. It normalizes the activation values using the mean and variance computed over each mini-batch.
Output Layer: The last dense layer uses the high-level features to determine which categories are the closest matches and how strongly they are weighted. The fully connected layer processes the input and outputs an N-dimensional vector, where N is the number of categories and each element is the probability of a category. The softmax activation function is used to perform multi-class classification.
Compilation: When the model is compiled, the loss function “categorical_crossentropy” is adopted because the softmax activation function is used in the last dense layer [5]. Adam is used as the optimizer of the model [11]. Adam is computationally efficient, maintains individual learning rates for different parameters, and requires little memory. The accuracy metric is specified so that training accuracy and validation accuracy are reported alongside training loss and validation loss when the model is evaluated.
Callback Functions: TensorBoard is used because it allows the researcher to visualize dynamic graphs of the training and test metrics, investigate the model, and find the best tuning parameters. ModelCheckpoint is used to save the best model automatically during training, that is, the model with the minimum validation loss. Additionally, EarlyStopping is used to help the researcher notice when the model is overfitting or underfitting. Too many epochs may cause overfitting of the training dataset, while too few epochs may cause underfitting. With these callbacks, the researcher can define a large number of training epochs without worrying about the performance of the model, because early stopping halts the training when the performance of the model stops improving.
After the CNN architecture has been built and the callback functions have been implemented, the model is ready to be trained. “X_train” and “to_categorical(y_train)” represent the training data and the labels of the training data.
Batch size is the number of samples fed to the network at a time, and epochs is the number of times the whole training dataset is passed through the neural network. Shuffle means that the training data is shuffled before every epoch, and validation_data is the validation dataset together with its labels. Callbacks are the callback functions mentioned above. A verbose value of 2 prints one line of output per epoch.
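The pairing of one-hot labels with the categorical cross-entropy loss used at compilation can be illustrated with a small stand-alone sketch. The helper names below are hypothetical stand-ins (they are not the Keras functions), the class index order is an assumption, and the predicted probabilities are made-up values.

```python
import math


def to_one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Assumed label order for illustration: 0 = Aedes, 1 = Anopheles, 2 = Culex.
y_true = to_one_hot(2, 3)
confident = categorical_cross_entropy(y_true, [0.05, 0.05, 0.90])
unsure = categorical_cross_entropy(y_true, [0.40, 0.40, 0.20])
print(confident, unsure)
```

The loss is small when the softmax output puts most of its probability on the true class and grows as that probability shrinks, which is the quantity the optimizer minimizes during training.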
5 Experimental Results
In this phase, several experiments are conducted, covering the data size, the batch size, the regularization technique, the data augmentation technique, and the model prediction results. The aim is to find out how these factors affect the performance of the model. In the first experiment, the validation data size is set to 10%, 20%, and 30% of the dataset. For example, if the validation size is 10%, the training size is 90%. The number of training and validation images for each test case is shown in Table 1.
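A minimal sketch of how such a split could be produced is shown below. It is not the authors' procedure: the function name, the fixed seed, and the rounding rule for the validation count are assumptions, so the resulting counts may differ slightly from those in Table 1.

```python
import random


def split_indices(n_samples, val_fraction, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)  # rounding rule is an assumption
    return indices[n_val:], indices[:n_val]


TOTAL_IMAGES = 2111  # dataset size reported in the paper

for fraction in (0.10, 0.20, 0.30):
    train_idx, val_idx = split_indices(TOTAL_IMAGES, fraction)
    print(f"validation {fraction:.0%}: "
          f"{len(train_idx)} training / {len(val_idx)} validation images")
```

Shuffling before splitting keeps the class mix of the two parts similar when the images are stored class by class.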
Table 1. Test cases for the data size.
Figure 3 shows the training accuracy for the different data sizes. As the graph shows, all of the training accuracies increase slowly as training proceeds. At epoch 65, test case 1 has the highest training accuracy (94.53%), compared with 92.31% for test case 2 and 93.83% for test case 3. Test case 3 does not continue training because it started to overfit on the validation data, and the training process was stopped by the Keras callback function “EarlyStopping”. All of the training losses decrease slowly, and test case 1 has the lowest training loss (0.1415) at epoch 65.
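The early-stopping behaviour mentioned above can be sketched as a small stand-alone function. It is a simplified illustration of the idea, not the Keras callback itself: the function name, the patience value, and the validation-loss history are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation-loss curve that starts to rise after epoch 4.
history = [0.90, 0.70, 0.62, 0.58, 0.60, 0.61, 0.63, 0.66]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

Saving the weights from `best_epoch`, as a model checkpoint does, means the stored model is the one from before the validation loss began to rise.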
Fig. 3. Training accuracy of different data size
Test case 2 has a training loss of 0.2097 and test case 3 has 0.1710. Thus, test case 1 is still the best CNN model among these test cases. When training started, the validation accuracy of the test cases increased rapidly, but after epoch 35 it rose only slowly. Eventually, test case 3 has the worst validation accuracy (76.61%), followed by test case 2 (80.16%), while test case 1 has the best validation accuracy (84.62%).
Fig. 4. Validation loss of different data size.
Figure 4 shows the validation loss for the different data sizes. The graphs show that test case 3 has the highest validation loss (0.7884) and started to overfit at epoch 33. Test case 2 has a validation loss of 0.5781 and began to overfit at epoch 39, while test case 1 has a validation loss of 0.4879 and started to overfit at epoch 65.
Table 2. Summary of the different data sizes.
Test case 1 has the highest training accuracy and validation accuracy, as well as the lowest training loss and validation loss. Test cases 2 and 3 perform worse because both have less training data than test case 1. Hence, splitting the dataset into 90% training data and 10% validation data is the best option for the CNN model. The size of the dataset is one of the biggest problems in deep learning. According to research, each class requires about 1,000 examples. Unfortunately, only approximately 600 mosquito images could be found for each category, because images of mosquito species are limited on the internet. Table 2 shows a summary of the different data sizes.
Prediction Result
A total of 9 images were used as test images: 3 Aedes images, 3 Anopheles images, and 3 Culex images. These test images were not used in the training data, so they can be used to evaluate the model's image classification. The prediction result is shown in Fig. 5, the confusion matrix of the model's predictions. In the figure, the predicted label is on the x-axis and the true label is on the y-axis. A dark blue cell indicates that all of the test images of a class were predicted correctly, a medium blue cell indicates that some of them were predicted correctly, and a light blue cell indicates incorrectly predicted test images.
Fig. 5. Confusion matrix of the predicted model
Therefore, all 3 Aedes and all 3 Culex test images are predicted correctly, while 2 of the 3 Anopheles test images are predicted correctly. One Anopheles test image is predicted incorrectly: the model predicts it as Culex. Hence, 8 out of 9 test images are predicted correctly and 1 out of 9 is predicted incorrectly.
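The counts described above can be written as a confusion matrix and summarized numerically. The sketch below uses only the nine predictions reported in the text; the function names and the per-class precision and recall figures are additions for illustration and are not reported in the paper.

```python
CLASSES = ["Aedes", "Anopheles", "Culex"]

# Rows are true labels, columns are predicted labels (same order as CLASSES).
confusion = [
    [3, 0, 0],  # Aedes: all 3 correct
    [0, 2, 1],  # Anopheles: 2 correct, 1 predicted as Culex
    [0, 0, 3],  # Culex: all 3 correct
]


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, k):
    """Correct predictions of class k divided by true samples of class k."""
    return matrix[k][k] / sum(matrix[k])


def precision(matrix, k):
    """Correct predictions of class k divided by all predictions of class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


print(f"accuracy = {accuracy(confusion):.3f}")
for k, name in enumerate(CLASSES):
    print(name, f"precision = {precision(confusion, k):.2f}",
          f"recall = {recall(confusion, k):.2f}")
```

With only nine test images, a single misclassification changes the accuracy by about 11 percentage points, so these figures should be read as indicative rather than precise.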
6 Conclusion
Computer vision can classify mosquito species such as Aedes, Anopheles, and Culex. In this work, a computer vision algorithm, the Convolutional Neural Network (CNN), is adopted to perform image classification of mosquito species. The CNN achieves a validation accuracy of up to 84.62% and classifies 8 of the 9 test images correctly. The dataset is one of the keys to success: more than 2,000 images of mosquito species were collected from the internet and fed to the model to increase its reliability in recognizing mosquito species. Besides the dataset, the architecture of the CNN plays an essential role in this project, and many experiments were conducted to find the best hyperparameters for the model.
Acknowledgement. The research work was conducted at the IPA lab of UOW Malaysia KDU Penang University College. The authors are thankful to the ICO2021 reviewers, who gave comments to improve the article.
Detection of Invertebrate Virus Carriers Using Deep Learning Networks
References
1. Albawi, S., Mohammed, T.A., Zawi, S.: Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET), pp. 1–6 (2017). https://doi.org/10.1109/ICEngTechnol.2017.8308186
2. Famakinde, D.O.: Mosquitoes and the lymphatic filarial parasites: research trends and budding roadmaps to future disease eradication. Trop. Med. Infect. Dis. 3(4), 1 (2018). https://doi.org/10.3390/tropicalmed3010004
3. Thomas, J.J., Karagoz, P., Ahamed, B.B., Vasant, P. (eds.): Deep Learning Techniques and Optimization Strategies in Big Data Analytics. IGI Global (2019)
4. Murugappan Giraldo-Calderón, G.I., et al.: VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res. 43(Database Issue), D707–D713 (2015). https://doi.org/10.1093/nar/gku1117
5. Ismail, T.N.S.T., Kassim, N.F.A., Rahman, A.A., Yahya, K., Webb, C.E.: Day biting habits of mosquitoes associated with mangrove forests in Kedah, Malaysia. Trop. Med. Infect. Dis. 3(77), 1–8 (2018). https://doi.org/10.3390/tropicalmed3030077
6. Park, J., Kim, D., Choi, B., Kang, W., Kwon, H.: Classification and morphological analysis of vector mosquitoes using deep convolutional neural networks. Sci. Rep. 10(1), 1012 (2020). https://doi.org/10.1038/s41598-020-57875-1
7. Wilke, A., et al.: Morphometric wing characters as a tool for mosquito identification. PLoS ONE 11, 1–12 (2016). https://doi.org/10.1371/journal.pone.0161643
8. Zhang, Q.: Convolutional neural networks. In: 3rd International Conference on Electromechanical Control Technology and Transportation, pp. 434–439 (2018). https://doi.org/10.5220/0006972204340439
9. Murugappan, M., Thomas, J.V.J., Fiore, U., Jinila, Y.B., Radhakrishnan, S.: COVIDNet: implementing parallel architecture on sound and image for high efficacy. Future Internet 13(11), 269 (2021)
10. Chui, K.T., Gupta, B.B., Liu, R.W., Zhang, X., Vasant, P., Thomas, J.J.: Extended-range prediction model using NSGA-III optimized RNN-GRU-LSTM for driver stress and drowsiness. Sensors 21(19), 6412 (2021)
11. Thomas, J.J., Fiore, U., Lechuga, G.P., Kharchenko, V., Vasant, P. (eds.): Handbook of Research on Smart Technology Models for Business and Industry. IGI Global (2020). https://doi.org/10.4018/978-1-7998-3645-2
Classification and Detection of Plant Leaf Diseases Using Various Deep Learning Techniques and Convolutional Neural Network
Partha P. Mazumder, Monuar Hossain, and Md Hasnat Riaz
Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali 3814, Bangladesh
[email protected]
Abstract. In this paper, we develop a Convolutional Neural Network model for detecting and classifying simple leaf images of (mostly) diseased and healthy plants with the help of different deep learning methodologies. We use the open PlantVillage dataset of 54,306 images, covering 14 different plants in a set of 38 distinct classes of diseased and healthy plants, to train our model. Among the model architectures trained, the best performance reaches a 99.22% success rate in identifying the corresponding (diseased plant or healthy plant) combination when 40% of the whole dataset is used for testing. This high success rate makes the model a very useful advisory or early warning tool. The approach could be further extended to support an integrated plant disease identification system operating in real cultivation conditions, or provide a clear path toward smartphone-assisted crop disease diagnosis on a large scale.
Keywords: Plant disease classification · Xception · Neural network · InceptionV3
1 Introduction
Plant diseases have a long-lasting effect on agricultural products. They are estimated to cause monetary losses of more than $30–50 billion annually [1]. Modern technologies have given human society the potential to produce enough food to meet the demand of more than 7 billion people. However, food security remains threatened by a number of factors, such as significant climate change (Tai et al. 2014) [2], the decline in pollinators (from the reports of the Plenary of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services at its 4th session, 2016) [3], plant diseases (Strange and Scott 2005) [4], and many others. Plant diseases are caused by agents called pathogens. Crop losses can be reduced, and specific measures can be taken against particular micro-organisms, if plant diseases are diagnosed and distinguished efficiently and early. Traditionally, plant pathologists have shared their knowledge with farmers through farming communities. This is where machine learning comes into the picture. To improve the diagnostic results, several studies on machine learning-based automated plant diagnosis have been conducted. Convolutional neural networks (CNNs) are widely regarded as one of the most promising classification techniques in machine learning. The most attractive advantage of CNNs is their ability to learn the features required for classification automatically from the images during training. Recently, CNNs have demonstrated excellent performance in large-scale general image classification tasks [5], traffic sign recognition [7], leaf classification [6], and so on.
Computer vision, and object recognition in particular, has made immense advances in the past few years. The PASCAL VOC Challenge (Everingham et al. 2010) [8], and more recently the Large Scale Visual Recognition Challenge (ILSVRC) (Russakovsky et al. 2015) [8] based on the ImageNet dataset (Deng et al. 2009) [9], have been widely used as benchmarks for a number of visualization-related problems in computer vision, including object classification. In 2012, a large, deep convolutional neural network achieved a top-5 error of 16.4% for the classification of images into 1000 possible categories (Krizhevsky et al. 2012) [10]. In the following 3 years, various advances in deep convolutional neural networks reduced the error rate to 3.57% (Krizhevsky et al. 2012 [10]; Simonyan and Zisserman 2014 [11]; Zeiler and Fergus 2014 [12]; He et al. 2015 [13]; Szegedy et al. 2015 [14]).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 132–141, 2022. https://doi.org/10.1007/978-3-030-93247-3_14
2 Related Works
Research in the agricultural sector is focused on improving the quality and quantity of the product with less waste and more yield. The quality of agricultural products may be degraded by different types of plant diseases. These diseases are caused by pathogens such as fungi, bacteria and viruses. Many systems have been proposed to solve, or at least reduce, the problems faced by farmers by using image processing and automatic classification tools.
Suhaili Kutty et al. [15] discussed a process to classify Anthracnose and Downy Mildew, two watermelon leaf diseases, using neural network analysis. They used a digital camera with a specific calibration procedure under a controlled environment. Their classification is based on color feature extraction from the RGB color model, where the RGB pixel color indices are extracted from the identified ROI (region of interest). A median filter is used to reduce noise in the images and for segmentation, and the neural network pattern recognition toolbox is used to classify the images. The proposed method achieved 75.9% accuracy based on the RGB mean color component.
Sanjeev Sannakki et al. [16] identify diseases with the help of image processing and AI techniques on images of grape plant leaves. In their proposed system, a grape leaf image with a complex background is taken as input. Noise is removed using anisotropic diffusion, and segmentation is done by k-means clustering. After segmentation, features are extracted by computing the Gray Level Co-occurrence Matrix. Finally, classification is performed using a Feed Forward Back Propagation Network classifier. The hue feature is also used for more accurate results.
Akhtar et al. [17] implemented a support vector machine (SVM) approach for the classification and detection of rose leaf diseases such as black spot and anthracnose. The authors applied the threshold method for segmentation, and Otsu's algorithm was used to establish the threshold values. In this approach, DWT and DCT features and eleven texture-based Haralick features are extracted and then combined with the SVM approach, which yields good accuracy.
Usama Mokhtar et al. [18] proposed a method that uses the Gabor wavelet transform to extract relevant features from tomato leaf images in conjunction with Support Vector Machines (SVMs). They described a technique for detecting two tomato leaf diseases: powdery mildew and early blight. The Gabor wavelet transform is applied in feature extraction to obtain the feature vectors used in classification. Cauchy, Laplacian and Invmult kernels are used in the SVM for the output decision on whether a tomato leaf is infected with powdery mildew or early blight. The proposed approach achieves excellent results, with an accuracy of 99.5%.
Supriya et al. [19] worked with cotton leaves. They first captured the affected leaf and then pre-processed it by converting it into another color space. They used Otsu's global thresholding method for segmentation and the color co-occurrence method for extracting features such as color and texture. A Multi SVM (Multi Support Vector Machine) classifier is used for detecting the diseases.
Kiran R. Gavhale et al. [20] presented a number of image processing techniques to extract the diseased part of a leaf. For pre-processing, image enhancement is carried out in the DCT domain, followed by color space conversion. Segmentation is then done with the help of k-means clustering, and feature extraction uses the GLCM. For classifying canker and anthracnose diseases of citrus leaves, an SVM with radial basis and polynomial kernels is used.
N. J. Janwe and Vinita Tajane [21] proposed medicinal plant disease identification using the Canny edge detection algorithm, histogram analysis and CBIR. Medicinal plants are identified according to their edge features: the leaf image is converted to gray scale and the edge histogram is calculated. The edge detection algorithm used is Canny edge detection.
3 Research Methodology
3.1 Dataset
We use the PlantVillage dataset to build this classifier. We analyze a total of 54,306 images of plant leaves, which are assigned 38 class labels. Each class label is a crop-disease pair, and we attempt to predict the crop-disease pair given just the image of the plant leaf. In all the methods used in this paper, we resize the images to 256 × 256 pixels, and we carry out both model optimization and prediction on these downscaled images. Across all our experiments, we work with the colored version of the whole PlantVillage dataset (Fig. 1).
Fig. 1. Example of leaf images from the PlantVillage dataset, representing every crop-disease pair used [22]. (1) Apple Scab, Venturia inaequalis (2) Apple Black Rot, Botryosphaeria obtusa (3) Cedar Apple Rust, Gymnosporangium juniperi-virginianae (4) Apple healthy, Malus (5) Blueberry healthy, Vaccinium sect. Cyanococcus (6) Cherry healthy, Prunus avium (7) Cherry Powdery Mildew, Podosphaera clandestina (8) Corn Gray Leaf Spot, Cercospora zeae-maydis (9) Corn Common Rust, Puccinia sorghi (10) Corn healthy, Zea mays subsp. mays (11) Corn Northern Leaf Blight, Exserohilum turcicum (12) Grape Black Rot, Guignardia bidwellii (13) Grape Black Measles (Esca), Phaeomoniella aleophilum, Phaeomoniella chlamydospora (14) Grape healthy, Vitis (15) Grape Leaf Blight, Pseudocercospora vitis (16) Orange Huanglongbing (Citrus Greening), Candidatus Liberibacter spp. (17) Peach Bacterial Spot, Xanthomonas campestris (18) Peach healthy, Prunus persica (19) Bell Pepper Bacterial Spot, Xanthomonas campestris (20) Bell Pepper healthy, Capsicum annuum Group (21) Potato Early Blight, Alternaria solani (22) Potato healthy, Solanum tuberosum (23) Potato Late Blight, Phytophthora infestans (24) Raspberry healthy, Rubus idaeus (25) Soybean healthy, Glycine max (26) Squash Powdery Mildew, Erysiphe cichoracearum (27) Strawberry healthy, Fragaria x ananassa (28) Strawberry Leaf Scorch, Diplocarpon earlianum (29) Tomato Bacterial Spot, Xanthomonas campestris pv. vesicatoria (30) Tomato Early Blight, Alternaria solani (31) Tomato Late Blight, Phytophthora infestans (32) Tomato Leaf Mold, Passalora fulva (33) Tomato Septoria Leaf Spot, Septoria lycopersici (34) Tomato Two Spotted Spider Mite, Tetranychus urticae (35) Tomato Target Spot, Corynespora cassiicola (36) Tomato Mosaic (37) Tomato Yellow Leaf Curl (38) Tomato healthy, Solanum lycopersicum.
3.2 Measurement of Performance
To get a sense of how our approaches will perform on new, unseen data, and to keep track of whether any of them are overfitting, we run all our experiments across a range of train-test set splits, namely 80–20 (80% of the whole dataset used for training and 20% for testing), 60–40 (60% of the whole dataset used for training and 40% for testing), 40–60 (40% of the whole dataset used for training and 60% for testing), and 20–80 (20% of the whole dataset used for training and 80% for testing) (Fig. 2).
3.3 Approach
We evaluate the applicability of deep convolutional neural networks for the classification problem described above. We focus on two popular architectures, namely InceptionV3 [23] and Xception [24]. To summarize:
1. Choice of deep learning architecture:
   i. InceptionV3
   ii. Xception
2. Choice of training mechanism:
   i. Transfer learning
   ii. Training from scratch
3. Choice of dataset:
   i. Color
4. Choice of training-testing set distribution:
   i. Train: 80%, Test: 20%
   ii. Train: 60%, Test: 40%
   iii. Train: 40%, Test: 60%
   iv. Train: 20%, Test: 80%
To enable a fair comparison between the results of all the experimental configurations, we also tried to standardize the hyper-parameters across all the experiments, and we used the following hyper-parameters in all of the experiments:
• Base learning rate: 0.001
• Batch size: 32
• Default image size: (256, 256)
• Epochs: 100
• Depth: 3
• Optimizer: Adam
All the above experiments were conducted using Keras, which is a fast, open source framework for deep learning. The basic results, such as the overall accuracy can also be replicated using a standard instance of Keras.
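The experimental grid and the shared hyper-parameters listed above can be written down as a small, framework-independent configuration. The sketch below is illustrative only: the `Experiment` class, the dictionary keys, and the integer rounding of the split sizes are assumptions, not part of the authors' Keras code.

```python
from dataclasses import dataclass
from itertools import product

TOTAL_IMAGES = 54_306  # size of the PlantVillage dataset stated in Sect. 3.1

# Hyper-parameters listed in Sect. 3.3, shared by all experiments.
HYPER_PARAMETERS = {
    "base_learning_rate": 0.001,
    "batch_size": 32,
    "image_size": (256, 256),
    "epochs": 100,
    "depth": 3,
    "optimizer": "Adam",
}


@dataclass(frozen=True)
class Experiment:
    """One cell of the experimental grid (hypothetical helper class)."""
    architecture: str
    training: str
    train_percent: int

    def split_sizes(self, total=TOTAL_IMAGES):
        """Return (train, test) image counts; floor rounding is an assumption."""
        n_train = total * self.train_percent // 100
        return n_train, total - n_train


ARCHITECTURES = ["InceptionV3", "Xception"]
TRAINING = ["transfer learning", "training from scratch"]
TRAIN_PERCENTS = [80, 60, 40, 20]

# Every combination of architecture, training mechanism and split.
experiments = [
    Experiment(arch, mode, pct)
    for arch, mode, pct in product(ARCHITECTURES, TRAINING, TRAIN_PERCENTS)
]

for exp in experiments[:4]:
    print(exp, exp.split_sizes())
```

Enumerating the grid this way makes it explicit that 2 architectures, 2 training mechanisms and 4 splits give 16 configurations, all sharing one set of hyper-parameters.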
Fig. 2. Flowchart of the entire work. (Flowchart steps: Start, Input Image, Image pre-processing and labelling, Augmentation process, NN Training, Testing, Healthy Image or Classified disease, Output result, End.)
3.4 Results
The overall accuracy we obtained on the PlantVillage dataset varied from 80.28% (epochs = 25, optimizer = Adam) to 92.04% (epochs = 150, optimizer = Adam) when training from scratch, and reached 85.53% (epochs = 25, optimizer = Adam) when training with InceptionV3. With the Xception model, the accuracy varies from 98.625% to 99.22% (Table 1 and Figs. 3, 4).
Table 1. Test and train splits of the dataset using Xception, with accuracy at the end of 100 epochs

Model                  Split (test / train)   Accuracy   Size
Xception (epoch-100)   20% / 80%              99.13%     88 MB
Xception (epoch-100)   40% / 60%              99.22%     88 MB
Xception (epoch-100)   60% / 40%              98.625%    88 MB
Xception (epoch-100)   80% / 20%              98.32%     88 MB
Fig. 3. Training and validation accuracy using Xception (60% test, 40% train)
Fig. 4. Training and validation loss using Xception (60% test, 40% train)
4 Conclusion
Pre-trained models have been widely used in machine learning and computer vision applications, including plant disease identification. Convolutional neural networks have made immense advances in object recognition and image classification in the past few years. The main purpose of this system is to improve the efficiency of automatic plant disease detection. Our results show that a large, deep convolutional neural network can achieve significant results on a highly challenging dataset using purely supervised learning. Experimental results show that the proposed system can successfully detect and classify plant diseases with an accuracy of 99.22%. In future work, we will extend our database to identify more plant diseases and use a larger amount of training data for classification. As the training data increases, the accuracy of the system is expected to improve, and we can then compare the accuracy rate and speed of the system.
References
1. Sastry, K.S.: Plant Virus and Viroid Diseases in the Tropics, vol. II. Springer, Heidelberg (2013). https://doi.org/10.1007/978-94-007-7820-7
2. Tai, A.P., Martin, M.V., Heald, C.L.: Threat to future global food security from climate change and ozone air pollution. Nat. Clim. Chang 4, 817–821 (2014). https://doi.org/10.1038/nclimate2317
3. Report of the Plenary of the Intergovernmental Science-Policy Platform on Biodiversity Ecosystem Services on the work of its fourth session (2016). Plenary of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services Fourth session. Kuala Lumpur. http://www.ipbes.net/sites/default/files/downloads/pdf/IPBES-4-4-19Amended-Advance.pdf. Accessed 04 Jan 2021
4. Strange, R.N., Scott, P.R.: Plant disease: a threat to global food security. Phytopathology 43, 83–116 (2005). https://doi.org/10.1146/annurev.phyto.43.113004.133839
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1–9 (2012)
6. Hall, D., McCool, C., Dayoub, F., Sunderhauf, N., Upcroft, B.: Evaluation of features for leaf classification in challenging conditions. In: 2015 IEEE Winter Conference on Applications of Computer Vision, pp. 797–804 (2015)
7. Jin, J., Fu, K., Zhang, C.: Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Trans. Intell. Transp. Syst. 15, 1991–2000 (2014)
8. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010). https://doi.org/10.1007/s11263-009-0275-4
9. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009. (IEEE) (2009)
10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, pp. 1097–1105. Curran Associates, Inc. (2012)
11. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
12. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv:1512.03385 (2015)
14. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
15. Kutty, S.B., et al.: Classification of watermelon leaf diseases using neural network analysis. In: 2013 IEEE Business Engineering and Industrial Applications Colloquium (BEIAC), pp. 459–464, April 2013. IEEE
16. Sannakki, S.S., Rajpurohit, V.S., Nargund, V.B., Kulkarni, P.: Diagnosis and classification of grape leaf diseases using neural networks. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1–5, July 2013. IEEE
17. Akhtar, A., Khanum, A., Khan, S.A., Shaukat, A.: Automated plant disease analysis (APDA): performance comparison of machine learning techniques. In: 2013 11th International Conference on Frontiers of Information Technology, pp. 60–65, December 2013. IEEE
18. Mokhtar, U., Ali, M.A., Hassenian, A.E., Hefny, H.: Tomato leaves diseases detection approach based on support vector machines. In: 2015 11th International Computer Engineering Conference (ICENCO), pp. 246–250, December 2015. IEEE
19. Patki, S.S., Sable, G.S.: Cotton leaf disease detection & classification using multi SVM. Int. J. Adv. Res. Comput. Commun. Eng. 5(10), 165–168 (2016)
20. Gavhale, K.R., Gawande, U.: An overview of the research on plant leaves disease detection using image processing techniques. IOSR J. Comput. Eng. (IOSR-JCE) 16(1), 10–16 (2014)
21. Tajane, V., Janwe, N.J.: Medicinal plants disease identification using canny edge detection algorithm, analysis and CBIR. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(6), 530–536 (2014)
22. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016). p. 3
23. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, February 2017
24. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
Deep Learning and Machine Learning Applications
Distributed Self-triggered Optimization for Multi-agent Systems
Komal Mehmood1 and Maryam Mehmood2
1 University of Engineering and Technology, UET Lahore, Lahore, Pakistan
komal [email protected]
2 Mirpur University of Science and Technology, MUST Mirpur AK, Mirpur, Pakistan
[email protected]
Abstract. In this paper, a distributed constrained convex optimization problem for multi-agent consensus is investigated. Distributed optimization algorithms require the exchange of information among agents for convergence. The scaling of such multi-agent networks is largely hampered by bandwidth limitations. To address this issue, we propose self-triggered information exchange among the agents. Specifically, each agent computes its next information exchange time based on the current state information. The proposed algorithm reduces the communication burden compared to its periodic counterpart. When consensus is reached, the interval between two consecutive information exchanges becomes large, and as a result the overall data rate requirement is reduced. Numerical results demonstrate the effectiveness of the proposed self-triggered mechanism.
Keywords: Distributed networks · Event-triggered optimization · Self-triggered optimization · Multi-agent systems · Distributed convex optimization
1 Introduction
In the past few years, many efforts have been made to improve multi-agent systems, including their cooperation, consensus, formation, optimization and so on [1–4]. Among these, multi-agent consensus is an important problem in which all the agents should reach a common state. Many interesting results have been obtained for multi-agent consensus problems in the past decade, particularly for nonlinear systems. For example, optimal consensus has been achieved while rejecting external disturbances for a class of nonlinear multi-agent systems in [5]. A novel algorithm has been proposed for a system with time-varying asymmetric state constraints on the agents in [6]. Second-order consensus for multi-agent systems has been proposed in [7].
Apart from achieving consensus in multi-agent systems, there are other concerns in most practical problems. Another important area of study is the optimization of multi-agent systems, which makes all the agents converge to the optimal value. Specifically, distributed optimization has attracted the interest of researchers. The main aim of distributed optimization is to optimize the global cost function, which is the sum of the agents' individual cost functions. Each agent has access only to its own local cost and to the costs of its neighbors. Therefore, optimal consensus and distributed optimization problems have been popular research topics recently. Distributed optimization problems with consensus are more complex than consensus problems without optimization. In the past few years, several novel discrete-time algorithms have been proposed to solve these problems, including subgradient algorithms [8,9] and the alternating direction method of multipliers [10]. Some researchers have also developed continuous-time algorithms for distributed optimization problems [11–16].
As the number of agents grows, we need ways to reduce the bandwidth and computational requirements per agent. One way to overcome this problem is to let every agent collect information from neighboring nodes and update its values according to some rule, instead of using the traditional scheme of sampling at equal time intervals. Event-triggered algorithms for multi-agent distributed optimization problems have been proposed in [17–21]. Consensus problems are solved in [18,20], while a time-varying directed balanced graph is studied in [17]. In [19], a distributed optimization algorithm with uncoordinated step sizes is shown to converge geometrically to the optimal solution. A finite-time consensus control for second-order multi-agent systems is proposed in [21].
In event-driven schemes, the state error must be measured regularly and compared with a defined threshold to determine when to update the system parameters. In this paper, we consider a self-driven scheme for multi-agent systems, in which there is no need to keep track of the state error; instead, the next update time is pre-computed at the previous update. Section 2 explains some useful notation and graph theory. Section 3 describes a multi-agent system with distributed optimization and analyzes the system optimality. Section 4 reviews a distributed event-triggered algorithm. Section 5 proposes a self-triggered algorithm. Section 6 contains a numerical example with simulation results. Section 7 gives a conclusion.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 145–154, 2022. https://doi.org/10.1007/978-3-030-93247-3_15
2 Preliminaries and Graph Theory
Consider a set of $N$ scalar agents interconnected according to a communication topology described by a communication graph $G = (V, E)$, in which $V = \{1, 2, \dots, N\}$ is the set of nodes and $E \subseteq V \times V$ is the set of edges. If agent $i$ and agent $j$ are directly connected, they form an edge $(i, j) \in E$ and are called neighboring agents. In this paper we study an undirected weighted graph $G$, i.e., $(i, j) \in E$ and $(j, i) \in E$ are equivalent. $A = [a_{ij}]$ is the weighted adjacency matrix, whose diagonal entries are all zero, i.e., $a_{ij} = 0$ for all $i = j$, and $a_{ij} > 0$ for all $i$ and $j$ connected by an edge, i.e., $(i, j) \in E$. $G$ is assumed to be a connected graph, meaning that there is a path from each $i$ to every $j$ through distinct adjacent nodes. The degree matrix of $G$ is the diagonal matrix $D = \mathrm{diag}\{d_i\}$ with diagonal entries $d_i = \sum_{j \in N_i} a_{ij}$, where $N_i = \{j \in V : (j, i) \in E\}$ is the set of neighboring agents of $i$. The Laplacian matrix of $G$ is the difference of the degree and adjacency matrices, $L = D - A$. For undirected graphs, $L$ is symmetric and positive semi-definite. All the rows of $L$ sum to zero, and thus the vector of ones is an eigenvector corresponding to the eigenvalue $\lambda_1(L) = 0$, i.e., $L\mathbf{1} = 0$. If the graph is connected, $L$ has exactly one eigenvalue equal to zero and all other eigenvalues are positive, i.e., $0 = \lambda_1(L) < \lambda_2(L) \le \dots \le \lambda_N(L)$.
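The definitions above can be illustrated with a small pure-Python sketch that builds $L = D - A$ for a weighted undirected graph and checks the stated properties (zero row sums, symmetry, connectivity). The four-agent path graph and its weights are hypothetical values chosen only for illustration.

```python
from collections import deque


def laplacian(adjacency):
    """Return L = D - A for a weighted adjacency matrix given as nested lists."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]  # d_i = sum over j of a_ij
    return [
        [(degrees[i] if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def is_connected(adjacency):
    """Breadth-first search from node 0 over edges with a_ij > 0."""
    n = len(adjacency)
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if adjacency[i][j] > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


# Hypothetical 4-agent path graph 1-2-3-4 with illustrative edge weights.
A = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

L = laplacian(A)
row_sums = [sum(row) for row in L]  # each row of L sums to zero, so L * 1 = 0
symmetric = all(L[i][j] == L[j][i] for i in range(4) for j in range(4))
connected = is_connected(A)
```

The zero row sums are exactly the statement $L\mathbf{1} = 0$, and the connectivity check corresponds to the condition under which the zero eigenvalue of $L$ is simple.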
3 System Model
We consider a general model of a distributed convex optimization problem for a class of multi-agent systems. The optimization problem is formulated as:
$$\min \sum_{i=1}^{n} \left( a_i x_i^2 + b_i x_i + c_i \right) \tag{1}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_i = X_D \tag{2}$$
where $i = 1, 2, \dots, n$ indexes the agents, $x_i$ is the state of agent $i$, and $X_D$ is a constant to which the states of all agents must always sum. The objective function is quadratic, and the global cost function is the sum of the cost functions of all agents. The Lagrangian of the above problem is formulated as:
$$L(x_1, x_2, \dots, x_n, \lambda) = \sum_{i=1}^{n} \left( a_i x_i^2 + b_i x_i + c_i \right) + \lambda \left( X_D - \sum_{i=1}^{n} x_i \right) \tag{3}$$
where $\lambda$ is the Lagrange multiplier. Solving for $\lambda$, we get
$$\frac{\partial L(x_i, \lambda)}{\partial x_i} = 2 a_i x_i + b_i - \lambda = 0 \tag{4}$$
$$\lambda = 2 a_i x_i + b_i \tag{5}$$
From (4) and (5), $\lambda = 2 a_1 x_1 + b_1 = 2 a_2 x_2 + b_2 = \dots = 2 a_n x_n + b_n$ or, writing $\lambda_i := 2 a_i x_i + b_i$ for the local value held by agent $i$,
$$\lambda_1 = \lambda_2 = \dots = \lambda_n = \lambda^* \tag{6}$$
From (2) and (6),
$$\lambda^* = \frac{X_D + \sum_{i=1}^{n} \frac{b_i}{2 a_i}}{\sum_{i=1}^{n} \frac{1}{2 a_i}} \tag{7}$$
This gives the optimal value of $\lambda$. All agents have to reach this optimal value after a certain number of updates, and
$$x_i^* = \frac{\lambda^* - b_i}{2 a_i} \tag{8}$$
is the optimal value of $x_i$. Thus (7) and (8) give the values of $\lambda$ and $x$ for all agents when the system reaches consensus.
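As a quick numerical check of (7) and (8), the following sketch computes $\lambda^*$ and $x_i^*$ for three agents and confirms that the constraint (2) and the equal-multiplier condition (6) hold. The coefficients $a_i$, $b_i$ and the value of $X_D$ are hypothetical and chosen only for illustration.

```python
def optimal_multiplier(a, b, x_demand):
    """Eq. (7): lambda* = (X_D + sum b_i / (2 a_i)) / (sum 1 / (2 a_i))."""
    numerator = x_demand + sum(bi / (2.0 * ai) for ai, bi in zip(a, b))
    denominator = sum(1.0 / (2.0 * ai) for ai in a)
    return numerator / denominator


def optimal_states(a, b, lam_star):
    """Eq. (8): x_i* = (lambda* - b_i) / (2 a_i)."""
    return [(lam_star - bi) / (2.0 * ai) for ai, bi in zip(a, b)]


# Hypothetical cost coefficients for three agents (illustrative values only).
a = [1.0, 2.0, 4.0]
b = [2.0, 4.0, 8.0]
X_D = 10.0

lam_star = optimal_multiplier(a, b, X_D)
x_star = optimal_states(a, b, lam_star)

# Local multipliers 2 a_i x_i + b_i; by (6) they all equal lambda* at the optimum.
local_multipliers = [2.0 * ai * xi + bi for ai, bi, xi in zip(a, b, x_star)]
```

For these values $\lambda^* = 104/7$ and the optimal states are $45/7$, $19/7$ and $6/7$, which sum to $X_D = 10$.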
4 Distributed Event-Triggered Algorithm
Our main concern is to design a self-triggered sampler to solve the average consensus problem. To this end, we first review an event-triggered sampling rule proposed in [18]. We then design a self-triggered sampler by exploiting the error condition of that event-triggered sampling rule. The distributed event-triggered sampler proposed in [18] is described as:
$$\dot{\lambda}_i = -k_i \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^j) \right) \tag{9}$$
$$\lambda_i(k+1) = \lambda_i(k) + d\lambda_i \tag{10}$$
where $k_i = 2 a_i$, $k_i > 0$. Here $\lambda_i(t_k^i)$ and $\lambda_j(t_k^j)$ are the most recently updated values of agent $i$ and of its neighboring agents $j \in N_i$, and $t_k^i$ denotes the $k$-th sampling time of agent $i$. The event-triggering condition for agent $i$ as designed in [18] is:
$$\left\| \lambda_i(t_k^i) - \lambda_i(t) \right\| \le c_i \left\| \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^j) \right) \right\| \tag{11}$$
The proof of (11) is given in [18].
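The update law (9) and the triggering condition (11) can be sketched for scalar agents as follows. This is a minimal illustration under stated assumptions: the function names, the three-agent path graph, the sampled multipliers and the gains $c_i$ are all hypothetical, and absolute values stand in for the norms because the agents are scalar.

```python
def disagreement(i, lam_hat, adjacency):
    """Weighted sum over neighbours j of a_ij * (lam_hat_i - lam_hat_j)."""
    return sum(
        a_ij * (lam_hat[i] - lam_hat[j])
        for j, a_ij in enumerate(adjacency[i])
        if a_ij > 0
    )


def lambda_dot(i, lam_hat, adjacency, k):
    """Right-hand side of (9), evaluated on the last sampled values."""
    return -k[i] * disagreement(i, lam_hat, adjacency)


def must_resample(i, lam_now_i, lam_hat, adjacency, c):
    """True when condition (11) no longer holds for agent i."""
    error = abs(lam_hat[i] - lam_now_i)
    return error > c[i] * abs(disagreement(i, lam_hat, adjacency))


# Hypothetical three-agent path graph and sampled multipliers.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
lam_hat = [4.0, 2.0, 1.0]  # last broadcast values of lambda_i
k = [2.0, 2.0, 2.0]        # k_i = 2 a_i with a_i = 1 (illustrative)
c = [0.5, 0.5, 0.5]        # illustrative trigger gains c_i

rates = [lambda_dot(i, lam_hat, adjacency, k) for i in range(3)]
```

In this example the implied state rates $\dot{\lambda}_i / (2 a_i)$ sum to zero, which reflects the symmetry of the weights and is what keeps the states summing to $X_D$ under (9).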
5
Distributed Self-triggered Algorithm
In this section, we propose a self-triggering algorithm to solve the distributed optimization problem. Self-triggered optimization is an aperiodic sampling method where we don’t need to calculate the error and compare it with the threshold on each iteration as in the event-triggered case. Instead, at one update time we will calculate the time for the next update, i.e. at tk we will calculate the time for the next update tk+1 .
Distributed Self-triggered Optimization for Multi-agent Systems
149
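Lemma 1: Let

$$g(t) := \left\|\lambda_i(t_k^i) - \lambda_i(t)\right\| \le c_i \left\|\sum_{j \in N_i} a_{ij}\left(\lambda_i(t_k^i) - \lambda_j(t_k^j)\right)\right\| := \delta_k^i \qquad (12)$$

Then $g(t)$ is bounded by

$$g(t) \le \frac{\|d\lambda_i(t_k)\|}{L_{d\lambda_i,\lambda_i}}\left(e^{L_{d\lambda_i,\lambda_i}(t - t_k)} - 1\right) := g(\lambda_k^i, t - t_k) \qquad (13)$$

where $L_{d\lambda_i,\lambda_i}$ is the Lipschitz constant of $d\lambda_i$ with respect to $\lambda_i$. Since $g(\lambda_k^i, t - t_k)$ is increasing in $t$, there is a time $t = t_{k+1}$ at which $g(\lambda_k^i, t_{k+1} - t_k) = \delta_k^i$. Then,

$$\frac{\|d\lambda_i(t_k)\|}{L_{d\lambda_i,\lambda_i}}\left(e^{L_{d\lambda_i,\lambda_i}(t_{k+1} - t_k)} - 1\right) = \delta_k^i \qquad (14)$$

$$e^{L_{d\lambda_i,\lambda_i}(t_{k+1} - t_k)} = 1 + \frac{\delta_k^i L_{d\lambda_i,\lambda_i}}{\|d\lambda_i(t_k)\|}$$

$$L_{d\lambda_i,\lambda_i}(t_{k+1} - t_k) = \ln\left(1 + \frac{\delta_k^i L_{d\lambda_i,\lambda_i}}{\|d\lambda_i(t_k)\|}\right)$$

$$t_{k+1} - t_k = \frac{1}{L_{d\lambda_i,\lambda_i}} \ln\left(1 + \frac{\delta_k^i L_{d\lambda_i,\lambda_i}}{\|d\lambda_i(t_k)\|}\right)$$

Thus the rule for the next update time of agent $i$ is:

$$t_{k+1}^i = t_k^i + \frac{1}{L_{d\lambda_i,\lambda_i}} \ln\left(1 + \frac{\delta_k^i L_{d\lambda_i,\lambda_i}}{\|d\lambda_i(t_k)\|}\right) \qquad (15)$$

The self-triggered sampler (15) ensures that $t_{k+1}^i - t_k^i > 0$ for all $k$.

Proof of Lemma 1: Let

$$g(s) = \lambda_i(t_k^i) - \lambda_i(s)$$

$$\frac{d}{ds} g(s) = -\frac{d}{ds}\lambda_i(s)$$

$$\frac{d}{ds} g(s) = -d\lambda_i(s)$$

$$g(s) = \int_{s_k}^{s} d\lambda_i(\sigma)\, d\sigma$$

$$g(s) = \int_{s_k}^{s} \left(d\lambda_i(\sigma) - d\lambda_i(t_k)\right) d\sigma + \int_{s_k}^{s} d\lambda_i(t_k)\, d\sigma$$

$$g(s) = \int_{s_k}^{s} \frac{d\lambda_i(\sigma) - d\lambda_i(t_k)}{\lambda_i(\sigma) - \lambda_i(t_k)}\left(\lambda_i(\sigma) - \lambda_i(t_k)\right) d\sigma + \int_{s_k}^{s} d\lambda_i(t_k)\, d\sigma$$

$$g(s) = \int_{s_k}^{s} L_{d\lambda_i,\lambda_i}\, g(\sigma)\, d\sigma + \int_{s_k}^{s} d\lambda_i(t_k)\, d\sigma$$

Taking the norm on both sides,

$$\|g(s)\| \le L_{d\lambda_i,\lambda_i} \int_{s_k}^{s} \|g(\sigma)\|\, d\sigma + \int_{s_k}^{s} \|d\lambda_i(t_k)\|\, d\sigma$$

By the Leibniz theorem,

$$\frac{d}{ds}\|g(s)\| \le L_{d\lambda_i,\lambda_i} \|g(s)\| + \|d\lambda_i(t_k)\|$$

$$\|g(s)\| \le \frac{\|d\lambda_i(t_k)\|}{L_{d\lambda_i,\lambda_i}}\left(e^{L_{d\lambda_i,\lambda_i}(s - s_k)} - 1\right)$$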
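The sampler in Eq. (15) can be illustrated with a minimal Python sketch. It assumes scalar states, so norms reduce to absolute values. The function names, the optional `tau_star` argument (which adds the worst-case delay used in Sect. 5.1) and all numerical values are hypothetical choices for illustration, not taken from the paper.

```python
import math


def trigger_threshold(c_i, a_row, lam_i, lam_neighbors):
    """delta_k^i = c_i * |sum_j a_ij * (lambda_i - lambda_j)|, as in Eq. (12)."""
    disagreement = sum(a_ij * (lam_i - lam_j)
                       for a_ij, lam_j in zip(a_row, lam_neighbors))
    return c_i * abs(disagreement)


def next_update_time(t_k, d_lambda_k, delta_k, lipschitz, tau_star=0.0):
    """Next sampling instant from Eq. (15); tau_star > 0 adds a worst-case delay."""
    if lipschitz <= 0.0 or delta_k < 0.0 or tau_star < 0.0:
        raise ValueError("lipschitz must be positive; delta_k, tau_star non-negative")
    rate = abs(d_lambda_k)
    if rate == 0.0:
        # The bound of Eq. (13) stays at zero, so the threshold is never reached.
        return math.inf
    return t_k + tau_star + math.log1p(delta_k * lipschitz / rate) / lipschitz


# Hypothetical values for one agent with two neighbors (assumed, not from the paper).
c_i, L = 0.5, 1.0
a_row, lam_i, lam_neighbors = [1.0, 1.0], 20.42, [16.37, 21.43]
d_lambda = -0.6  # assumed value of d(lambda_i) at t_k

delta = trigger_threshold(c_i, a_row, lam_i, lam_neighbors)
t_next = next_update_time(0.0, d_lambda, delta, L)
t_next_delayed = next_update_time(0.0, d_lambda, delta, L, tau_star=0.1)
```

Substituting `t_next` back into the bound of Eq. (13) reproduces `delta`, which is a quick way to check an implementation of the rule.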
Remark 1: After every event, we determine which agent has the nearest next update time. When an update time $t_k^i$ of agent $i$ is reached, the agent computes $\dot{\lambda}_i$ using the current state information from its neighboring agents and updates its state according to (9). The time of the next update, $t_{k+1}^i$, is then computed using (15). In the meantime, agent $i$ transmits its updated information to its neighbors; on receiving it, the neighbors update their state values and recompute the times of their own next updates.

5.1 Communication Delays
So far we have ignored the communication delays between the agents. In this section, we consider the delay with which updated information from the neighboring agents reaches agent $i$. Let $\tau_{ij}$ be the time it takes for data to travel from agent $i$ to agent $j$ or vice versa, i.e., $\tau_{ij} = \tau_{ji}$. The new update equation is then given by

$$\dot{\lambda}_i = -k_i \sum_{j \in N_i} a_{ij}\left(\lambda_i(t_k^i) - \lambda_j(t_k^i - \tau_{ij})\right) \qquad (16)$$

Here $t_k^i$ is the time at which agent $i$ updates its state using the latest state information from its neighbors. The latest update from a neighbor $j$ that can reach agent $i$ before time $t_k^i$ is the value at time $t_k^i - \tau_{ij}$. After computing $\dot{\lambda}_i$, agent $i$ computes the time of its next update $t_{k+1}^i$ using (15). For this, agent $i$ first needs the value of $\delta_k^i$; from (11) and (12), the new value of $\delta_k^i$ with the added communication delay is

$$\delta_k^i := c_i \left\|\sum_{j \in N_i} a_{ij}\left(\lambda_i(t_k^i) - \lambda_j(t_k^i - \tau_{ij})\right)\right\|$$
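Hence, following Lemma 1, Eq. (14) with the maximum possible delay in the arrival of states at node $i$ is given by

$$\frac{\|d\lambda_i(t_k)\|}{L_{d\lambda_i,\lambda_i}}\left(e^{L_{d\lambda_i,\lambda_i}(t_{k+1} - t_k - \tau_{ij}^*)} - 1\right) = \delta_k^i \qquad (17)$$

where

$$\tau_{ij}^* = \max_{j \in N_i} \tau_{ij}$$

and

$$t_{k+1}^i = t_k^i + \tau_{ij}^* + \frac{1}{L_{d\lambda_i,\lambda_i}} \ln\left(1 + \frac{\delta_k^i L_{d\lambda_i,\lambda_i}}{\|d\lambda_i(t_k)\|}\right) \qquad (18)$$

6 Numerical Example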
This section presents a numerical example that illustrates the effectiveness of the proposed algorithm. Consider a multi-agent system with 6 agents connected via an undirected graph with a fixed communication topology, as shown in Fig. 1. Table 1 lists the values of the coefficients $a_i$ and $b_i$ of the cost function of each agent.
Fig. 1. Communication topology
Table 1. Coefficients of all agents

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $a_i$ | 0.096 | 0.072 | 0.105 | 0.082 | 0.074 | 0.088 |
| $b_i$ | 1.22 | 3.41 | 2.53 | 4.02 | 3.17 | 2.24 |
| $c_i$ | 51 | 31 | 78 | 42 | 62 | 45 |

Let $X_D = 600$. The initial values of the agents are set as $x(0) = (100, 90, 90, 120, 110, 90)^T$. From Eq. (5), the initial values of $\lambda_i$ are $\lambda_1(0) = 20.42$, $\lambda_2(0) = 16.37$, $\lambda_3(0) = 21.43$, $\lambda_4(0) = 23.70$, $\lambda_5(0) = 19.45$, $\lambda_6(0) = 18.08$. The optimal value of $\lambda$ calculated by Eq. (7) is $\lambda^* = 19.76$, and the corresponding optimal values of $x_i$ are $x_1^* = 96.5966$, $x_2^* = 113.5872$, $x_3^* = 82.0788$, $x_4^* = 96.0156$, $x_5^* = 112.1391$, $x_6^* = 99.5827$. The constant $k_i = 2a_i$. The proposed algorithm with these parameter values is illustrated by a computer simulation.

Figure 2 shows how the values of $\lambda_i$ change at every update. According to the update equation, the change in the state of each agent also depends on the states of its neighboring agents. It can be seen that after a few iterations all agents achieve consensus and all $\lambda_i$ reach their optimal value. Similarly, Fig. 3 shows the corresponding evolution of $x_i$, where all agents reach their respective optimal values.

In Fig. 4, we plot the inter-sampling times of all six agents against the number of iterations for the self-triggered sampling algorithm and compare them with a periodic update rule. The sampling period for periodic updates is set to 0.2 ms and remains constant for all iterations. In contrast, the inter-sampling times of all agents under self-triggered sampling increase with every iteration, because with every iteration the states of the agents are updated and move closer to consensus. Once consensus is reached, the inter-sampling times become very large, which reduces the communication between agents in steady state.

For the next result, we added a transient at node 1 that changed the state of agent 1. The resulting disturbance reduced the inter-sampling times, as shown in Fig. 5. The transient at node 1 also disturbed its neighbors, so a drastic fall in the inter-sampling times of agent 1 and its neighbors can be seen. In Fig. 6, we plot the inter-sampling times of agent 1 with and without the transient to show clearly how the proposed sampler behaves at a transient. Finally, Fig. 7 shows another way of plotting the self-triggered samples against time, for the case without a transient at node 1, showing the gap between updates for each agent individually. The inter-sampling times of all agents are very small at the beginning and increase with time as the system moves towards steady state. Here the x-axis shows time in seconds and the y-axis shows the agent numbers.
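The initial and optimal values quoted for this example can be checked with a short Python sketch. It assumes quadratic costs of the form $a_i x_i^2 + b_i x_i$ (plus a constant), so that $\lambda_i = 2a_i x_i + b_i$, and obtains $\lambda^*$ by substituting Eq. (8) into the demand constraint $\sum_i x_i = X_D$. The variable names are illustrative, and the closed form is an assumption about Eq. (7), which is consistent with the numbers reported in the text.

```python
# Coefficients from Table 1 and the demand X_D used in the example.
a = [0.096, 0.072, 0.105, 0.082, 0.074, 0.088]
b = [1.22, 3.41, 2.53, 4.02, 3.17, 2.24]
x0 = [100, 90, 90, 120, 110, 90]
X_D = 600.0

# Initial multipliers: lambda_i(0) = 2 * a_i * x_i(0) + b_i (assumed form of Eq. 5).
lam0 = [2 * ai * xi + bi for ai, bi, xi in zip(a, b, x0)]

# Sum_i (lambda* - b_i) / (2 a_i) = X_D, solved for lambda*.
inv = [1.0 / (2 * ai) for ai in a]
lam_star = (X_D + sum(bi * w for bi, w in zip(b, inv))) / sum(inv)

# Optimal allocations from Eq. (8).
x_star = [(lam_star - bi) / (2 * ai) for ai, bi in zip(a, b)]

print([round(v, 2) for v in lam0])
print(round(lam_star, 4), [round(v, 4) for v in x_star])
```

Under these assumptions the sketch gives $\lambda^* \approx 19.7666$, which the text reports truncated as 19.76, and the allocations sum to $X_D$.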
Fig. 2. The evolution of $\lambda_i$ with self-triggered sampling.

Fig. 3. State evolution with self-triggered sampling.

Fig. 4. Inter-sampling times obtained with the self-triggered implementation and a periodic sampler with inter-sampling time 0.0002 s.

Fig. 5. A sudden decrease in the inter-sampling times when a transient occurs at node 1.

Fig. 6. Inter-sampling time of agent 1 with and without transient.

Fig. 7. Samples of each agent.
7 Conclusion
In this paper, a distributed convex optimization problem for a class of multi-agent systems has been studied. The aim is to reduce computational complexity and unnecessary communication between agents. To this end, a self-triggered algorithm has been proposed which, unlike an event-triggered algorithm, does not need the measurement error of the agents but only knowledge of the states of the neighboring agents. The proposed algorithm is simple and helps the agents reach consensus while reducing the number of triggering events, controller updates, and communication transmissions. The algorithm has also been extended to the case with communication delays. In future work, we plan to extend the proposed approach to multi-agent systems with directed graphs and to switching communication topologies.
References

1. Wen, X., Qin, S.: A projection-based continuous-time algorithm for distributed optimization over multi-agent systems. Complex Intell. Syst., 1–11 (2021)
2. Margellos, K., Falsone, A., Garatti, S., Prandini, M.: Distributed constrained optimization and consensus in uncertain networks via proximal minimization. IEEE Trans. Automatic Control (2017)
3. Qiu, Z., Liu, S., Xie, L.: Distributed constrained optimal consensus under fixed time delays. In: Control, Automation, Robotics and Vision (ICARCV), 2016 14th International Conference on. IEEE, pp. 1–6 (2016)
4. Feng, K., Wang, Y., Zhou, H., Wang, Z.H., Liu, Z.W.: Second-order consensus of multi-agent systems with nonlinear dynamics and time-varying delays via impulsive control. In: Control and Decision Conference (CCDC), 2016 Chinese. IEEE, pp. 1304–1309 (2016)
5. Wang, X., Hong, Y., Ji, H.: Distributed optimization for a class of nonlinear multiagent systems with disturbance rejection. IEEE Trans. Cybern. 46(7), 1655–1666 (2016)
6. Meng, W., Yang, Q., Si, J., Sun, Y.: Consensus control of nonlinear multiagent systems with time-varying state constraints. IEEE Trans. Cybern. (2017)
7. Su, H., Liu, Y., Zeng, Z.: Second-order consensus for multiagent systems via intermittent sampled position data control. IEEE Trans. Cybern. 50(5), 2063–2072 (2020). https://doi.org/10.1109/TCYB.2018.2879327
8. Lou, Y., Shi, G., Johansson, K.H., Hong, Y.: Approximate projected consensus for convex intersection computation: convergence analysis and critical error angle. IEEE Trans. Automatic Control 59(7), 1722–1736 (2014)
9. Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Automatic Control 54(1), 48–61 (2009)
10. Wei, E., Ozdaglar, A.: Distributed alternating direction method of multipliers. In: Decision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, pp. 5445–5450 (2012)
11. Huang, B., Zou, Y., Meng, Z.: Distributed continuous-time constrained convex optimization with general time-varying cost functions. Int. J. Robust Nonlinear Control 31(6), 2222–2236 (2021)
12. Yang, S., Liu, Q., Wang, J.: A multi-agent system with a proportional-integral protocol for distributed constrained optimization. IEEE Trans. Automatic Control 62(7), 3461–3467 (2017)
13. Gharesifard, B., Cortés, J.: Distributed continuous-time convex optimization on weight-balanced digraphs. IEEE Trans. Automatic Control 59(3), 781–786 (2014)
14. Kia, S.S., Cortés, J., Martínez, S.: Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication. Automatica 55, 254–264 (2015)
15. Liu, Q., Wang, J.: A second-order multi-agent network for bound-constrained distributed optimization. IEEE Trans. Automatic Control 60(12), 3310–3315 (2015)
16. Shi, G., Johansson, K.H., Hong, Y.: Reaching an optimal consensus: dynamical systems that compute intersections of convex sets. IEEE Trans. Automatic Control 58(3), 610–622 (2013)
17. Li, H., Liu, S., Soh, Y.C., Xie, L.: Event-triggered communication and data rate constraint for distributed optimization of multiagent systems. IEEE Trans. Syst. Man Cybern. Syst. (2017)
18. Chen, G., Dai, M., Zhao, Z.: A distributed event-triggered scheme for a convex optimization problem in multi-agent systems. In: Control Conference (CCC), 2017 36th Chinese. IEEE, pp. 8731–8736 (2017)
19. Lü, Q., Li, H., Liao, X., Li, H.: Geometrical convergence rate for distributed optimization with zero-like-free event-triggered communication scheme and uncoordinated step-sizes. In: Information Science and Technology (ICIST), 2017 Seventh International Conference on. IEEE, pp. 351–358 (2017)
20. Li, X., Tang, Y., Karimi, H.R.: Consensus of multi-agent systems via fully distributed event-triggered control. Automatica 116, 108898 (2020)
21. Li, Q., Wei, J., Yuan, J., Gou, Q., Niu, Z.: Distributed event-triggered adaptive finite-time consensus control for second-order multi-agent systems with connectivity preservation. J. Franklin Institute (2021)
Automatic Categorization of News Articles and Headlines Using Multi-layer Perceptron

Fatima Jahara, Omar Sharif, and Mohammed Moshiul Hoque(B)

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh
[email protected], {omar.sharif,moshiul_240}@cuet.ac.bd

Abstract. News categorization is the task of automatically assigning news articles or headlines to a particular class. The proliferation of social media and various web 2.0 platforms has resulted in substantial textual online content. The majority of this textual data is unstructured, which makes it extremely hard and time-consuming to organize, manipulate, and manage. Because it is fast and cost-effective, automatic news classification has attracted increasing attention from news agencies in recent years. This paper introduces a deep learning-based framework using a multi-layer perceptron (MLP) to classify Bengali news articles and headlines into multiple categories: accident, crime, entertainment, and sports. Due to the unavailability of a Bengali news corpus, this work also developed a dataset containing 76343 news articles and 76343 headlines. Additionally, this work investigates the performance of the proposed classifier using five word embedding techniques. The comparative analysis reveals that the MLP with the Keras embedding layer outperformed the other embedding models, achieving the highest accuracy of 98.18% (news articles) and 94.53% (news headlines).

Keywords: Natural language processing · Text classification · News categorization · Deep learning · News corpus

1 Introduction
With the rapid increase of online news sources and the availability of the Internet, people increasingly prefer to read daily news on news portals. Thousands of news portals constantly publish updated news articles and headlines in Bengali every hour. Most of this textual content is unorganized or unstructured. Thus, it has become almost impracticable for a group of editors to categorize these massive amounts of news articles by reading each of them. Moreover, differing arrangements and categorization methods make it troublesome for users to find their preferred news articles or headlines for a particular class without browsing through individual news portals. Manual classification of massive online news content is time-consuming, complicated, and costly due to its messy nature. Thus, automatic news classification can be a potential solution that uses deep learning (DL) and NLP to analyze a massive amount of news content in a more agile, inexpensive, and reliable way. Bengali news classification can improve the user experience and enable web searching by news category. Moreover, Bengali online news portals can use it to sort news articles and headlines and to make articles easier to find.

Textual news classification is the process of classifying or tagging texts collected from news articles into predefined categories. Although several techniques are available for news classification in high-resource languages, little research has been conducted on news classification in low-resource languages, including Bengali. Bengali is the 7th most widely spoken language globally, and with the rapid growth of Bengali news portals, an automated classification system has become a necessity. Classifying Bengali news for both articles and headlines is challenging due to the lack of a standard news corpus and NLP tools. Additionally, variations of textual content across news classes produce an imbalanced dataset, which makes the classification task more complicated. Most previous studies on Bengali news classification focused on classifying news based on either headlines or articles using ML techniques with TF-IDF features, which provided lower accuracy. To address these issues, this work introduces a deep learning technique (i.e., multi-layer perceptron) combined with word embeddings to categorize Bengali news articles and headlines. The specific contributions of this work are:

– Develop a corpus containing 76343 news articles (523403 unique words) and 76343 news headlines (61470 unique words) in four news classes.
– Investigate various word embedding techniques, including Keras embedding, Word2Vec, and FastText, with parameter tuning.
– Propose a multi-layer perceptron (MLP)-based model with optimized hyperparameters to classify Bengali news articles and headlines.
– Investigate and compare the performance of the proposed model with existing techniques.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 155–166, 2022. https://doi.org/10.1007/978-3-030-93247-3_16
2 Related Work
Several studies have been carried out on textual news classification in high-resource languages such as English and Chinese. However, Bengali news classification is still at an early stage. Cecchini et al. [1] used three models, SVM, MAXENT, and CNN, with TF-IDF to classify Chinese news articles. Stein et al. [2] proposed NB, DT, RF, SVM, and MLP for English news article classification based on both authors and topics. Mandal et al. [3] proposed an ML-based technique to classify Bengali web texts using DT, KNN, NB, and SVM on 1000 documents. Alam et al. [4] used LR, NN, NB, RF, and Adaboost models with Word2Vec and TF-IDF feature extraction for Bengali news article classification, where NN with Word2Vec outperformed the other models. Recent work used m-BERT and XLM-RoBERTa to categorize Bengali textual news [5]. Rabib et al. [6] used several ML techniques (e.g., SVM, NB, RF, and LR) for news classification. That work also used BiLSTM and CNN for fine-tuned predictions of Bengali news into 12 categories. A recent study used DL-based methods, including MLP, CNN, RNN, LSTM, C-LSTM, Bi-LSTM, HAN, and CHAN, to classify news articles and titles into 10 predefined categories, where Bi-LSTM with Word2Vec (Skip-gram) performed best [7]. Shopon et al. [8] proposed a BiLSTM model to classify Bengali news articles into 12 different categories based on news captions. Shahin et al. [9] investigated ANN, SVM, LSTM, and Bi-LSTM models for classifying Bengali news headlines into 8 classes, where Bi-LSTM outperformed the other models.

Most past studies focused on classifying Bengali news based on either articles or headlines. These studies used default models without reporting hyperparameter optimization. Moreover, none of them investigated the Keras embedding layer for the Bengali news classification task. This work proposes an MLP-based news article and headline classification system with Keras embedding layers and hyperparameter optimization to address the weaknesses of past news classification systems in Bengali.
3 News Corpus Development
Due to the unavailability of a standard corpus in Bengali, this research developed a textual news corpus. The corpus development process followed the directions suggested by Das et al. [10]. The steps for dataset preparation are:

– Data Accumulation: The data collection task was automated by web-scraping the news portals. A total of 76359 news articles were accumulated from 5 renowned sources: Prothom Alo, Daily Nayadiganta, Daily Samakal, Kaler Kantho, and Bhorer Kagoj (Fig. 1b). The collected news articles span the years 2015 to 2020. Five participants collected data from these sources. We scraped almost 76343 news articles to create the corpus for the news classification system. The crawling was done for four categories: accident, crime, sports, and entertainment. The title, author, date, and description are the four main pieces of meta information that were also encoded. Both news articles and headlines were crawled.
– Data Cleaning: Data cleaning is the process of preparing data by removing unnecessary content, which prevents coarse data from producing inaccurate results. General Bengali news articles may contain Bengali as well as English digits and also some foreign words. These digits, foreign words, and various punctuation marks were removed from the dataset because they are insignificant for classification. We also removed 398 common Bangla stop words to focus on the critical words that bear meaning in the data context. The raw data contained 15,655,516 total words and 662,996 unique words in the news articles; after preprocessing and cleaning, a total of 11,843,411 words with 523,403 unique words remain in the corpus. The raw news headline data contained 516,781 total words and 60,167 unique words. A total of 373,860 words (61,470 unique words) have been included in the corpus for the headlines.
3.1 Data Statistics
The developed corpus contains 76,343 news documents, in which the news articles contribute 15,655,516 words and the headlines contribute 516,781 words. Table 1 summarizes the developed corpus for each category.

Table 1. Class-wise data statistics.

| Class | Data | Total words (Articles) | Total words (Headlines) | Unique words (Articles) | Unique words (Headlines) |
|---|---|---|---|---|---|
| Accident | 11,841 | 2,077,659 | 74,387 | 100,018 | 6,834 |
| Crime | 11,222 | 2,998,583 | 78,160 | 127,494 | 10,175 |
| Entertainment | 17,568 | 2,828,029 | 120,263 | 166,258 | 20,174 |
| Sports | 35,712 | 7,751,245 | 243,971 | 269,226 | 22,984 |
| Total | 76,343 | 15,655,516 | 516,781 | 662,996 | 60,167 |
Figure 1a shows the distribution of the dataset over the four categories: 46.8% of the data belongs to the Sports category, while the remaining three classes each contribute between roughly 15% and 23%. Figure 1b shows the source-wise distribution of the news corpus; the largest amount of data was collected from the news portal 'Kaler Kantho'.
| News portal | Data |
|---|---|
| Kaler Kantho | 41950 |
| Prothom Alo | 20613 |
| Bhorer Kagoj | 10795 |
| Daily Nayadiganta | 2066 |
| Daily Samakal | 919 |

Fig. 1. Summary of data distribution in the corpus. (a) Class-wise data distribution. (b) Source-wise data distribution.
4 News Classification Framework
The proposed research aims to develop a news article classification framework that can classify the Bengali news articles and headlines into four predefined categories. Figure 2 demonstrates the abstract framework of the proposed news classification, which consists of four main modules: preprocessing, embedding model generation, classification model generation, and prediction.
Fig. 2. Abstract view of the proposed news classification framework.
4.1 Data Preprocessing
Several preprocessing steps are applied to the raw data before it is fed to the embedding or classification models.

– Label Encoding: This is the process of converting labels into numerical values, which turns the text categories into a machine-readable form. In this work, each label is encoded with a unique integer (from 1 to 4).
– Tokenization: This process divides a sentence or text into a sequence of words called tokens [11]. A token is a string of contiguous characters grouped as a semantic unit and delimited by spaces, punctuation marks, and newlines. Each news item is tokenized, producing a total of 11,843,411 tokens for articles and 373,860 for headlines.
– Word Encoding: This process transforms the words of a text into numbers by mapping each unique word to a particular value (for example, the index 224 for one particular word). A predefined set of words in the tokenized train set is chosen by limiting the number of most frequent words, and each is encoded with a word index (ranging from 1 to 331530).
– Text Sequencing: This process converts a text document into a list of integers. Each word in a news item (i.e., article or headline) is replaced by its integer value.
– Padding: The articles in the corpus do not all have the same length. Thus, to train the model, padding is used to bring the lists to the same length [12]. This work uses post-padding with '0' at the end of each sequence to give all sequences the same length of 2642. Sequences longer than 2642 are truncated.

4.2 Word Embedding
Word embedding is a type of feature extraction technique that selects a set of relevant features from the texts, reducing the amount of input data for classifier training. Although several embedding techniques are extensively used in text classification, this work used the three most common techniques: Keras embedding layer, Word2Vec, and FastText. FastText exploits sub-word information to construct an embedding model in which word representations are learned from the character n-grams and the sum of the n-gram vectors [13]. Two variants, continuous bag of words (CBOW) and Skip-gram, are considered for Word2Vec and FastText with 'Gensim'. Table 2 summarizes the most common parameters with the corresponding values used to generate the embedding models.

Table 2. Parameters of embedding models.

| Embedding | Attribute | Description | Optimal value |
|---|---|---|---|
| Keras embedding layer | input_dim | Size of the vocabulary | 100,000 |
| | output_dim | Embedding dimension | 800 |
| | input_length | Input sequence length | 2642 |
| Word2Vec & FastText | sg | Training algorithm: CBOW (0) or Skip-gram (1) | 0, 1 |
| | size (output dim) | Embedding dimension | 800 |
| | window | Maximum distance between a target word and words around the target word | 5 |
| | min_count | Minimum count of words | 5 |
| | input length | Input sequence length | 2642 |
For Word2Vec and FastText (for both CBOW and Skip-gram), the same set of values has been used. The embedding model maps the vocabulary and represents each input through a feature matrix of dimension (input length × output dim), i.e., (2642 × 800), which is then fed to the classifier model. The resulting vectors are dense and consist of real values. The embedding layer works like a lookup table in which the words are the keys and the feature vectors are the values.

4.3 Classifier Model Generation
This work proposes an MLP-based deep learning model to classify Bengali news articles and headlines. Moreover, several ML classifiers such as LR, RF, NB, DT, KNN, and SVM are also trained with TF-IDF features to investigate the performance of the textual news classification task. MLP consists of an input layer, an output layer, and one hidden layer between these two layers. The neurons in the hidden layers perform the classification of the features with the help of non-linear activation functions (ReLU) and predefined weights and biases. Figure 3 illustrates the architecture of the MLP classifier model.
Fig. 3. Architecture of MLP-based model for news classification.
The input layer takes the processed train set and passes it to the adjacent layer of the model. Since no processing is done in this layer, the input and output shapes are the same: (None, input_length), i.e., (None, 2642), where input_length denotes the length of the preprocessed input data. The embedding layer extracts the features and generates a feature matrix of size (None, input_length, output_dim), i.e., (None, 2642, 800), where output_dim represents the dimension of the embedding. The matrix is then reduced to a one-dimensional vector of shape (None, output_dim), i.e., (None, 800), using the GlobalAveragePooling1D layer. This vector is passed to the dense layer, a hidden layer with 450 units that uses the 'relu' activation function to learn its parameters. The output shape of this layer is (None, units), i.e., (None, 450). To avoid overfitting, a dropout rate of 0.5 is applied through the dropout layer. The final dense layer produces the output prediction results of shape (None, units), i.e., (None, 29). Its number of units equals the number of categories, and it uses the 'softmax' activation function to predict the probabilities of the class labels.

The generated model is first compiled using the 'Adam' optimizer with a learning rate of 0.001 and the 'sparse_categorical_crossentropy' loss function. For tuning the model, 'accuracy' is used as the metric. The preprocessed train and validation sets and their encoded labels are fed to the compiled model for training and tuning. We adopted a set of hyperparameters for tuning the model and searched for their optimal values. After hyperparameter optimization, the optimal values are used to train the model on the train set, which yields the trained classifier model.

4.4 Prediction
The trained classifier model is used to predict the labels of the unseen test samples. Class labels of the unlabeled test set are used for the evaluation of the trained classifier model. A total of 7592 news articles of the pre-processed test set is fed to the trained model, which then predicts the label of the news articles using the ‘softmax’ probability distribution (Eq. 1).
$$\mathrm{Softmax}(\theta_i) = \frac{\exp(\theta_i)}{\sum_{j=1}^{n} \exp(\theta_j)} \qquad (1)$$

Here, $\theta_i$ denotes the output feature vector from the trained model, and $n$ represents the number of categories. The output values range from 0 to 1, and the class with the highest probability is taken as the predicted label.
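The prediction step can be made concrete with a small pure-Python sketch of the forward pass described in Sects. 4.3 and 4.4: embedding lookup, global average pooling, one ReLU hidden layer, and the softmax of Eq. (1). All sizes and weights below are toy values chosen for illustration (the paper uses an 800-dimensional embedding and 450 hidden units). The class order is hypothetical, and the sketch uses the four categories named in the paper for the output layer. Dropout is omitted because it is inactive at prediction time.

```python
import math


def softmax(theta):
    """Eq. (1), computed with the usual max-shift for numerical stability."""
    shift = max(theta)
    exps = [math.exp(t - shift) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; weights[u] holds the input weights of unit u."""
    return [sum(x * w for x, w in zip(inputs, unit_w)) + bias
            for unit_w, bias in zip(weights, biases)]


def mlp_forward(token_ids, embedding, w_hidden, b_hidden, w_out, b_out):
    vectors = [embedding[t] for t in token_ids]                    # lookup table
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]    # average pooling
    hidden = [max(0.0, h) for h in dense(pooled, w_hidden, b_hidden)]  # ReLU
    return softmax(dense(hidden, w_out, b_out))                    # class probabilities


# Toy parameters (assumed): 4-word vocabulary, index 0 reserved for padding.
embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_hidden, b_hidden = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]
w_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, -1.0, 0.0]]
b_out = [0.0, 0.0, 0.0, 0.0]
classes = ["accident", "crime", "entertainment", "sports"]  # hypothetical order

probs = mlp_forward([1, 3, 0, 0], embedding, w_hidden, b_hidden, w_out, b_out)
predicted = classes[probs.index(max(probs))]
```

The averaging step runs over every position of the padded sequence, including the padding index, which is how an unmasked average-pooling layer behaves.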
5 Experiments
The classifier models were implemented in Python 3.6.9 with scikit-learn 0.22.2. Pandas and NumPy 1.18.5 are used to prepare the data, and scikit-learn is used to implement the machine learning classifiers. The parameters of the classifiers were selected by trial and error during experimentation. The experiments were performed on a general-purpose computer with an Intel® Core™ i3-5005U CPU running at 2.00 GHz, 4.0 GB RAM, and 64-bit Windows 10 Pro. Google Colab with Keras and a TensorFlow backend is used as the deep learning framework. Overall, 80% (61122 documents) of the data is used for training, 10% (7629 documents) for validation, and 10% (7592 documents) for testing. The MLP model with the embedding models is tuned with different hyperparameters. Table 3 summarizes the hyperparameters used by the MLP classifier.

Table 3. Hyper-parameters summary.

| Hyper-parameter | Search space | Optimal value |
|---|---|---|
| Learning rate | [0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001] | 0.001 |
| Optimizer | [Adam, Adamax, Adagrad, Adadelta, Nadam, RMSprop, SGD, Ftrl] | Adam |
| Activation function | [ReLU, tanh, sigmoid, softmax, softplus, softsign, selu, elu] | ReLU |
| Dropout | [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.5 |
| No. of hidden layers | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | 1 |
| No. of units (per hidden layer) | [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 900, 1000] | 450 |
| Batch size | [1, 32, 64, 128, 256, 1024, 61122] | 32 |
| Embedding dimension | [10, 50, 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800] | 800 |
| Vocab size | [5k, 10k, 30k, 50k, 80k, 100k, 330k] | 100k |
6 Results Analysis

Several measures, namely accuracy (Ac), precision (Pr), recall (Re), and F1-score, are considered to evaluate the proposed textual news classification model on the developed corpus. Table 4 shows the performance of the proposed MLP-based news classification system with different embedding techniques.

Table 4. Performance of MLP-based Bengali news classification.

| Embedding technique | Articles Ac (%) | Articles Pr | Articles Re | Articles F1-score | Headlines Ac (%) | Headlines Pr | Headlines Re | Headlines F1-score |
|---|---|---|---|---|---|---|---|---|
| Keras embedding layer | 98.18 | 0.98 | 0.98 | 0.98 | 94.53 | 0.95 | 0.95 | 0.95 |
| Word2Vec (CBOW) | 97.47 | 0.97 | 0.97 | 0.97 | 72.34 | 0.71 | 0.72 | 0.71 |
| Word2Vec (Skip-gram) | 97.77 | 0.98 | 0.98 | 0.98 | 82.97 | 0.84 | 0.83 | 0.83 |
| FastText (CBOW) | 96.98 | 0.97 | 0.97 | 0.97 | 97.35 | 0.97 | 0.97 | 0.97 |
| FastText (Skip-gram) | 97.35 | 0.97 | 0.97 | 0.97 | 79.40 | 0.80 | 0.79 | 0.80 |
Results indicate that the proposed MLP model with the Keras embedding layer achieved the highest scores for categorizing news articles (98.18%) and headlines (94.53%) compared to other embedding techniques. Since the Keras embedding layer works as a layer of the neural network, it is trained jointly with the MLP model through backpropagation. This helps the Keras embedding layer learn task-relevant features effectively, and it therefore performs better than the pre-trained Word2Vec and FastText models. Table 5 shows the class-wise performance of the proposed Bengali news classification model with the Keras embedding layer for news articles and headlines.

Table 5. Category-wise performance measures.

| Category name | News articles: Ac | Pr | Re | F1-score | News headlines: Ac | Pr | Re | F1-score | No. of test data |
|---|---|---|---|---|---|---|---|---|---|
| Sports | 97.63 | 1.00 | 0.97 | 0.99 | 96.80 | 0.97 | 0.96 | 0.97 | 3564 |
| Entertainment | 99.02 | 0.95 | 0.99 | 0.97 | 91.78 | 0.90 | 0.94 | 0.92 | 1747 |
| Accident | 98.80 | 0.98 | 0.99 | 0.99 | 94.53 | 0.96 | 0.92 | 0.94 | 1171 |
| Crime | 98.01 | 0.98 | 0.98 | 0.98 | 91.56 | 0.91 | 0.92 | 0.92 | 1110 |
| Weighted average | 98.18 | 0.98 | 0.98 | 0.98 | 94.53 | 0.95 | 0.95 | 0.95 | 7592 |
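The weighted-average row can be checked by weighting each class by its number of test samples. The sketch below does this for the headline accuracies of Table 5; the helper name `weighted_average` is ours, not the authors'. The result agrees with the 94.53% headline accuracy quoted in the text. Because the per-class values are themselves rounded, a check of this kind reproduces reported averages only up to rounding.

```python
# Per-class headline accuracy (%) and test-set sizes, copied from Table 5.
headline_acc = {"Sports": 96.80, "Entertainment": 91.78, "Accident": 94.53, "Crime": 91.56}
test_size = {"Sports": 3564, "Entertainment": 1747, "Accident": 1171, "Crime": 1110}


def weighted_average(values, weights):
    """Average of `values`, weighting each class by its entry in `weights`."""
    total_weight = sum(weights.values())
    return sum(values[name] * weights[name] for name in values) / total_weight


total_samples = sum(test_size.values())
headline_weighted = weighted_average(headline_acc, test_size)
print(total_samples)                 # 7592
print(round(headline_weighted, 2))   # 94.53
```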
Category-wise performance analysis revealed that, for news article categorization, Entertainment obtained the maximum accuracy (99.02%), whereas Sports and Accident obtained the highest F1-score (0.99). For news headlines, the Sports class achieved both the highest accuracy (96.80%) and the highest F1-score (0.97) among all classes. Although the Entertainment category obtained the maximum accuracy on articles, its F1-score is lower because several Sports samples were misclassified as Entertainment, which reduces its precision. For headline categorization, the Sports category performed better because many sports headlines contain content that is clearly relevant to the class.
7 Comparison with Existing Approaches
We investigated the performance of existing techniques [3,4] on the developed dataset. Table 6 compares the accuracy of previous techniques for textual news classification in Bengali.

Table 6. Performance comparison.

| Method | Techniques | Ac (%), news articles | Ac (%), news headlines |
|---|---|---|---|
| Alam et al. [4] | LR + TF-IDF | 97.92 | 92.77 |
| Alam et al. [4] | RF + TF-IDF | 97.22 | 90.61 |
| Alam et al. [4] | NB + TF-IDF | 97.07 | 92.88 |
| Mandal et al. [3] | DT + TF-IDF | 93.77 | 88.31 |
| Mandal et al. [3] | KNN + TF-IDF | 54.08 | 59.68 |
| Mandal et al. [3] | SVM + TF-IDF | 98.01 | 93.82 |
| Proposed | MLP + Keras embedding | 98.18 | 94.53 |
The comparative analysis showed that the proposed method outperformed previous techniques, achieving the highest classification accuracy of 98.18% (news articles) and 94.53% (news headlines). A possible reason is that the MLP learns by estimating errors and updating its weights through backpropagation, whereas the classical ML baselines do not. In addition, TF-IDF produces a fixed feature matrix that is computed before classifier training, whereas the Keras embedding layer is trained, and its embedding values updated, together with the model.
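To illustrate why TF-IDF features stay fixed while an embedding layer does not, the following minimal sketch computes a TF-IDF matrix from corpus counts alone. It assumes one common variant (term frequency times log(N/df)); the toy documents and the function name `tfidf_matrix` are hypothetical, and the baselines in [3,4] may use a different weighting.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[d][t] is the TF-IDF weight of term t in doc d.

    Assumed variant: tf = count / doc length, idf = log(N / document frequency).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


# Hypothetical tokenized headlines, for illustration only.
docs = [
    ["team", "wins", "match"],
    ["police", "arrest", "suspect"],
    ["team", "bus", "crash"],
]
vocab, X = tfidf_matrix(docs)
```

The weights in `X` depend only on token counts in the corpus, so they do not change while a classifier is being fitted. An embedding layer, by contrast, holds parameters that the training loss can adjust.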
8 Error Analysis
The results confirmed that the MLP model with Keras embedding performed better than other models for Bengali news classification. To better understand the model's performance, a detailed error analysis was performed using the confusion matrices in Fig. 4. The Sports category received the most accurate predictions (3474 true positives out of 3564) (Fig. 4a). However, 86 Sports samples were misclassified as Entertainment. A few Sports news items, such as (Bangladesh national cricket team left-handed opener Soumya Sarkar tied the knot with Khulna's daughter Priyanti Debnath Pooja.), include Entertainment content, here a celebration, which the classifier often confused. Moreover, some Crime articles were classified as Accident, which led to 19 misclassified Crime news items.

Fig. 4. Confusion matrix: (a) news articles, (b) news headlines.

As regards headline classification, Accident and Crime samples are mostly misclassified as each other. In particular, 42 Accident samples are misclassified as Crime and 54 Crime samples are misclassified as Accident (Fig. 4b). For example, the model confused an Accident headline (Three sentenced to life imprisonment, two acquitted in Dia-Rajiv death case) with the Crime class. In most cases, news headlines are not well structured and lack the inner context of the news item, which reduces performance. In contrast, news articles contain more words than headlines, which helps the classifier learn more relevant features to distinguish the classes.
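The relation between a confusion matrix and the per-class precision, recall and F1-score used throughout this analysis can be sketched as follows. The matrix below contains hypothetical counts chosen for illustration, not the values of Fig. 4, and the function name `per_class_metrics` is ours.

```python
def per_class_metrics(cm):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    metrics = {}
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[i][c] for i in range(n))  # column sum: predicted as c
        actual_c = sum(cm[c])                          # row sum: truly c
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Hypothetical 3-class counts, NOT the values reported in Fig. 4.
toy_cm = [
    [50, 5, 0],
    [2, 40, 3],
    [0, 4, 46],
]
toy_metrics = per_class_metrics(toy_cm)
```

In the toy matrix, the five class-0 samples predicted as class 1 lower the recall of class 0 and the precision of class 1 at the same time, which mirrors the Sports and Entertainment confusion discussed above.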
9 Conclusion
This work introduced a multilayer perceptron (MLP)-based model for classifying textual news articles and headlines in Bengali. Because no standard corpus was available, a Bengali news corpus was developed to perform the news classification task. This work investigated five embedding techniques with a tuned MLP classifier. Moreover, the performance of the proposed MLP-based model was compared to six ML baselines (with TF-IDF features). Results showed that the MLP with the Keras embedding layer achieved the highest news classification accuracy of 98.18% (for news articles) and 94.53% (for headlines) on the developed corpus. Moreover, the proposed model outperformed the existing ML baselines for Bengali news categorization. The performance of the current implementation can be enhanced by adding more data to the corpus from other news categories (such as politics, technology, business, and so on). More investigations with longer or shorter headlines and more extensive article contents can be performed. Other embeddings (e.g., GloVe, bag-of-words) and classifier models such as CNN and LSTM may also be investigated to improve performance. Multi-label news classification can be addressed to generalize the model.

Acknowledgement. This work was supported by the CUET NLP Lab, Chittagong University of Engineering & Technology, Chittagong, Bangladesh.
References
1. Cecchini, D., Na, L.: Chinese news classification. In: IEEE International Conference on Big Data and Smart Computing (2018)
2. Stein, A.J., Weerasinghe, J., Mancoridis, S., Greenstadt, R.: News article text classification and summary for authors and topics. Comput. Sci. Inf. Technol. (CS & IT) 10, 1–12 (2020)
3. Mandal, A.K., Sen, R.: Supervised learning methods for bangla web document categorization (2014)
4. Alam, M.T., Islam, M.M.: Bard: bangla article classification using a new comprehensive dataset. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP) (2018)
5. Alam, T., Khan, A., Alam, F.: Bangla text classification using transformers. CoRR abs/2011.04446 (2020)
6. Rabib, M., Sarkar, S., Rahman, M.: Different machine learning based approaches of baseline and deep learning models for Bengali news categorization. Int. J. Comput. Appl. 176(18), 10–16 (2020)
7. Rahman, R.: A benchmark study on machine learning methods using several feature extraction techniques for news genre detection from bangla news articles & titles. In: 7th International Conference on Networking, Systems and Security (2020)
8. Shopon, M.: Bidirectional LSTM with attention mechanism for automatic Bangla news categorization in terms of news captions. In: Mallick, P.K., Meher, P., Majumder, A., Das, S.K. (eds.) Electronic Systems and Intelligent Computing. LNEE, vol. 686, pp. 763–773. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-7031-5_72
9. Shahin, M.M.H., Ahmmed, T., Piyal, S.H., Shopon, M.: Classification of bangla news articles using bidirectional long short term memory. In: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 1547–1551 (2020)
10. Das, A., Iqbal, M.A., Sharif, O., Hoque, M.M.: BEmoD: development of Bengali emotion dataset for classifying expressions of emotion in texts. In: Vasant, P., Zelinka, I., Weber, G.W. (eds.) ICO 2020. AISC, vol. 1324, pp. 1124–1136. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_94
11. Rai, A., Borah, S.: Study of various methods for tokenization. In: Mandal, J.K., Mukhopadhyay, S., Roy, A. (eds.) Applications of Internet of Things. LNNS, vol. 137, pp. 193–200. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-6198-6_18
12. Trappey, A.J., Trappey, C.V., Wu, J.L., Wang, J.W.: Intelligent compilation of patent summaries using machine learning and natural language processing techniques. Adv. Eng. Inf. 43, 101027 (2020)
13. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
Using Machine Learning Techniques for Estimating the Electrical Power of a New-Style of Savonius Rotor: A Comparative Study
Youssef Kassem1,2(&), Hüseyin Çamur2, Gokhan Burge2, Adivhaho Frene Netshimbupfe2, Elhamam A. M. Sharfi2, Binnur Demir2, and Ahmed Muayad Rashid Al-Ani2
1 Faculty of Engineering, Mechanical Engineering Department, Near East University, 99138 Nicosia, North Cyprus
2 Faculty of Civil and Environmental Engineering, Near East University, 99138 Nicosia, North Cyprus
{yousseuf.kassem,huseyin.camur,binnur.demirerdem}@neu.edu.tr, [email protected]
Abstract. The ability and accuracy of machine learning techniques have been investigated for static modeling of the new-style wind turbine. The main aim of this study is to predict the electrical power (EP) of the new-style Savonius rotor as a function of aspect ratio, overlap ratio, number of blades, wind speed, and rotational speed. In this paper, the EP of the proposed rotors was evaluated with a Multilayer Feed-Forward Neural Network (MFFNN), a Cascade Feed-forward Neural Network (CFFNN), and an Elman neural network (ENN) based on experimental data. Additionally, the proposed models were compared with the previous models used in Ref. [6] to show their ability and accuracy. The results indicated that the ENN model has higher predictive accuracy than the other models.

Keywords: Machine learning models · Mechanical power · Multiple linear regressions · Savonius turbine · New-style
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 167–174, 2022. https://doi.org/10.1007/978-3-030-93247-3_17

1 Introduction
The energy sector is the most prominent contributor to the economic crisis and environmental damage in most developing countries. It is the largest source of waste and the primary cause of budget deficits and ballooning debt, in addition to being the primary cause of air pollution and related deaths. Moreover, the electricity crisis has worsened due to population growth, rising living standards, and industrial expansion, which have increased both energy demand and the cost of electricity produced from fossil fuels. Generally, most Arab countries do not lack energy sources such as oil, gas, sunlight, and wind. Nowadays, countries worldwide are looking to utilize renewable energy resources instead of fossil fuels to mitigate climate change. Also, the use of renewable energies can be an alternative solution for solving the electricity crisis in most countries and reducing the consumption of fossil fuels.

Wind energy is one of the main alternative energy resources for electricity production globally. Wind turbines are utilized to convert wind kinetic energy into electrical energy. According to the literature, wind turbines help to meet basic domestic energy needs worldwide [1]. In addition, low-cut wind turbines help to reduce greenhouse gas emissions and fossil fuel consumption [2]. The Savonius wind turbine has a simple structure and is suitable for operation at low wind speeds. Thus, it can be utilized for generating electricity for domestic applications. Several researchers have investigated the influence of rotor geometry and blade shape on the performance of the Savonius rotor [3, 4]. For example, Mahmoud et al. [3] investigated the effect of rotor geometries and end-plates on Savonius rotor performance. The results showed that the power coefficient (CP) increased when upper and lower end-plates were used. Based on the above, investigating the effect of turbine geometry and blade shape is important for evaluating the performance of the Savonius rotor. In the literature, the performance of the Savonius turbine shows high nonlinearity. Several empirical models, including machine learning models and mathematical models, have been used to predict the performance of the Savonius turbine, including power coefficient, mechanical power, and torque [5]. For example, Sargolzaei and Kianifar [5] used three machine learning models to estimate the torque of the proposed rotor. The results showed that the ANFIS model gave the best accuracy compared to the other models.
As a continuation of the authors' investigation of Savonius turbine performance [6], this study aims to predict the electrical power of the new-style Savonius wind turbine using three machine learning tools, namely, the Multilayer Feed-Forward Neural Network (MFFNN), the Cascade Feed-forward Neural Network (CFFNN), and the Elman neural network (ENN). The accuracy of these models is also compared with the previous models (multilayer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN)) used in Ref. [6].
2 Experimental Data
Figure 1 shows the 2D and 3D views of the proposed rotors. In this study, the effect of blade number (NB), blade height (H), blade diameter (D), external gap (L'), and wind speed (WS) on the mechanical power (MP) of the proposed rotors is investigated, as shown in Fig. 1. The blades and the shaft of the rotor are made from PV and stainless steel, respectively, and the disks are made from fiberglass. Details of the experimental setup and measurements are given in Ref. [6]. In this research, the experimental data of the new-style Savonius wind turbine were collected to develop and validate the proposed models (MFFNN, CFFNN, and ENN) in comparison with the MLPNN and RBFNN models. Aspect ratio, overlap ratio, wind speed, rotational speed, and the number of blades are used as input variables. The mechanical power is the output variable.
3 Prediction Methods
Many models and techniques, such as machine learning models and mathematical models, are used as alternative tools to describe complex systems, and they are utilized in a wide variety of applications. In this study, four empirical models (Multilayer Feed-Forward Neural Network, Cascade Feed-forward Neural Network, Elman neural network, and multiple linear regression) are developed to estimate the mechanical power of the new-style Savonius rotor. TRAINLM (Levenberg-Marquardt backpropagation) is utilized as the training function, and the mean squared error (MSE) is used to find the best-performing configuration of the training algorithm. Detailed descriptions of the developed models are given in Refs. [7–9]. MATLAB software was used to develop the proposed models.
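As a rough illustration of what distinguishes the Elman network from a plain feed-forward network, the sketch below runs one forward step in which the previous hidden activations are fed back as a context vector, and it also defines the MSE used for model selection. It is a minimal sketch under stated assumptions, not the authors' MATLAB model: the tanh hidden units, the layer sizes, the random weights and the input values are all illustrative, and training (e.g. by Levenberg-Marquardt) is not shown.

```python
import math
import random


def elman_step(x, context, W_in, W_ctx, b_hidden, w_out, b_out):
    """One forward step of an Elman network with tanh hidden units.

    Returns the scalar output and the new hidden state, which serves as the
    context (feedback) vector for the next step.
    """
    hidden = []
    for j in range(len(b_hidden)):
        s = b_hidden[j]
        s += sum(W_in[j][i] * x[i] for i in range(len(x)))
        s += sum(W_ctx[j][k] * context[k] for k in range(len(context)))
        hidden.append(math.tanh(s))
    y = b_out + sum(w_out[j] * hidden[j] for j in range(len(hidden)))
    return y, hidden


def mse(predicted, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


random.seed(0)
n_in, n_hidden = 5, 5  # illustrative sizes: five inputs, five hidden neurons
W_in = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
W_ctx = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]

context = [0.0] * n_hidden  # the context starts empty
# Made-up, already scaled input vectors (not measurements from the experiment).
samples = [[0.2, 0.4, 0.1, 0.5, 0.3], [0.3, 0.4, 0.1, 0.6, 0.4]]
outputs = []
for x in samples:
    y, context = elman_step(x, context, W_in, W_ctx, b_hidden, w_out, 0.0)
    outputs.append(y)
```

Setting all entries of `W_ctx` to zero removes the feedback path, in which case the step reduces to an ordinary single-hidden-layer feed-forward pass.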
Fig. 1. The 2D and 3D view and dimensions of the new-style Savonius rotors
4 Results and Discussions
4.1 Artificial Models
The descriptive statistics of the experimental data are presented in Table 1. In this study, the data are divided into training and testing groups, and the results of the models are compared with each other. The optimum network architecture for all models was determined by trial and error. The optimum numbers of hidden layers (HLs) and neurons (NNs) in the MFFNN, CFFNN and ENN models were selected based on the minimum value of MSE.
The best transfer function for the hidden neurons was found to be the tangent-sigmoid function. One hidden layer with 6 neurons was selected as the best configuration for the MFFNN model (5:1:1), with a minimum MSE of 3.64 × 10⁻⁷. For the CFFNN model (5:1:1), 1 hidden layer with 8 neurons was found to be optimal, with an MSE of 9.20 × 10⁻⁷. Additionally, the ENN model (5:1:1) with 5 neurons had the minimum MSE, with a value of 3.14 × 10⁻⁷. For the training phase, the R-squared value was about 1 for all proposed models, as shown in Fig. 2.

Table 1. Selected parameters used in this study

| Parameter | Variable | Explanation | Standard deviation | Variation coefficient | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Input 1 | NB | Number of blades | 0.82 | 0.82 | 2.00 | 4.00 |
| Input 2 | H/D | Aspect ratio | 1.31 | 1.31 | 1.88 | 6.25 |
| Input 3 | L'/D | Overlap ratio | 0.59 | 0.59 | 0.00 | 1.88 |
| Input 4 | WS | Wind speed | 3.19 | 3.19 | 3.00 | 12.00 |
| Input 5 | RPM | Rotational speed | 38.02 | 38.02 | 11.80 | 173.60 |
| Input 6 | MP | Mechanical power | 2.81 | 2.81 | 0.01 | 12.24 |
| Output | EP | Electrical power | 2.39 | 2.39 | 0.01 | 10.41 |

4.2 Performance Evaluation of Empirical Models for Testing Data
In this study, R-squared and root mean squared error (RMSE) are used to find the best model for estimating the value of EP. The comparison of the predicted and actual values of the EP for all models is shown in Fig. 3. The ENN model has the highest R-squared value (0.999996) and the lowest RMSE (0.000437) among the developed models. Furthermore, the performance of the developed models is compared with the previous models used in Ref. [6]. It is concluded that the ENN model is the best model for estimating the EP of the new-configuration Savonius rotor and is more precise than the CFFNN, MFFNN, MLPNN, and RBFNN models (Table 2).
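The two indicators can be computed as in the sketch below. It assumes the usual coefficient-of-determination definition, R² = 1 − SS_res/SS_tot; the source does not state which definition was used, and the toy values are made up rather than taken from the experiment.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between measured and estimated values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up power values in watts, for illustration only.
actual_power = [1.0, 2.0, 3.0, 4.0]
estimated_power = [1.1, 1.9, 3.2, 3.9]
toy_rmse = rmse(actual_power, estimated_power)
toy_r2 = r_squared(actual_power, estimated_power)
```

A model is preferred when its RMSE is smaller and its R² is closer to 1, which is the criterion applied to the models compared here.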
Fig. 2. Comparison of experimental data and the estimated values found by machine learning models. Panels plot estimated power [W] against actual power [W], with fitted lines: MFFNN, y = 0.9997x + 0.001, R² = 0.999993; CFFNN, y = 0.9994x + 0.002, R² = 0.999978; ENN, y = 0.9999x + 0.0007, R² = 0.999994.
Fig. 3. Comparison of experimental data and the estimated values found by empirical models. Panels plot electrical power [W] against wind speed [m/s] for the actual data and each model: MFFNN, R-square = 0.999994, RMSE = 0.000543; CFFNN, R-square = 0.999984, RMSE = 0.000931; ENN, R-square = 0.999996, RMSE = 0.000437.
Table 2. Performance evaluation of the models

| Statistical indicator | Current study: MFFNN | CFFNN | ENN | Ref. [6]: MLPNN | RBFNN |
|---|---|---|---|---|---|
| R-squared | 0.999994 | 0.999984 | 0.9999966 | 0.999130 | 0.950136 |
| RMSE | 0.000543 | 0.000931 | 0.000437 | 0.07046 | 0.53245 |
5 Conclusions
The main objective was to examine the application of artificial neural network models (Multilayer Feed-Forward Neural Network, Cascade Feed-forward Neural Network, and Elman neural network) for predicting the electrical power (EP) of new-configuration Savonius rotors. These models were also compared with the multilayer perceptron neural network (MLPNN) and the radial basis function neural network (RBFNN) to show the predictive accuracy of the proposed models. In this work, the impact of the blade number, aspect ratio, overlap ratio, wind speed, and rotational speed on the electrical power was investigated, and the experimental data were used to develop the proposed models. Moreover, the coefficient of determination (R²) and root mean squared error (RMSE) were used to identify the best empirical model. The ENN model was found to be the best model for estimating the EP of the new-configuration Savonius rotor and more precise than the CFFNN, MFFNN, MLPNN, and RBFNN models.
References
1. Abdulmula, A.M., Sopian, K., Haw, L.C., Fazlizan, A.: Performance evaluation of standalone double axis solar tracking system with maximum light detection MLD for telecommunication towers in Malaysia. Int. J. Power Electron. Drive Syst. 10(1), 444 (2019)
2. Arreyndip, N.A., Joseph, E., David, A.: Wind energy potential assessment of Cameroon's coastal regions for the installation of an onshore wind farm. Heliyon 2(11), e00187 (2016)
3. Mahmoud, N., El-Haroun, A., Wahba, E., Nasef, M.: An experimental study on improvement of Savonius rotor performance. Alex. Eng. J. 51(1), 19–25 (2012)
4. Driss, Z., Mlayeh, O., Driss, S., Maaloul, M., Abid, M.S.: Study of the incidence angle effect on the aerodynamic structure characteristics of an incurved Savonius wind rotor placed in a wind tunnel. Energy 113, 894–908 (2016)
5. Sargolzaei, J., Kianifar, A.: Neuro–fuzzy modeling tools for estimation of torque in Savonius rotor wind turbine. Adv. Eng. Softw. 41(4), 619–626 (2010)
6. Kassem, Y., Gökçekuş, H., Çamur, H.: Artificial neural networks for predicting the electrical power of a new configuration of Savonius rotor. In: Aliev, R., Kacprzyk, J., Pedrycz, W., Jamshidi, M., Babanli, M., Sadikoglu, F. (eds.) ICSCCW 2019. AISC, vol. 1095, pp. 872–879. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-35249-3_116
7. Kassem, Y., Gokcekus, H.: Do quadratic and Poisson regression models help to predict monthly rainfall? Desalin. Water Treat. 215, 288–318 (2021)
8. Kassem, Y., Gokcekus, H., Camur, H., Esenel, E.: Application of artificial neural network, multiple linear regression, and response surface regression models in the estimation of monthly rainfall in Northern Cyprus. Desalin. Water Treat. 215, 328–346 (2021)
9. Li, X., Han, Z., Zhao, T., Zhang, J., Xue, D.: Modeling for indoor temperature prediction based on time-delay and Elman neural network in air conditioning system. J. Build. Eng. 33, 101854 (2021)
Tree-Like Branching Network for Multi-class Classification
Mengqi Xue, Jie Song, Li Sun, and Mingli Song(B)
Zhejiang University, Hangzhou 310007, Zhejiang, China
{mqxue,sjie,lsun,brooksong}@zju.edu.cn

Abstract. In multi-task learning, network branching, i.e. specializing branches for different tasks on top of a shared trunk, has been a golden rule. In multi-class classification, however, previous work usually arranges all categories at the last layer of a deep neural network, which implies that all the layers are shared by these categories regardless of their varying relationships. In this paper, we study how to convert a trained typical neural network into a branching network where layers are properly shared or specialized for the involved categories. We propose a three-step branching strategy, dubbed Tree-Like Branching (TLB), to exploit network sharing and branching for multi-class classification. TLB first mines inherent category relationships from a trained neural network in a layer-wise manner. Then it determines the appropriate layer in the network on which specialized branches grow to reconcile the conflicting decision patterns of different categories. Finally, TLB adopts knowledge distillation to train the derived branching network. Experiments on widely used benchmarks show that the tree-like network derived by TLB achieves higher accuracy and lower cost than prior models, while exhibiting better interpretability.

Keywords: Multi-task learning · Multi-class classification · Deep neural network · Knowledge distillation

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 175–184, 2022. https://doi.org/10.1007/978-3-030-93247-3_18

1 Introduction
In computer vision, deep neural networks, especially convolutional neural networks (CNNs), have achieved continued success in many vision tasks [9,11,19,22]. The development of CNNs has enormously increased network capacity, which leads to more discriminative ability and better generalizability. Hence, jointly solving multiple tasks in one model is drawing more attention than tackling single tasks in isolation, because of the multi-objective nature hidden in many real-world problems such as self-driving and bioinformatics. In multi-task learning, a sharing-and-branching mechanism [1,3,6,13] is established in which heterogeneous tasks share the foremost portion of the network and have individual task-specific portions in the latter layers. On the one hand, network sharing introduces a regularization effect on multi-task learning and reduces the number of parameters. On the other hand, network branching specializes branches for different tasks, which reconciles the contradictions between different tasks and thus improves the overall performance of multi-task learning.

When it comes to classification, popular networks mostly place all categories at the single last layer [19,22,24,25], which makes the entire network shared by all these categories. However, different categories, like different tasks, also share different amounts of decision patterns due to their varying category similarities. In other words, a multi-class classification problem is in fact a multi-task problem if we view each category as a binary classification task. Motivated by this observation, in this paper we propose a network sharing-and-branching strategy named Tree-Like Branching (TLB) to convert a trained neural network into a branching network for multi-class classification. TLB consists of three main steps. First, it employs agglomerative hierarchical clustering [14] on class-specific gradient features from a trained neural network to build the category hierarchy. With the category hierarchy, specialized branches grow on the original network, which turns it into a neural decision tree. Finally, TLB adopts distillation [5] to train the derived tree-like network. Note that we use only unlabelled data in every step, as labeled data is usually out of reach due to privacy or security issues. The proposed TLB gradually divides the whole multi-class feature space into several class-related feature spaces, which resembles the divide-and-conquer principle in decision trees (DTs) [17]. Decision making in the derived tree follows a coarse-to-fine process, and the class-adaptive branches show various topologies with diverse category sets. Such derived tree-like networks enjoy better interpretability thanks to their structural organization.
Experiments on popular benchmarks including CIFAR10 [7], CIFAR100 [7] and a mixed dataset using CIFAR100 and Oxford Flowers [15] show that TLB achieves higher accuracy, lower computation cost and better interpretability than prior networks. In a nutshell, we make the following three main contributions:
– We argue that category relationships are vital for multi-class classification and propose to adopt agglomerative hierarchical clustering on gradient features to build them.
– We propose a novel three-step branching strategy named TLB to build the tree-like branching network for multi-class classification.
– Experiments show that the network derived by TLB exhibits higher accuracy, lower computation cost and better interpretability than prior methods.
2 Relation to Prior Work
2.1 Multi-task Learning
Multi-task learning (MTL), which aims to train a single model that can simultaneously solve more than one task, has been widely used in the field of machine learning. Some previous works [3,12,13] study network sharing and search for the best sharing architectures for multi-task learning depending on the tasks at hand. In this paper, inspired by the sharing-and-branching mechanism in MTL, we solve multi-class classification problems with a tree-like branching network.
2.2 Knowledge Distillation
Knowledge distillation (KD) [5] is a process that efficiently transfers knowledge from a large teacher model to a small student model, and it is a typical model compression and acceleration technique. The teacher model teaches the student model to acquire the knowledge, ending up with a small model that has comparable or even superior performance. Following [5], various types of KD algorithms have been proposed to exploit the hidden knowledge and produce compact models, such as attention-based [26], NAS-based [10], and graph-based KD [8]. Unlike these methods, we utilize KD to design and train the branching network.

2.3 Neural Trees
Neural trees are a new type of model that integrates the characteristics of decision trees into convolutional neural networks. For example, soft decision trees (SDTs) [2,21] train a decision tree to mimic a deep neural network's predictions. Adaptive Neural Trees (ANTs) [23] unite neural networks and decision trees to grow a tree-like neural network using some primitive modules. These neural trees inherently gain higher interpretability but lower accuracy than modern deep networks. Our method exploits the category relationship to design the branching network for multi-class classification, without any performance degradation.
3 Method
3.1 Problem Setup
Assume there is an $N$-way classification problem, where the input and the output spaces are denoted by $\mathcal{X}$ and $\mathcal{Y}$. The category set is defined as $C = \{1, 2, \ldots, N\}$. Furthermore, assume there is a trained typical neural network, denoted by $f_\Theta : \mathcal{X} \to \mathcal{Y}$ and parameterized by $\Theta$. For the sake of clarity, typical neural networks refer to existing prevailing networks such as VGG [19], ResNet [4] and GoogLeNet [22]. The training data is denoted by $X = \{x_k\}_{k=1}^{K}$. Note that in this paper we adopt unlabeled data to convert the trained typical network into a branching network, since labeled data is usually out of reach due to privacy issues or expensive cost.

Block-Pooling Module. Modern popular backbone network architectures are usually comprised of block-pooling modules. A block consists of a group of layers, such as convolutional layers with the same number of filters and feature map size, or special architectures like the bottleneck [4]. The following pooling operation is usually implemented by max pooling or average pooling layers. Such block-pooling modules are sequentially stacked one after another in the network. We use $m_i$ to denote the $i$-th block-pooling module, which is parameterized by $\Theta_i$. The whole network architecture $M$ can thus be defined as $M = \{m_i\}_{i=1}^{L}$, where $L$ denotes the number of block-pooling modules. Please note that here we omit some other components, for instance, average pooling layers or fully connected layers [4].
Fig. 1. Pipeline of the proposed TLB strategy using a baseline consisting of four modules. The symbols m and R denote the block-pooling module and the router, respectively.
In the following subsections, we describe the proposed TLB as a three-step strategy (see Fig. 1 for an illustration). In the first step, TLB mines inherent category relationships from a trained neural network in a layer-wise manner. In the second step, it determines the appropriate layer in the network on which specialized branches grow to reconcile the conflicting decision patterns of different categories. In the final step, TLB adopts knowledge distillation to train the derived branching network.

3.2 Step 1: Class Relationship from Trained Networks
Given an unlabeled sample $x$, the trained network produces the output $\hat{y} \in \mathbb{R}^N$, $\hat{y} = f(x) = [\hat{y}_1, \cdots, \hat{y}_N]^T$. In the vector $\hat{y}$, the element $\hat{y}_i$ represents a score related to a specific class $c_i$ in the category set $C$. We compute the derivative of $\hat{y}_i$ w.r.t. the parameters $\Theta_j$ of the $j$-th block-pooling module through the trained network as follows:

$$g_i^j = \frac{\partial \hat{y}_i}{\partial \Theta_j}. \tag{1}$$

In the $j$-th module, the distance between the $p$-th category and the $q$-th category is approximated by

$$d_{p,q}^j = \left\lVert g_p^j - g_q^j \right\rVert_2 = \sqrt{\sum_{k} \left(g_{p,k}^j - g_{q,k}^j\right)^2}, \tag{2}$$

where $k$ indexes the components of the gradient vectors.
The distances are calculated using pair-wise comparison on $C$, and thus a similarity matrix $D^j \in \mathbb{R}^{N \times N}$ is constructed. The similarity matrix $D^j$ contains the category relationship of $C$ using the knowledge learned at module $m_j$.
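Step 1 can be sketched as follows, assuming that one flattened gradient vector per class is already available for a given module and that the distance in Eq. (2) is the Euclidean norm of the gradient difference. The function name and the toy gradients are hypothetical; obtaining real gradients would require an automatic differentiation framework, which is outside this sketch.

```python
import math


def pairwise_distance_matrix(grads):
    """Build the N x N matrix of Euclidean distances between class gradients.

    grads[p] is the flattened gradient vector of class p for one module.
    """
    n = len(grads)
    D = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(grads[p], grads[q])))
            D[p][q] = D[q][p] = d  # the matrix is symmetric
    return D


# Hypothetical gradients: classes 0 and 1 rely on similar parameters, class 2 does not.
toy_grads = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
toy_D = pairwise_distance_matrix(toy_grads)
```

In the toy matrix, the entry for classes 0 and 1 is much smaller than the entries involving class 2, which is the kind of structure a subsequent clustering step can exploit.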
3.3 Step 2: Building Branching Networks
As categories rely on different patterns to make decisions, e.g., animals versus vehicles, we adopt different branches to solve classification problems that differ vastly. To this end, we enable the original trained network to grow branches on itself. To determine the best location for branching in the trained network, we use the similarity matrices obtained in the previous step to separate $C$ at appropriate modules while keeping the network complexity low. Specifically, for each similarity matrix $D^j$, we adopt the row vector $D_i^j$ to represent the $i$-th category and employ hierarchical clustering [18] on the category vectors to construct category hierarchies for $C$. In hierarchical clustering, the Euclidean distance is selected as the measure of dissimilarity between vectors, and UPGMA [20] is chosen as the linkage criterion, which averages the pairwise dissimilarities between clusters. At first, each $D_i^j$ is treated as a separate cluster; hierarchical clustering is then performed in a bottom-up manner and stops when only two clusters remain, which correspond to two disjoint category sets $C_l^j$ and $C_r^j$ with $C_l^j \cup C_r^j = C$. The number of clusters is set to 2, which indicates that if module $m_j$ is the branching point, it will grow two branches, one for $C_l^j$ and the other for $C_r^j$. The ratio between the inter-cluster distance and the intra-cluster distance is employed to decide whether to branch. Taking $C_l^j$ as an example, the ratio is computed as follows:

$$r_l^j = \frac{D_{inter}^{C_l^j}}{D_{intra}^{C_l^j}} = \frac{\sum_{i \in C_l^j} \sum_{k \in C_r^j} d(D_i, D_k)}{|C_l^j| \cdot |C_r^j| \sum_{i \in C_l^j} d(D_i, u)}, \tag{3}$$

where $u$ is the arithmetic mean of the class vectors in $C_l^j$ and $d$ is the Euclidean distance. With the ratios of the two subsets, an indicator $\rho^j$ is introduced to determine whether $m_j$ is suitable for branching:

$$\rho^j = w(j) \cdot (r_l^j + r_r^j), \tag{4}$$
where $w(j)$ is a scaling function that scales $\rho^j$ according to the location of the module. As the deeper layers are preferred for branching, we set $w(j) = 1/j$. We first calculate $\rho^j$ for every module $m_j$. Then the branching point is chosen as $j^* = \arg\max_j \rho^j$ when $\max_j \rho^j \ge \tau$. The threshold $\tau$ is used to prevent unnecessary branching. When the first split node is found, all the modules before it, namely $m_1$ to $m_{j^*-1}$, automatically become the root node, and all the following modules, namely $m_{j^*+1}$ to $m_L$, are duplicated as two branch nodes with the same architecture. A special router $R_\phi^{j^*}$, parameterized by $\phi$, is introduced and defined as $R_\phi^{j^*}: X_{j^*} \to [0, 1]$. $R_\phi^{j^*}$ takes the features from the split node $m_{j^*}$ and passes them to the branch with the corresponding categories. In this way, the first branched model is obtained. We repeat the process of branching on the
resulting branches until the maximum depth is reached. Eventually, the final tree-like network is derived. Figure 1 provides an example of the whole branching process.

3.4 Step 3: Training with Knowledge Distillation
Due to the lack of labeled training data, we employ knowledge distillation [5] to exploit the knowledge learned in the trained model. To this end, we follow the general teacher-student framework, in which the pre-trained typical network is the teacher and the derived network is the student. The teacher provides a pseudo label for an unlabeled sample by softening the predicted probabilities of the input data. The pseudo labels used for training the routers, which the student uses to determine which branch the data should be delivered to, are given in the same manner. Taking a sample $x_k$ as an example, we adopt $\hat{P}_t(x_k)$ to denote the soft targets produced by the teacher. The derived network is trained by fitting the soft targets when given the same data point. In particular, the pseudo labels for routers are converted to binary values, 0 or 1, because routing is a binary decision; accordingly, the entry $\hat{P}_t^n(x_k)$ of the vector $\hat{P}_t(x_k)$ is set to 1 when the $n$-th category belongs to the left branch and to 0 when it belongs to the right. The loss functions are defined as:

$$L_{KD}(\hat{P}_t, \hat{P}_s) = D_{KL}(\hat{P}_t(x_k), \hat{P}_s(x_k)), \tag{5}$$

$$L_{total} = (1 - \lambda) L_{CE} + \lambda L_{KD}, \tag{6}$$
where $D_{KL}$ denotes the Kullback-Leibler divergence between the predicted categorical distributions of the teacher and the student networks, and $L_{CE}$ denotes the cross-entropy loss. The student network is optimized by minimizing Eq. 6, where $\lambda$ is a trade-off hyper-parameter that balances the two terms. At inference, the classification decisions are made by the specific branches selected by the routers.
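The two loss terms of Eqs. (5) and (6) can be illustrated with a small, framework-free sketch. Several details here are assumptions rather than the authors' settings: the softening temperature (the text does not give a value), the use of the teacher's arg-max class as the target of the cross-entropy term, and all function names. Only $\lambda = 0.6$ comes from the experimental setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional softening temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions (Eq. 5)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)


def total_loss(teacher_logits, student_logits, lam=0.6, temperature=4.0):
    """Weighted sum (1 - lam) * L_CE + lam * L_KD (Eq. 6).

    Assumption: the cross-entropy target is the teacher's hard pseudo label.
    """
    hard_label = teacher_logits.index(max(teacher_logits))
    l_ce = -math.log(softmax(student_logits)[hard_label])
    l_kd = kd_loss(teacher_logits, student_logits, temperature)
    return (1.0 - lam) * l_ce + lam * l_kd


teacher = [2.0, 1.0, 0.1]
agreeing_student = [2.0, 1.0, 0.1]
disagreeing_student = [0.1, 1.0, 2.0]
```

A student that reproduces the teacher's logits has zero distillation loss, while a student that ranks the classes in the opposite order is penalized by both terms.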
4 Experiments

4.1 Experimental Setup
Training Details. Our approach is implemented in PyTorch [16] on a Quadro P6000 GPU. For data augmentation, all samples are resized to 32 × 32 and randomly flipped horizontally. A thin ResNet-18 [4] with 11 million parameters is used as the network architecture of the teacher. Our optimizer is SGD with a momentum of 0.9, a weight decay coefficient of 0.0005 and a batch size of 128. The initial learning rate is 0.1 and is decayed by a factor of 10 after 150 epochs, for a total of 200 epochs. The hyper-parameters $\lambda$ and $\tau$ are set to 0.6 and 2.2, respectively. The sample used for calculating the category relationship is randomly chosen from the testing data. Teachers and derived students share the same training settings.
Table 1. Performance comparison. We compare TLB with the teacher and a random strategy, which randomly arranges categories in each branch while keeping the same network architecture as that from TLB.

            Parameters (M)                        Accuracy (%)
            At training       At inference
Dataset     Teacher   TLB     Teacher   TLB       Teacher   Random   TLB
CIFAR10     11.17     14.60   11.17     8.37      93.3      94.7     95.2
CIFAR100    11.22     17.43   11.22     8.54      73.5      73.3     75.9
CIFAR-S1    11.17     17.36   11.17     8.51      90.1      87.5     90.4
CIFAR-S2    11.17     14.60   11.17     8.37      89.5      89.5     91.2
Mixed 1     11.18     18.37   11.18     8.67      53.7      55.1     59.9
Mixed 2     11.18     18.49   11.18     8.66      54.4      53.9     58.2
Mixed 3     11.18     18.37   11.18     8.67      52.4      56.1     58.8

4.2 Datasets and Experimental Results
Table 1 summarizes our experimental results on different datasets and compares our method with the teacher and a random strategy, which shuffles and rearranges the category subsets of students while keeping the same network architectures.

Results on CIFAR. We first adopt CIFAR10 and CIFAR100 [7] to verify the effectiveness of the proposed TLB. CIFAR10 contains 60,000 images from 10 classes, 50,000 for training and 10,000 for testing. CIFAR100 [7] consists of 100 classes with 600 images per class. To validate that TLB is not specific to the categories involved, we also construct two datasets, CIFAR-S1 and CIFAR-S2, by randomly sampling CIFAR100. From Table 1 we can see that TLB consistently outperforms the teachers and the random strategy. The consistent accuracy improvement indicates that TLB is an effective strategy not only on well-structured datasets but also on randomly generated datasets. Notably, the random strategy causes a drop of 2.6% on the CIFAR-S1 dataset relative to the teacher, revealing that incorrect network sharing is detrimental to classification.

Results on Mixed Dataset. We construct a mixed dataset to show that TLB can reconcile the conflicting decision patterns of different categories. The dataset is constructed with one half sampled from Oxford Flowers [15] and the other half from CIFAR100. The Oxford Flowers dataset exhibits a very different distribution from CIFAR100; therefore, categories in these two datasets rely on different patterns to distinguish themselves from others. Experimental results in Table 1 show that TLB significantly outperforms the teacher and the random strategy on all three mixed datasets, by 2.7%–6.4% in absolute accuracy. Moreover, detailed results in Table 2 reveal that TLB achieves higher accuracy on the data from CIFAR100 and Oxford Flowers separately, which indicates that the proposed method can fully exploit the sharing-and-branching mechanism to boost classification performance.
Table 2. Detailed test accuracy (%) on the mixed dataset.

           Mixed 1                               Mixed 2                               Mixed 3
Dataset    CIFAR100 (10)   Oxford Flowers (10)   CIFAR100 (10)   Oxford Flowers (10)   CIFAR100 (10)   Oxford Flowers (10)
Teacher    73.8            47.8                  75.5            49.2                  66.3            52.4
TLB        76.2            53.2                  75.9            56.0                  77.4            55.3
[Figure 2: (a) tree diagram with Root, Split and Leaf nodes and two branches, Vehicles and Animals; (b) bar chart of "Accuracy (%) of routers and students" for Student and Router, under Random and TLB on Mixed1, Mixed2 and Mixed3.]
Fig. 2. (a) The network architecture derived from CIFAR10 by TLB. (b) Accuracy of students and routers on three mixed datasets using TLB or the random strategy.
We also discuss the computation cost and interpretability of TLB. In Table 1, we can see that during the training phase the derived students have more parameters than the teachers due to branching. However, at inference each category is routed to a specific branch, which leads to a lower inference cost. Figure 2(a) illustrates the tree-like network architecture derived from CIFAR10 by TLB, with two class-related branches for vehicles and animals, which matches human visual perception. In Fig. 2(b), the accuracy of the routers increases with the accuracy of the students, and TLB always exceeds the random strategy, which implies that TLB divides the multi-class feature space properly so that the routers have less uncertainty when routing. In general, the proposed TLB can derive tree-like network architectures that have better interpretability from the perspective of models and lower cost at inference time.
5 Conclusion
In this paper, we propose a novel three-step branching strategy, named Tree-Like Branching (TLB), to explore and exploit the network sharing-and-branching mechanism for multi-class classification. This approach makes use of category relationships to convert a trained typical neural network into a tree-like network
with properly designed shared and class-adaptive layers using knowledge distillation. Extensive experiments on widely used benchmarks demonstrate that TLB achieves superior performance and lower inference cost while exhibiting better interpretability through accurate network sharing and branching.

Acknowledgement. This work is funded by National Key Research and Development Project (Grant No: 2018AAA0101503) and State Grid Corporation of China Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.
References

1. Bakker, B., Heskes, T.: Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4(May), 83–99 (2003)
2. Frosst, N., Hinton, G.: Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784 (2017)
3. Gao, Y., Ma, J., Zhao, M., Liu, W., Yuille, A.L.: NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3205–3214 (2019)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
5. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
6. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)
7. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
8. Lee, S., Song, B.C.: Graph-based knowledge distillation by multi-head attention network. In: BMVC, p. 141 (2019)
9. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
10. Macko, V., Weill, C., Mazzawi, H., Gonzalvo, J.: Improving neural architecture search image classifiers via ensemble learning. arXiv preprint arXiv:1903.06236 (2019)
11. Mahjourian, R., Wicke, M., Angelova, A.: Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5667–5675 (2018)
12. Meyerson, E., Miikkulainen, R.: Beyond shared hierarchies: Deep multitask learning through soft layer ordering. arXiv preprint arXiv:1711.00108 (2017)
13. Misra, I., Shrivastava, A., Gupta, A., Hebert, M.: Cross-stitch networks for multi-task learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3994–4003 (2016)
14. Müllner, D.: Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378 (2011)
15. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE (2008)
16. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
17. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
18. Rokach, L., Maimon, O.: Clustering methods. In: Data Mining and Knowledge Discovery Handbook, pp. 321–352. Springer, Boston (2005). https://doi.org/10.1007/0-387-25465-X_15
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
20. Sokal, R.R.: A statistical method for evaluating systematic relationships. Univ. Kansas Sci. Bull. 38, 1409–1438 (1958)
21. Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)
22. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
23. Tanno, R., Arulkumaran, K., Alexander, D., Criminisi, A., Nori, A.: Adaptive neural trees. In: International Conference on Machine Learning, pp. 6166–6175. PMLR (2019)
24. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing & Optimization (2018). https://doi.org/10.1007/978-3-030-00978-6
25. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing and Optimization (2019). https://doi.org/10.1007/978-3-030-33585-4
26. Zagoruyko, S., Komodakis, N.: Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928 (2016)
Multi-resolution Dense Residual Networks with High-Modularization for Monocular Depth Estimation

Din Yuen Chan, Chien-I Chang (✉), Pei Hung Wu, and Chung Ching Chiang

Department of Computer Science and Information Engineering, National Chiayi University, Chiayi, Taiwan
[email protected]
Abstract. Deep-learning neural networks (DNNs) have been shown to be capable of solving the ill-posed monocular depth estimation problem in self-driving applications. In this paper, we propose a dense residual multi-resolution supervised DNN for accurate monocular depth estimation in traffic landscape scenes. The proposed DNN is constructed by regularly integrating dense residual short-cut connections into a multi-resolution backbone. Because some implicitly influential features may not survive to the end of learning, the DNN structure that generates the details of the estimated monocular depths should not be too deep. The structural depth of the DNN can be limited by effectively exploiting functional residual connections. In the proposed DNN structure, the number of short-cut connections is kept moderate through their rational use. In particular, to achieve high modularization, we introduce three layered modules for generating adequate levels and layers, whose results can easily be controlled to meet a requested prediction/inference accuracy. The visualization and quantitative results demonstrate the superiority of the proposed DNN over the compared DNNs in street landscape experiments.

Keywords: Monocular depth estimation · Deep learning neural networks · Multi-resolution backbone · Dense residual module
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 185–195, 2022. https://doi.org/10.1007/978-3-030-93247-3_19

1 Introduction

Accurate monocular depth estimation is increasingly used for perspective scene understanding in computer vision. Monocular depth estimation is essentially an ill-posed problem, because many different 3D scenes can map to the same 2D image; nevertheless, with continued improvements, DNN solutions keep delivering progressively more plausible results. For example, in the early period, Eigen et al. [1] verified that a DNN can attain accurate monocular depth estimation. Given this verification, DNN-based monocular depth estimation has become more attractive and a convincing application for self-driving and smart navigation in autonomous vehicles, robotics, mobile devices, and advanced
driver assistance systems (ADAS). Monocular depth estimation DNNs, abbreviated here as mono-Nets, can be categorized into three types according to their training resources.

The type-I mono-Nets [1–6] are trained purely on high-resolution ground truth (GT) depth maps, which are fully supplied for loss computation. Even though the GT depth maps can be captured by high-quality depth devices, they still need laborious manual annotation. That is, acquiring sufficient GT depth maps usually requires a large amount of high-quality labeled training data, obtained by primary collection with a depth sensor followed by manual correction and refinement. Although this process seems tedious today, the labor involved is expected to be greatly reduced as hybrid installations of TOF depth sensors with color cameras mature, their resolution increases and their deployment becomes more convenient, and as drawing-assistant and image-processing software tools improve. The networks in [1] have two stages that lead to coarse-to-fine inference, in which the latter stage is in charge of refining the estimate. Analogously, Song et al. proposed a depth-to-depth auto-encoder DNN [2], in which the GT depth map and the roughly estimated depth map are combined as inputs to a second network to further improve the depth estimate. In [3], the DNN comprises a global perspective profile predictor and a local depth detail predictor, which are an auto-encoder CNN with symmetric skip connections and a resolution-invariant CNN that learn the piecewise smooth depths and the discontinuous depths in gradient fields, respectively. Following these two modules, an integration module merges the relatively fine details into the global information to generate the depth map from a single color image.
In [4], an atrous spatial pyramid pooling (ASPP) module, a cross-channel learner, and a full-image encoder are integrated to form the scene understanding module using extracted dense features. After this module, an ordinal regression layer is applied to recast depth estimation as ordinal regression. The auto-encoder network in [5] exploits transfer learning to obtain highly accurate depth estimation with a pre-trained truncated DenseNet-169. Although it has many short-cut connections and layers, the structure of the truncated DenseNet-169 is not regarded as complicated. This confirms the significance of effective kernel initialization, whether legacy artificial neural networks or prevalent DNNs are used. In [6], the multi-scale feature fusion module extracts multi-scale features along the learning of the encoder module. The multi-scale feature maps and the initially estimated depth map are then concatenated as the input of the refinement module, which is composed of only a few successive plain convolutional layers, to produce a depth map with clear object boundaries.

The type-II mono-Nets are trained using the dual information of rectified stereo color-image pairs or nearby images [7–10]. They are regarded as unsupervised approaches, which focus on exploring unsupervised cues when real-world GT depths, 3D geometric appearances, and semantic contours are unavailable for training. In general, these approaches require a functional synthesis module to create the virtual depth map and synthesize the virtual stereo-image pairs for the loss function.

The type-III mono-Nets have semi-supervised and self-supervised characteristics [11–14]. The common type-III mono-Nets incorporate geometrically correlated GT resources to accomplish multiple specific tasks, which may include monocular depth estimation and semantic
segmentation as well as, perhaps, planar 3D reconstruction. Because their respective subnetworks have closely related task-specific goals in terms of the characteristics of the predicted signal, those subnetworks can enhance each other by learning simultaneously and transferring the common advantageous features. In general, because more resources are available, the type-III mono-Nets can intrinsically obtain better predictions than the former two types, especially for diverse photographing scenarios. However, practical self-driving inevitably encounters car vibration and non-uniform sunlight. Thus, the rectification and calibration of the training stereo image pairs need complicated preprocessing, which is an obstacle to real-time self-driving. To the best of our knowledge of the structures and available resources, the opportunities for creative improvement of type-I mono-Nets are relatively limited compared with the type-II and type-III mono-Nets. However, the equipment that captures the training resources of type-I mono-Nets can be easily installed on practical autonomous vehicles.

The high-resolution representation network (HR-Net) [15] was primarily proposed for human pose estimation. It provides a suitable multi-resolution architecture for exploring appropriate multi-resolution features. Hence, motivated by HR-Net, we propose a highly regular multi-resolution monocular depth estimation network with dense residual blocks for landscape monocular depth estimation. Its architecture can be highly modularized, which is inherently preferred for firmware and hardware solutions. The contributions of this study are three-fold.

• The developed multi-resolution architecture can handle the smoothness and sharpness of semantic profiles in the generated depth map.
• The proposed modularization focuses on the regularity of the structural combination and on fixing the number of convolutional channels. It is conceivable that simplification can be easily attained, directly benefiting firmware/hardware implementations of the monocular DNN.
• The proposed layered modules allow the mono-DNNs to be easily implemented as a teacher-student DNN and facilitate rational control of adequate levels and layers under the trade-off between prediction accuracy and network size.
2 Three Layered Regular Modules for Multi-resolution DNN

To pursue smooth, sharp and non-fragmented semantic profiles in the generated depth map, features at different resolutions need to be extracted and preserved simultaneously. Motivated by HR-Net [15], the proposed mono-DNN is developed as an architectural simplification of HR-Net, which can easily maintain multi-scale perspective features. In the proposed DNN, dense short-cut connections are embedded into the modules to extract appropriate landscape features from diverse traffic scenarios with deepened residual learning. Moreover, the proposed DNN has no long-range connections, e.g., long-distance skip paths. This not only makes modularization easy, but also makes the approach amenable to real-time firmware/hardware implementation. In particular, the proposed system is constructed by
means of a progressive design in the order of increasing block size. Although the number of layers is three in this study, the number of layers and the way the modules are assembled can be arbitrary. In general, the proposed layered modularization can considerably simplify the design of a teacher-student DNN. The three modules are designed from the small layer to the big layer for convenience and flexibility in building the multi-resolution mono-DNN, in terms of the interaction and expansion of distinct resolutions. They are described as follows. The proposed DNN has a gradually extended architecture in which the number of parallel resolution-specific paths increases as the network deepens.

The first-layer module is a four-tier regular dense residual block, denoted as 4t-RRDB, which can be regarded as the basic-layer module for building the mono-DNN, as shown in Fig. 1. Its regularization consists in fixing the number of convolutional channels, which facilitates the wide deployment of depth-wise separable convolution. Because the number of resulting feature maps is directly consistent with the number of subsequent filters, the 1 × 1 convolutions can be saved. Hence, the depth-wise separable convolution can be simplified by removing the 1 × 1 convolutions that follow the 2D convolutions when those 1 × 1 convolutions are likely to be redundant. In Fig. 1, the interlaced concatenation first uniformly blends, slice by slice, the multiple feature cubes delivered through different connections. Instead of a generic 1 × 1 convolution, Conv1d-1 × 1 L, a short-term 1 × 1 convolution, performs a learnable short-range weighted sum of L slices. Thus, the shrinkage of slices is not excessive, so that the network can keep the embedded useful cues from being quickly diminished in the subsequently assembled features. Multiple short-skip connections are placed comprehensively in this basic module for regular dense residual learning.
To regularize the next modular expansion, we progressively extend one more path with lowered resolution to form the second-layer module. We treat the 4t-RRDB as the basic component for building the higher-layer module, called the 2-resolution two-level RRDB module (2rTRM), which is an expansion of the 4t-RRDB module. As shown in Fig. 2, the two parallel paths are closely integrated to form a highly symmetric two-level 4t-RRDB-based module. Within a 2rTRM, feature maps are transferred across paths only once, from the high to the low resolution. This makes the effects of fusing different resolutions and of inter-path interference within a DNN with numerous parameters easier to trace. Analogously, we treat the 2rTRM as the modular element to further cover three resolutions, yielding the third-layer module named the 3-resolution two-level RRDB module, abbreviated as 3rTRM. As shown in Fig. 3, it can be considered an advanced expansion module in which two 2rTRMs of different resolutions are concisely consolidated in an overlapped embedding. In summary, following the systematic pattern of information interchange and structural expansion in the HR-Net topology, the second-layer module is constructed from 4t-RRDBs, and the modules are then concisely aggregated into the third-layer module.
Fig. 1. Detailed structure of 4t-RRDB, where K and BN denote the channel number and batch normalization, respectively. D_Conv.3 × 3 represents the depth-wise separable 3 × 3 2D convolution, and Conv1d-1 × 1 L denotes the 1 × 1 1D convolution that shrinks the number of feature cube slices by a factor of L. The black skip connection links the head and the tail of the 4t-RRDB, and the dense colored lines provide the inner skip connections.
Fig. 2. Structural detail of 2-resolution two-level RRDB module (2rTRM).
Fig. 3. Structural detail of 3-resolution two-level RRDB module (3rTRM) where sub-networks masked by two pink dash-line blocks are two overlapped 2rTRMs with different resolution combinations.
3 High-Modularized Multi-resolution Architecture

Given the regular composition and the fixed channel number of the three modules in Figs. 1, 2 and 3, a lightweight simplification of the monocular DNN can be easily attained. For example, keeping the number of convolutional channels identical to the number of input cube slices facilitates a large reduction in convolution cost by directly replacing the generic 3 × 3 one-to-many convolution with either the depth-wise separable 3 × 3 convolution or the even simpler depth-wise 3 × 3 convolution, which performs only one-to-one dedicated filtering, for each array of bundled filters. With the proposed modules at hand, the proposed multi-resolution DNN for monocular depth estimation can be implemented in several fully regular modular variants. Moreover, as shown in Fig. 4, the proposed DNN can easily be made into a student-teacher DNN: by adding more 3rTRMs, a student-and-teacher framework is obtained in which the student network plays the role of the fundamental network shown in Fig. 4.
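To make the saving from fixed channel numbers concrete, the following sketch counts the weights of the three 3 × 3 convolution variants mentioned above when the input and output channel counts are equal. The channel count K = 64 is a hypothetical value chosen only for illustration, and biases are ignored.

```python
def generic_conv_weights(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (every input feeds every output)."""
    return c_in * c_out * k * k


def depthwise_separable_weights(c_in, c_out, k=3):
    """Depth-wise k x k filtering followed by a 1 x 1 point-wise convolution."""
    return c_in * k * k + c_in * c_out


def depthwise_only_weights(channels, k=3):
    """Simplified variant: one dedicated k x k filter per channel, no 1 x 1 step."""
    return channels * k * k


K = 64  # hypothetical fixed channel number
counts = {
    "generic": generic_conv_weights(K, K),
    "separable": depthwise_separable_weights(K, K),
    "depthwise_only": depthwise_only_weights(K),
}
```

Dropping the 1 × 1 step is only possible when the number of output maps equals the number of input slices, which is exactly what the fixed channel number guarantees.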
Fig. 4. Diagram of proposed monocular estimation multi-resolution DNN.
The DNN constructed in Fig. 4 already outperforms the compared DNNs in tests on the KITTI dataset under a training loss designed for depth estimation. Hence, we use it as the proposed DNN. The first term of the total training loss given later is the mean square error (MSE) of the pixel-wise depth difference, denoted by

$$L_{MSE}(D_p, D^*) = \frac{1}{N} \sum_{i=1}^{N} \left( d_i - d_i^* \right)^2, \tag{1}$$

where $d_i$ and $d_i^*$ are the $i$-th pixel values of the predicted depth map $D_p$ and the GT depth map $D^*$, respectively, each of which has $N$ points. The second term is the profile loss given by

$$L_{SSIM}(D_p, D^*) = 1 - \mathrm{SSIM}(D_p, D^*), \tag{2}$$

where

$$\mathrm{SSIM}(X, Y) = \frac{2\mu_X \mu_Y + c_1}{\mu_X^2 + \mu_Y^2 + c_1} \cdot \frac{2\sigma_X \sigma_Y + c_2}{\sigma_X^2 + \sigma_Y^2 + c_2} \cdot \frac{\sigma_{X,Y} + 0.5 c_2}{\sigma_X \sigma_Y + 0.5 c_2}$$

is the structural similarity index measure (SSIM) of images $X$ and $Y$, in which $\mu$ and $\sigma$ are the mean and standard deviation of the compared images, $\sigma_{X,Y}$ is their covariance, and $c_1$ and $c_2$ are stabilizing constants. The third term is the gradient loss defined by

$$L_G(D_p, D^*) = \frac{1}{N} \sum_{i=1}^{N} \left( \left| \nabla_h(d_i) - \nabla_h(d_i^*) \right| + \left| \nabla_v(d_i) - \nabla_v(d_i^*) \right| \right), \tag{3}$$

where $\nabla_h(\cdot)$ and $\nabla_v(\cdot)$ calculate the horizontal and vertical components of the gradient, respectively, at the pixel in parentheses, so that the gradient operator is $\nabla(\cdot) = (\nabla_h(\cdot), \nabla_v(\cdot))$. The final term is the edge-aware smoothness/regularization term defined by

$$L_S(D_p, D^*) = \frac{1}{N} \sum_{i=1}^{N} \left( \left| \nabla_h(d_i) \right| e^{-\left| \nabla_h(I_i) \right|} + \left| \nabla_v(d_i) \right| e^{-\left| \nabla_v(I_i) \right|} \right), \tag{4}$$

where $I_i$ is the intensity of the $i$-th pixel of the luminance image that is fed to the proposed DNN as the depth estimation target in Fig. 4. With the aforementioned four terms at hand, their weighted sum is the total loss given by

$$L_T(D_p, D^*) = \alpha_1 L_{MSE}(D_p, D^*) + L_{SSIM}(D_p, D^*) + \alpha_2 L_G(D_p, D^*) + \alpha_3 L_S(D_p, D^*), \tag{5}$$

where we empirically set $\alpha_1 = 0.1$, $\alpha_2 = 0.3$ and $\alpha_3 = 0.5$, considering both the normalization and the significance of the dynamic ranges.
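The four loss terms and their weighted sum can be sketched on small 2D lists as follows. This is only an illustration under several assumptions that the text does not specify: SSIM is computed once over the whole image rather than over local windows, the constants c1 and c2 are arbitrary small values, gradients are forward differences with zero at the last row or column, and the sign conventions follow the usual forms of these losses. All function names are hypothetical.

```python
import math


def flatten(img):
    return [v for row in img for v in row]


def mse_loss(pred, gt):
    p, g = flatten(pred), flatten(gt)
    return sum((a - b) ** 2 for a, b in zip(p, g)) / len(p)


def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM with luminance, contrast and structure factors."""
    xs, ys = flatten(x), flatten(y)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    contrast = (2 * sx * sy + c2) / (vx + vy + c2)
    structure = (cov + 0.5 * c2) / (sx * sy + 0.5 * c2)
    return luminance * contrast * structure


def grad_h(img):
    """Forward difference along each row (zero at the last column)."""
    return [[row[j + 1] - row[j] if j + 1 < len(row) else 0.0
             for j in range(len(row))] for row in img]


def grad_v(img):
    """Forward difference along each column (zero at the last row)."""
    h = len(img)
    return [[img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
             for j in range(len(img[0]))] for i in range(h)]


def gradient_loss(pred, gt):
    terms = zip(flatten(grad_h(pred)), flatten(grad_h(gt)),
                flatten(grad_v(pred)), flatten(grad_v(gt)))
    vals = [abs(ph - gh) + abs(pv - gv) for ph, gh, pv, gv in terms]
    return sum(vals) / len(vals)


def smoothness_loss(pred, image):
    """Depth gradients are penalized less where the image itself has edges."""
    terms = zip(flatten(grad_h(pred)), flatten(grad_h(image)),
                flatten(grad_v(pred)), flatten(grad_v(image)))
    vals = [abs(dh) * math.exp(-abs(ih)) + abs(dv) * math.exp(-abs(iv))
            for dh, ih, dv, iv in terms]
    return sum(vals) / len(vals)


def total_loss(pred, gt, image, a1=0.1, a2=0.3, a3=0.5):
    return (a1 * mse_loss(pred, gt) + (1.0 - ssim_global(pred, gt))
            + a2 * gradient_loss(pred, gt) + a3 * smoothness_loss(pred, image))


flat = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
ramp = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
luma = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8]]
```

A perfectly flat prediction that matches a flat ground truth gives a total loss of zero, while any mismatch or unnecessary depth variation increases it.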
4 Experiments

In this section, we evaluate the performance of our mono-Net by comparing it with several state-of-the-art mono-Nets on the KITTI dataset for depth prediction from single-view landscape color images. Each original low-resolution depth map is interpolated into a GT depth map of 1244 × 376 pixels, the same size as the color images in the KITTI dataset, using the tool of NYU Depth V2. There are 11,348 landscape depth-color pairs in total in the KITTI dataset, which were divided into 9,078 training frames and 2,270 testing frames for our simulation. We implemented our depth estimation network ourselves in Python 3.6 on Windows 10 with an Intel i7-9700K and a GeForce RTX 2080 Ti, 16 GB. The training routine runs for 20 epochs with 10,000 steps per epoch and a learning rate of 0.0001; the batch size is only
one frame for matching the capability of inexpensive GPU hardware. In the simulation, our proposed DNN shown in Fig. 4 are compared with the other DNNs in literatures being Eigen’s DNN [1], Song’s DNN [2], Fu’s DNN [4] and Alhashim’s DNN [5]. The qualitative comparisons between our mono-Net and the other four mono-Nets in visualization shown in Fig. 5, where the displayed results of estimated depth maps are majorly selected from KITTI dataset for the aim of testing traffic landscape scenes. For fair comparisons, we only measure quantitative qualities of the depths, which are located at the locations of original available GT depth pixels in KITTI dataset without the depths of interpolated pixels. The quantitative qualities include four reconstructionerror metrics given by absolute relative difference (Abs_rel), squared Relative difference (Sq_rel), RMSE, and the logarithmic RMSE (RMSE in log), as well as the percentage of correct depth-estimated pixels. The smaller the reconstruction errors, the higher accuracy the mono-DNN can obtain. For counting the number of correct predicted pixels, the difference of estimated depth and GT depth at the jth checked pixel can be formularized by dj ¼ min dj =d j ; d j =dj . By neglecting the subscript index
ofdj , such a difference measured for all checked depths is identically expressed by symbol d such that the error-toleration criterions can be set as d < 1.25, d < 1.252 and d < 1.253 for correct-prediction approval. The quantitative comparisons are tabulated at Table 1 and Table 2 which demonstrate that the proposed mono-DNN can acquire the higher percentages of correct estimated pixels and the lower reconstruction errors on average than the compared mono-Nets, respectively.
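The metrics named above can be illustrated with a short NumPy sketch. It is not the authors' evaluation code: the formulas are the definitions commonly used for KITTI depth evaluation (an assumption, since the text does not spell them out), the threshold ratio is taken as the larger of the two depth ratios, and missing GT depths are assumed to be stored as 0. The function name `depth_metrics` is hypothetical.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Errors and threshold accuracies over pixels with an available GT depth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0                      # assumption: 0 marks a missing GT depth
    pred, gt = pred[valid], gt[valid]   # predictions are assumed to be positive

    diff = pred - gt
    metrics = {
        "abs_rel": np.mean(np.abs(diff) / gt),
        "sq_rel": np.mean(diff ** 2 / gt),
        "rmse": np.sqrt(np.mean(diff ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
    }
    # Ratio-based difference; a pixel counts as correct if it is below the threshold.
    delta = np.maximum(pred / gt, gt / pred)
    for k in (1, 2, 3):
        metrics[f"delta_lt_1.25^{k}"] = np.mean(delta < 1.25 ** k)
    return metrics

# Toy example: the third pixel has no GT depth and is ignored.
example = depth_metrics(pred=[10.0, 30.0, 5.0], gt=[10.0, 20.0, 0.0])
```

Restricting the computation to valid GT pixels mirrors the statement that interpolated depths are excluded from the quantitative comparison.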
Table 1. Comparison of the percentages of correct pixels estimated by our mono-Net and the other four mono-Nets under the error-tolerance thresholds 1.25, 1.25², and 1.25³.

| Method | δ < 1.25 | δ < 1.25² | δ < 1.25³ |
|---|---|---|---|
| Eigen’s DNN [1] | 0.692 | 0.899 | 0.967 |
| Song’s DNN [2] | 0.893 | 0.976 | 0.992 |
| Fu’s DNN [4] | 0.932 | 0.984 | 0.994 |
| Alhashim’s DNN [5] | 0.886 | 0.965 | 0.986 |
| Proposed | 0.999 | 1.000 | 1.000 |
Table 2. Comparison of the average reconstruction errors of our mono-Net and the other mono-Nets in terms of absolute relative difference, squared relative difference, RMSE (linear), and RMSE (in log).

| Method | Abs_rel | Sq_rel | RMSE |
|---|---|---|---|
| Eigen’s DNN [1] | 0.190 | 1.515 | 7.156 |
| Song’s DNN [2] | 0.108 | 0.278 | 2.454 |
| Fu’s DNN [4] | 0.072 | 0.307 | 2.727 |
| Alhashim’s DNN [5] | 0.093 | 0.589 | 4.170 |
| Proposed | 0.028 | 0.353 | 3.903 |
Since the proposed DNN is fully modularized through layer-by-layer development, it is easy to simplify. In our simulation, we replaced the 4t-RRDB with a plain 4-tier residual module throughout the entire network, which reduces the computation severalfold; the plain 4-tier residual module has only a single skip connection linking its head and tail, without inner short-distance shortcuts. However, when the basic module retains the dense inner short-distance shortcuts, the depth inference accuracy of the proposed DNN degrades only gradually as convolutional layers are removed one module bundle at a time. Fig. 5 shows that the proposed mono-Net produces estimated depth maps with good shape preservation and smooth, well-separated areas for street landscape scenes.
Fig. 5. Qualitative comparison of estimated depth maps selected from the KITTI dataset for (a) Eigen’s DNN, (b) Song’s DNN, (c) Fu’s DNN, (d) Alhashim’s DNN, and (e) the proposed DNN, where the first, second and third rows display the color images, the GT depth maps and the estimated depth maps, respectively. Panels (a), (b), (c) and (e) show KITTI image-depth pairs; panel (d) shows KITTI and NYU-v2 image-depth pairs.
5 Conclusion
In this paper, we present a highly modularized, dense residual, multi-resolution supervised DNN for accurate monocular depth estimation in traffic landscape scenes. The architecture of the proposed DNN is built by regularly integrating dense residual connections into the multi-resolution backbone, so that it need not be deepened to generate the details of the estimated monocular depths. To make the proposed architecture highly modular, we define three layered modules that make it easy to choose adequate numbers of levels and layers for a requested prediction accuracy. The visual and quantitative results of the landscape experiments demonstrate the superiority of our DNN over the compared DNNs. In particular, the proposed mono-Net generates depth maps with good shape preservation, so that multi-views generated from the depth can offer considerable perceptual comfort for bare-eye 3D viewing.
Acknowledgments. This paper is supported by the funding granted by Ministry of Science and Technology of Taiwan, MOST 109–2221-E-415 -016 -
References
1. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, vol. 27 (NIPS), December 2014
2. Song, M., Kim, W.: Depth estimation from a single image using guided deep network. IEEE Access 7, 142595–142606 (2019)
3. Kim, Y., Jung, H., Min, D., Sohn, K.: Deep monocular depth estimation via integration of global and local predictions. IEEE Trans. Image Process. 27(8), 4131–4144 (2018)
4. Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002–2011 (2018)
5. Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning. arXiv preprint arXiv:1812.11941 (2018)
6. Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries. In: IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, HI, USA, pp. 1043–1051, March 2019
7. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 270–279, July 2017
8. Pilzer, A., Lathuilière, S., Sebe, N., Ricci, E.: Refine and distill: exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. CVPR, pp. 9768–9777, June 2019
9. Wong, A., Soatto, S.: Bilateral cyclic constraint and adaptive regularization for unsupervised monocular depth prediction, pp. 5644–5653. CVPR, Open Access paper (June 2019)
10. Ye, X., Fan, X., Zhang, M., Xu, R., Zhong, W.: Unsupervised monocular depth estimation via recursive stereo distillation. IEEE Trans. Image Process. 30, 4492–4504 (2021)
11. Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_4
12. Godard, C., Aodha, O.M., Firman, M., Brostow, G.: Digging into self-supervised monocular depth estimation, pp. 3828–3838. ICCV, Open Access paper (Oct. 2019)
13. Lee, J.H., Han, M.-K., Ko, D.W., Suh, I.H.: From big to small: multi-scale local planar guidance for monocular depth estimation, CVPR, arXiv preprint arXiv:1907.10326, June 2020
14. Song, X., et al.: MLDA-Net: multi-level dual attention-based network for self-supervised monocular depth estimation. IEEE Trans. Image Process. 30, 4691–4705 (2021)
15. Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5693–5703 (2019)
A Decentralized Federated Learning Paradigm for Semantic Segmentation of Geospatial Data
Yash Khasgiwala, Dion Trevor Castellino, and Sujata Deshmukh
Department of Computer Engineering, Fr. Conceicao Rodrigues College of Engineering, Mumbai, India
{yashkhasgiwala,diontrevorc}@gmail.com, [email protected]
Abstract. Data-driven deep learning is recognized as a promising approach to building precise and robust models to classify and segment complex images. Road extraction from such complex aerial images is a hot research topic in geospatial data mining as it is essential for sustainable development, urban planning, and climate change research. It is unheard of for individual satellites to possess many data samples to build their personalized models. Centralizing satellite data to train a model on varied data is also infeasible due to privacy concerns and legal hurdles. This makes it a challenge to train Deep Learning algorithms since their success is directly proportional to the amount of diverse data available for training, preventing Deep Learning from reaching its full potential. Federated Learning (FL) sidesteps this problem by collaboratively learning a shared prediction model without sharing data across satellites. This paper constructs a semantic segmentation-based FL system that leverages the strengths of shared learning and Residual architecture combined with Unet for road extraction. We train and evaluate our system on a public road dataset reorganized into a heterogeneous distribution of data scattered among multiple clients and compare it with models trained locally on individual satellites. We further try to enhance the performance of our FL-based model by implementing various versions of Unet.
Keywords: Federated learning · Data privacy · Dense Unet · Residual Unet · Semantic segmentation
1 Introduction
Every day, countless devices collect millions of data points that can be used to improve services. However, data is not shared freely because of privacy concerns, the computational constraints of collecting data at a centralized location, and legal restraints. For example, satellites orbiting the earth cover different geological landmasses. To train models to differentiate between certain geospatial features, it is necessary to leverage the data that these satellites collect. However, since these aerial images could contain sensitive data specific to a region, data sharing is restricted. Such situations warrant the use of Federated Learning (FL), where multiple clients (or devices) collectively learn a model by sharing the weights of the local models trained on their respective data. This kind of learning is possible in today’s world because of the massive amount of data split among various devices.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 196–206, 2022. https://doi.org/10.1007/978-3-030-93247-3_20
FL has many advantages, such as generating a more intelligent model while ensuring lower latency and less energy consumption and, most importantly, keeping data privacy intact, as the clients share only the updated parameters, not the data itself. Consequently, this eliminates the requirement of powerful hardware, thereby making it possible to execute these computations on various interconnected IoT devices. This can be extended to include the satellites on which we depend for navigational purposes, each of which covers a specific region of our planet. These satellites can capture and map high-resolution images of the landmass they observe. With the help of semantic segmentation, we can mine geospatial data such as buildings, roads, and vegetation from these images. Semantic segmentation is the procedure of labeling specific areas of an image. Combined with semantic segmentation, FL can be used to significant effect for road extraction from geospatial data.
2 Related Work
Google conceptualized FL to improve model training on devices with both data and computational restraints, such as cellular devices and tablets (Konecny et al. [1]). This development then spurred the creation of an FL system independent of the central server that usually orchestrates the training process. Apart from reducing dependence on the central server, this also speeds up model training, as direct peer-to-peer communication reduces latency (Roy et al. [2]). Nvidia then explored the feasibility of this paradigm in the field of medical segmentation of brain tumor samples, where data privacy is of utmost importance (Rieke et al. [3]). FL was used to improve traffic congestion monitoring, as the existing methods were not successful without continuous monitoring (Xu et al. [4]). It was subsequently employed by a Network Intrusion Detection System to secure Satellite-Terrestrial Integrated Networks (Li et al. [5]). The method is also used in its asynchronous form as part of geospatial applications (Sprague et al. [6]). Outside the FL paradigm, aerial detection and classification of vehicles has been achieved through semantic segmentation (Audebert et al. [7]). While most previous research relies on homogeneously distributed data, most FL-aided segmentation on heterogeneous data has been done in the healthcare sector.
We simulate a heterogeneous distribution of data spread among three satellites to mimic real-world satellite locations; this data is used to train models locally on each satellite. The performance of these local models is validated against the performance of the FL model, which takes advantage of the same data. Both approaches perform semantic segmentation on their respective data. We use a modified version of the Residual Unet [8] to compare the performance of the two approaches. The modifications to the Residual Unet concern the number of residual blocks and the upsampling algorithm. We then use a Dense Unet [9] and a standard Unet [10] with FL to gauge the relative performance of the Deep Residual Unet with the FL approach.
3 Methodology
3.1 Algorithm
In this section, we discuss the algorithm for FL and semantic segmentation. In a real-world implementation of FL, each federated member holds its own data in isolation. FL ships a copy of the global model weights to all local clients, and each client uses these weights as the starting point for training on its local data, updating its model weights as it goes. After each epoch, the client ships its model weights back to a centralized server, where the intermediate local model weights are aggregated to form an updated global model. The parameters of this global model are then shared with the clients so that training continues from the new parameters.
To simulate FL in a real-world scenario, we randomly distribute data in a heterogeneous manner among three satellites. We then build the global model with an input shape of (256, 256, 3). We obtain the weights of the global model and initialize an empty list to store the weights of the local models. The client side receives the global model weights and the total number of data points across all clients from the centralized server. The local model at the client end is built and initialized with the global weights. This model is trained on the data possessed by that particular client. The weights of this model are then multiplied by a scaling factor, which is the ratio of the number of local samples to the total number of samples across all clients. This ensures that each client’s parameters are weighted according to the amount of data it holds, i.e., a client with a relatively large number of data points gets more weight than a client with relatively few. After each local model has been trained once, i.e., after one communication round (1 epoch), the scaled local parameters of all clients are summed and set as the global model parameters. These global model parameters are then sent back to the clients for further training until a fixed number of communication rounds is complete.
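The communication round described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' Keras implementation: the segmentation network is replaced by a toy linear model, `local_train` is a hypothetical stand-in that performs a single gradient step instead of a full local epoch, and the three clients hold synthetic data whose sizes only mimic the 960 : 1280 : 480 ratio used later in the paper.

```python
import numpy as np

def local_train(weights, x, y, lr=0.1):
    """Hypothetical stand-in for one local epoch: one least-squares gradient step."""
    w, b = weights
    err = x @ w + b - y
    return [w - lr * (x.T @ err) / len(y), b - lr * err.mean()]

def federated_round(global_weights, clients):
    """One communication round: broadcast, train locally, scale, and sum."""
    n_total = sum(len(y) for _, y in clients)
    scaled = []
    for x, y in clients:
        local = local_train(global_weights, x, y)      # starts from the global weights
        factor = len(y) / n_total                      # local samples / total samples
        scaled.append([factor * layer for layer in local])
    # Summing the scaled weights layer by layer gives the new global model.
    return [np.sum(layers, axis=0) for layers in zip(*scaled)]

# Synthetic clients sharing the relation y = 2x + 1 (illustrative data only).
rng = np.random.default_rng(0)
clients = []
for n in (96, 128, 48):
    x = rng.normal(size=(n, 1))
    clients.append((x, 2.0 * x[:, 0] + 1.0))

weights = [np.zeros(1), np.zeros(1)]
for _ in range(300):
    weights = federated_round(weights, clients)
```

Because the scaling factors sum to one, the aggregated model is a weighted average of the client models, so a client with more samples pulls the global weights further toward its own solution.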
Semantic segmentation is the process of assigning each pixel in an image to a class label. Its architecture consists of an encoder-decoder module. The encoder learns a rudimentary representation of the input image while the decoder semantically projects this learned feature map onto the pixel space to get a dense classification. During training, each pixel is assigned a class label predicted by the model. Loss is computed between each pixel of the predicted mask and the ground truth mask. The gradients are then computed using backpropagation. Training is done until the loss function converges to a global minimum. The Intersection over Union is calculated to evaluate how accurately the predicted mask resembles the ground truth mask (Fig. 1).
Fig. 1. Client-side and server-side FL algorithm
3.2 Model
Residual Unet is an improvement over the traditional Unet architecture. It is designed to solve the problem of vanishing and exploding gradients in deep networks. It benefits from the identity mapping of deep residual networks and the semantic representation learning of the Unet architecture. We implement a deeper, modified version of the Residual Unet. The model consists of an encoder, a bridge, and a decoder. The encoder learns a summarised representation of the input image. The bridge connects the encoder and the decoder. The decoder uses the feature map from the encoder to restore the segmentation map.
As seen in Fig. 2, the encoder has four residual blocks (with 64, 128, 256, and 512 kernels in the convolution operations of each block, respectively). Instead of using a pooling operation to reduce the spatial dimension of the feature map, a stride of two is applied in the first convolution operation of the second, third, and fourth residual blocks; this reduces the dimensions of the feature map by a factor of 2. The input of the encoder block is concatenated with its output. This eases network training and helps information flow between lower and higher levels of the network without degradation. The concatenated output of the encoder block is fed to the corresponding decoder block and to the succeeding encoder block as a skip connection, to facilitate upsampling and feature extraction, respectively. The bridge also consists of a convolution operation (1024 kernels) with a stride of two.
The decoder has four residual blocks, with 512, 256, 128, and 64 kernels in the convolution operations of each block, respectively. At the start of each decoder block, the lower-level feature map is up-sampled using a transposed convolution instead of the upsampling 2d operation. The transposed convolution operation has weights; hence it learns to reconstruct the image from the lower feature maps rather than using the nearest/bilinear interpolation technique of the upsampling 2d operation. The outputs of the transposed convolution and of the corresponding encoder block are concatenated and passed through a residual convolution block similar to that of the encoder. The output of the transposed convolution operation and that of the decoder block are concatenated to simulate the identity mapping of the residual network. The output of the last decoder block passes through a 1 × 1 convolution operation (1 kernel) and a sigmoid activation layer to project it into a single-channel segmentation mask.
3.3 Loss Function and Evaluation Metrics
We use a combination of binary cross-entropy (BCE) loss and soft dice loss. BCE loss computes the difference between the class probability of each pixel in the predicted and ground truth masks, thereby giving equal weight to each pixel in an image. This proves to be a disadvantage for datasets with an imbalance between mask and non-mask pixels. BCE loss is a per-pixel loss that is determined independently for each pixel, without considering whether the adjacent pixels are part of the mask or not. On the other hand, soft dice loss is a measure of overlap between the predicted and target masks. In semantic segmentation, the ground truth boundary pixels and the predicted boundary pixels can be viewed as two sets, and dice loss drives the two sets to overlap. In dice loss, the numerator considers the overlap between the two sets at the local scale, while the denominator takes the total number of boundary pixels at a global scale into account. As a result, dice loss effectively accounts for both local and global information, thereby making it possible to achieve high accuracy. As seen in (1), where $p_i$ stands for the prediction, $y_i$ stands for the ground truth, $i$ ranges from 1 to $N$, and $X$ and $Y$ denote the predicted and ground truth masks, we combine the two loss functions to benefit from the gradient stability of BCE loss and the local and global context of soft dice loss.

$$\text{BCE-Dice Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right] + 1 - \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{1}$$

$$TI = \frac{TP}{TP + \alpha FN + \beta FP}, \quad \alpha = 0.6,\ \beta = 0.4 \tag{2}$$

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3}$$
The Tversky index [11], shown in (2), is helpful for highly imbalanced datasets, since its constants α and β control how strongly false negative and false positive predictions, respectively, are penalized. This increased control over the evaluation metric gives a more accurate measure of prediction than the standard dice coefficient while informing us about the model’s performance in edge cases. The Jaccard Index (3), also known as Intersection over Union (IoU), is the ratio between the positive-instance overlap of two sets and their union. These indices range between 0 and 1, where 0 signifies the worst performance and 1 signifies the best performance.
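The loss and the two indices can be written compactly in NumPy. The following is a minimal sketch rather than the training code used in the paper: it assumes the soft dice term is 1 − 2|X ∩ Y| / (|X| + |Y|) computed on probabilities, that the indices are evaluated on masks binarized at a threshold of 0.5, and that at least one mask is non-empty. The function names and the smoothing constant `EPS` are illustrative choices.

```python
import numpy as np

EPS = 1e-7  # assumption: small constant to avoid log(0) and division by zero

def bce_dice_loss(y_true, y_pred):
    """Binary cross-entropy plus soft dice loss on per-pixel probabilities."""
    y = np.asarray(y_true, dtype=float).ravel()
    p = np.clip(np.asarray(y_pred, dtype=float).ravel(), EPS, 1.0 - EPS)
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    dice = (2.0 * np.sum(y * p) + EPS) / (np.sum(y) + np.sum(p) + EPS)
    return bce + (1.0 - dice)

def _binarize(y_true, y_pred, threshold):
    t = np.asarray(y_true, dtype=float).ravel() > 0.5
    p = np.asarray(y_pred, dtype=float).ravel() > threshold
    return t, p

def tversky_index(y_true, y_pred, alpha=0.6, beta=0.4, threshold=0.5):
    """TP / (TP + alpha * FN + beta * FP) on binarized masks."""
    t, p = _binarize(y_true, y_pred, threshold)
    tp, fn, fp = np.sum(t & p), np.sum(t & ~p), np.sum(~t & p)
    return tp / (tp + alpha * fn + beta * fp)

def jaccard_index(y_true, y_pred, threshold=0.5):
    """Intersection over union on binarized masks."""
    t, p = _binarize(y_true, y_pred, threshold)
    return np.sum(t & p) / np.sum(t | p)

# Toy mask with one true positive, one false negative and one false positive.
truth = [1, 1, 0, 0]
probs = [0.9, 0.2, 0.8, 0.1]
ti = tversky_index(truth, probs)
iou = jaccard_index(truth, probs)
loss = bce_dice_loss(truth, probs)
```

With α = 0.6 and β = 0.4 a missed road pixel costs more than a spurious one, which is why the Tversky index of the toy mask differs from its IoU even though both count the same single true positive.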
Fig. 2. Modified Residual Unet architecture. The ‘+’ sign indicates concatenation of block input and output (identity mapping), i.e. x + F(x). The ‘x’ sign indicates a skip connection from the encoder to the decoder.
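The spatial sizes implied by the architecture of Sect. 3.2 can be traced with a few lines of Python. This is only a worked arithmetic sketch: it assumes a 256 × 256 input, that every stride-two convolution exactly halves the feature map and every transposed convolution exactly doubles it (i.e. "same"-style padding, which the text does not state), and the helper name `trace_shapes` is hypothetical.

```python
def trace_shapes(input_size=256):
    """Return (stage, spatial size, number of kernels) for each stage of the network."""
    stages = []
    size = input_size
    # Encoder: the first block keeps the resolution, blocks 2-4 use a stride of two.
    for i, kernels in enumerate([64, 128, 256, 512]):
        if i > 0:
            size //= 2
        stages.append((f"encoder {i + 1}", size, kernels))
    # Bridge: one more stride-two convolution with 1024 kernels.
    size //= 2
    stages.append(("bridge", size, 1024))
    # Decoder: each block starts with a transposed convolution that doubles the size.
    for i, kernels in enumerate([512, 256, 128, 64]):
        size *= 2
        stages.append((f"decoder {i + 1}", size, kernels))
    # Final 1 x 1 convolution with a single kernel and a sigmoid activation.
    stages.append(("output", size, 1))
    return stages

shapes = trace_shapes()
for name, size, kernels in shapes:
    print(f"{name:10s} {size:3d} x {size:3d} x {kernels}")
```

Under these assumptions each decoder block ends up at the same resolution and kernel count as the encoder block it receives a skip connection from, and the output mask has the same 256 × 256 size as the input tile.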
4 Experiments
4.1 Dataset
We use the Massachusetts Road Dataset [12] to implement semantic segmentation. The dataset consists of 1171 aerial images of the state of Massachusetts. Each image is 1500 pixels in height and width, and together the images cover a wide range of urban, suburban, and rural regions encompassing 2600 square kilometers. These images are cropped into image tiles of size 256 × 256. On account of hardware constraints, we limit our dataset to 3200 images and their corresponding target masks, of which 480 samples are used as a test set. The remaining 2720 samples are distributed among three satellites to simulate FL, with satellites A, B and C holding 960, 1280, and 480 training samples, respectively. All pixels of the target masks are converted to 0 or 1 depending on the class they represent (road or background) to achieve the task of binary segmentation.
4.2 Implementation and Results
We implement the two training methods with the Keras framework and minimize the loss function (1) using the Adam [13] optimization algorithm. In this implementation, we use fixed-size images (256 × 256) to train the model. These images are randomly flipped horizontally for data augmentation. We train the model on an NVIDIA TESLA P100 GPU with a mini-batch size of 16. The learning rate is initially set to 0.01 and reduced by a factor of 0.1 every 15 epochs. The network converges within 40 epochs. Model evaluation results on the test set are listed in Tables 1 and 2. The FL model is compared with the local models trained on each satellite. As seen in Table 1, the proposed FL procedure achieves a much better segmentation performance than the local models, as evidenced by the lowest BCE-Dice loss of 0.257, the highest mean IoU of 0.581, and the highest Tversky Index of 0.740, without sharing clients’ data. As seen in Fig. 3, the FL-trained model reaches a minimum more smoothly than the locally trained models. Of the three individual satellites, Satellite C has the fewest data points and therefore performs worse than the other two, as our results confirm. Satellite A comes next, with an average performance that is still unsatisfactory. Even with more data points, Satellite B, although satisfactory for the amount of data it possesses, still falls short of the precision achieved by the FL method.
We further try to achieve better performance by training various versions of Unet with the same FL method. Dense Unet performs better than the standard and Residual Unet, with the highest mean IoU and Tversky Index of 0.592 and 0.743, respectively. This is because, in each dense block, all the previous layers are concatenated in the channel dimension and fed as input to the next layer. In the residual model, only the identity function of the prior input and the respective block output are concatenated and passed ahead. This shortcut connection in the Dense Unet ensures the reuse of prior feature maps, which helps Dense Unet achieve better results than Residual Unet with fewer parameters and lower computational cost. As a result of the dense connections, it is easier to compute gradients by backpropagation in Dense Unet, because each layer has direct access to the final error signal.
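The learning-rate schedule described above amounts to a step decay, which can be written as a one-line function. This is a sketch under the assumption that "reduced by a factor of 0.1 every 15 epochs" means the rate is multiplied by 0.1 at epochs 15 and 30 (counting from 0); the function name `step_decay` is hypothetical.

```python
def step_decay(epoch, initial_lr=0.01, drop=0.1, epochs_per_drop=15):
    """Learning rate for a zero-based epoch index under step decay."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

# Learning rate for each of the 40 training epochs.
schedule = [step_decay(epoch) for epoch in range(40)]
```

Over 40 epochs this gives a rate of 0.01 for epochs 0 to 14, 0.001 for epochs 15 to 29, and 0.0001 for epochs 30 to 39. In Keras, a function of this kind is typically wrapped in a `LearningRateScheduler` callback, for example as `lambda epoch, lr: step_decay(epoch)`.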
Table 1. Comparison between local models and FL model (Residual Unet)

| Satellite | BCE-Dice loss | mIoU | Tversky index |
|---|---|---|---|
| Sat A | 0.293 | 0.549 | 0.708 |
| Sat B | 0.286 | 0.557 | 0.712 |
| Sat C | 0.300 | 0.542 | 0.675 |
| FL | 0.257 | 0.581 | 0.740 |
Table 2. Comparison between Unet style architectures used with FL

| Model | BCE-Dice loss | mIoU | Tversky index |
|---|---|---|---|
| Unet | 0.271 | 0.562 | 0.728 |
| Dense Unet | 0.266 | 0.592 | 0.743 |
| Residual Unet | 0.257 | 0.581 | 0.740 |
Fig. 3. (clockwise) Mean IoU plot for Residual, Dense, and Standard Unet; BCE-Dice loss, mean Tversky Index and IoU plots respectively for the 4 trained models.
As seen in Fig. 4, the individual models plot fragmented bits of roads, while the Dense Unet based FL model plots the roads correctly. Unlike the FL model, which barely makes such errors, the individual models wrongly classify many houses as roads. As seen in columns 2 and 6 of Fig. 4, our FL-based model also detected roads that were not labeled in the ground truth mask.
Fig. 4. Segmentation results on the test set. (From top to bottom) Original image, ground truth mask, Sat A model prediction, Sat B model prediction, Sat C model prediction, Residual Unet FL model prediction, and Dense Unet FL model prediction.
5 Conclusion
In this paper, we propose an FL-based semantic segmentation system for road extraction from geospatial data. We compare it to training on a single device with the corresponding data constraints and find that it performs significantly better than the local models while maintaining the privacy of the data possessed by the individual clients. To further improve the inference accuracy of the aerial federated segmentation system, we train various encoder-decoder models on the data and find that Dense Unet performs best. It also detects roads that are not present in the ground truth masks while keeping misclassification errors to a minimum. FL is undoubtedly a promising approach to delivering precise and secure models. By permitting multiple devices to train collaboratively without the need to exchange data, FL addresses privacy concerns without impacting performance. In the future, this paradigm can be explored in greater detail by experimenting with different loss functions, different numbers of satellites, and other variations of FL (such as asynchronous FL), and applied in various fields, thereby extracting maximum value from the data at hand without making it available to entities other than the source.
References
1. Konecny, J., McMahan, H.B., Yu, F.X., Richtarik, P., Suresh, A.T., Bacon, D.: Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016)
2. Roy, A.G., Siddiqui, S., Pölsterl, S., Navab, N., Wachinger, C.: BrainTorrent: a peer-to-peer environment for decentralized federated learning. arXiv abs/1905.06731 (2019)
3. Rieke, N., et al.: The future of digital health with federated learning. NPJ Digital Medicine 3, 1–17 (2020)
4. Xu, C., Mao, Y.: An improved traffic congestion monitoring system based on federated learning. Information 11(7), 365 (2020). https://doi.org/10.3390/info11070365
5. Li, K., Zhou, H., Tu, Z., Wang, W., Zhang, H.: Distributed network intrusion detection system in satellite-terrestrial integrated networks using federated learning. IEEE Access 8, 214852–214865 (2020). https://doi.org/10.1109/ACCESS.2020.3041641
6. Sprague, M.R., Jalalirad, A., Scavuzzo, M., Capota, C., Neun, M., Do, L., Kopp, M.: Asynchronous federated learning for geospatial applications. DMLE/IOTSTREAMING@PKDD/ECML (2018)
7. Audebert, N., Le Saux, B., Lefèvre, S.: Segment-before-detect: vehicle detection and classification through semantic segmentation of aerial images. Remote Sensing 9(4), 368 (2017). https://doi.org/10.3390/rs9040368
8. Zhang, Z., Liu, Q., Wang, Y.: Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 15(5), 749–753 (2018). https://doi.org/10.1109/LGRS.2018.2802944
9. Guan, S., Khan, A.A., Sikdar, S., Chitnis, P.V.: Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal. IEEE J. Biomed. Health Inf. 24 (2020)
10. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
11. Abraham, N., Khan, N.M.: A novel focal tversky loss function with improved attention UNet for lesion segmentation. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 683–687 (2019)
12. Geoffrey, E.H., Mnih, V.: Machine learning for aerial image labeling (2013)
13. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2015)
14. Intelligent Computing & Optimization, Conference proceedings ICO 2018, Springer, Cham, ISBN 978-3-030-00978-6
15. Intelligent Computing and Optimization, Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019), Springer International Publishing, ISBN 978-3-030-33585-4
16. Intelligent Computing and Optimization
17. Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020)
Development of Contact Angle Prediction for Cellulosic Membrane
Ahmad Azharuddin Azhari bin Mohd Amiruddin1,2, Mieow Kee Chan1, and Sokchoo Ng3
1 Centre for Water Research, Faculty of Engineering, Built Environment and Information Technology, SEGi University, Jalan Teknologi, Kota Damansara, 47810 Petaling Jaya, Selangor, Malaysia
[email protected], [email protected]
2 Department of Chemical Engineering, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 32610 Perak, Malaysia
3 Faculty of Arts and Science, International University of Malaya-Wales, Kuala Lumpur, Malaysia
[email protected]
Abstract. The contact angle (CA) of a membrane determines its application. Accurate CA prediction models are available. Nevertheless, they require detailed membrane roughness properties and thermodynamic data on the interaction between the membrane and the water droplet. These data are not easily accessible, and they are not available for newly developed materials. This study aims to apply an Artificial Neural Network to estimate the CA by using pure water flux, membrane porosity and pore size as inputs. The model was tested on two types of filtration process: dead end (DE) and cross flow (CF). The results showed that the prediction for DE achieved an overall accuracy of 99% with a sample size of 53 data sets. The prediction for CF could be done by using the DE + CF model, with a maximum R² of 0.9456 at the training stage. In conclusion, a novel statistical solution to predict the CA of cellulosic membranes was developed with high accuracy.
Keywords: Cellulose membrane · Pure water flux · Cross flow · Artificial neural network · Contact angle
1 Introduction
A membrane is a selective barrier that allows selected components of the feed stream to pass through it while the rest of the components are retained. This separation process is governed by the pressure difference, temperature difference or concentration gradient between the feed stream and the product stream. Currently, membranes are widely used in wastewater treatment to remove dyes [1] and oil [2]. In the medical and pharmaceutical fields, membranes are used to remove uremic toxins from the blood of patients with kidney failure [3] and for protein separation [4]. The efficiency of these separation processes depends strongly on the surface properties of the membrane, such as hydrophilicity.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 207–216, 2022. https://doi.org/10.1007/978-3-030-93247-3_21
A. A. A. bin Mohd Amiruddin et al.
The hydrophilicity of a membrane is revealed by the value of the contact angle (CA). The membrane is classified as hydrophilic if the CA is less than 90° and as hydrophobic if the CA is more than 90°. Zhang et al. [5] fabricated superhydrophobic polyvinylidene fluoride (PVDF) membranes with a CA of more than 150°, and the membranes showed superior performance in water-oil mixture purification. As a result, oil with more than 99.95% purity was obtained. Wang et al. [6] fabricated a hydrophilic electrospun nanofiber membrane-supported thin film composite membrane for water treatment. The low-pressure plasma treatment reduced the CA of the composite membrane from 137° to 0°. The osmotic water flux of the membrane was at least 40% higher than that of the commercial membrane.
Young’s equation and the Cassie-Baxter model are widely used to determine the contact angle of a surface. However, the accuracy of these equations depends strongly on the nature of the sample. Young’s equation is applicable to an ideal homogeneous surface, which is rigid, perfectly flat and non-reactive [7]. Meanwhile, the contact angle of a flat heterogeneous surface can be estimated by the Cassie-Baxter model [8]. In practice, it is hard to obtain a sample with a perfectly smooth surface. Thus, Luo et al. [9] concluded that the Cassie-Baxter model needs to be modified with appropriate geometrical models to obtain accurate results that are close to experimental data. Bahramian and Danesh [10] attempted to evaluate the contact angle of a sample by estimating the solid-vapour and solid-liquid interfacial tensions. However, detailed thermodynamic data such as the liquid critical temperature and the melting point of the solid sample were required. These data are not easily accessible, and they may not be available, especially for newly developed materials.
A contact angle goniometer has been used to measure the contact angle of a membrane. However, the instrument is costly, and the accuracy of the result strongly depends on the experience of the user. A previous study showed that the contact angle of a membrane was affected by the pure water flux (PWF), the membrane porosity and the pore size [11]. However, detailed modelling work to predict the contact angle has not been done. Thus, the objective of this study is to estimate the contact angle of cellulosic membranes via a mathematical approach. A multi-layered Artificial Neural Network (ANN) that models the hydrophilicity of a membrane was developed. PWF, membrane porosity and mean pore size were used as the inputs, and the network then estimated the contact angle of the membrane by feed-forward propagation.
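For reference, the two classical relations mentioned above can be written out as follows. This is the standard textbook form, not a formula taken from this paper; the symbols are assumptions introduced here: \gamma_{SV}, \gamma_{SL} and \gamma_{LV} are the solid-vapour, solid-liquid and liquid-vapour interfacial tensions, \theta_Y is Young's contact angle, and f_1, f_2 are the area fractions of the two surface components with intrinsic contact angles \theta_1, \theta_2.

```latex
% Young's equation (ideal, rigid, flat, homogeneous surface)
\gamma_{SV} = \gamma_{SL} + \gamma_{LV}\cos\theta_Y

% Cassie-Baxter model (flat heterogeneous surface with two components)
\cos\theta_{CB} = f_1\cos\theta_1 + f_2\cos\theta_2, \qquad f_1 + f_2 = 1
```

Both relations need interfacial tensions or surface-composition data, which is the kind of input the text describes as hard to obtain for newly developed membranes.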
2 Methodology
2.1 Data Collection
Data on membrane properties, including PWF (L/(m2.h)), porosity (dimensionless), pore size (diameter, nm) and CA, were obtained through random sampling of various cellulose-based membranes in the literature [12–27].
Development of Contact Angle Prediction for Cellulosic Membrane
2.2 Model Development
The model was developed by using the MATLAB Function Fitting Neural Network feature, which is an ANN optimization tool for data fitting analysis. The input to the network was entered as a ‘3 by n’ array, where n was the number of data sets used for the modelling, with the membrane properties arranged in a consistent order. The membrane properties data were grouped into one variable, and the contact angle was the target. Two sets of input-target pairs were prepared for two different filtration processes, namely dead end (DE) and cross flow (CF) filtration. Figure 1 shows the algorithm of the contact angle prediction. Firstly, the network was fed with the membrane data set, with the membrane properties labelled as the inputs and the CA as the targets, in the nftool GUI of MATLAB, an ANN algorithm optimized for curve fitting. The training method used was the Levenberg–Marquardt (LM) algorithm, with two hidden layers using a tangent-sigmoid transfer function and a linear transfer function. The algorithm also included a normalization of the data set to the range [−1, 1]. Initially, a neuron size of 10 was adopted. The split of the membrane data was pre-set to 70-15-15 at the initial stage of the study. This means that 70% of the collected data was used as training data to create the contact angle prediction model, and 15% of the data was used for verification/validation, which showed the adaptation of the model to additional data. The remaining 15% was used for testing, to evaluate the performance of the developed model on unseen data. The performance of the network was analysed using two indicators: the regression factor, R2, and the mean square error (MSE). R2 indicated the linear fit of the predicted contact angles, while MSE showed the mean square error between the predicted and the actual contact angles. In this study, the desired R2 was 0.99 and the desired MSE was within ±5 [28].
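The two data-preparation steps described above, scaling to [−1, 1] and a 70-15-15 split, can be sketched in a few lines. This is a minimal illustration in plain Python, not the MATLAB nftool implementation used in the study; the function names, the random seed and the sample flux values are hypothetical, and the exact way MATLAB rounds the split sizes is not reproduced.

```python
import random


def minmax_scale(column, lo=-1.0, hi=1.0):
    """Linearly map a list of numbers onto the interval [lo, hi]."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # A constant column carries no information: map it to the midpoint.
        return [(lo + hi) / 2.0 for _ in column]
    span = c_max - c_min
    return [lo + (hi - lo) * (v - c_min) / span for v in column]


def tvt_split(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return shuffled index lists for training, validation and testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


# Hypothetical pure water flux values, for illustration only.
pwf = [12.0, 55.0, 100.0, 230.0, 347.0]
pwf_scaled = minmax_scale(pwf)

# 53 samples, matching the size of the DE data set reported in the paper.
train_idx, val_idx, test_idx = tvt_split(53)
```

With 53 samples this particular rounding gives 37 training, 8 validation and 8 testing samples; a different rounding rule would shift one sample between the groups.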
Fig. 1. Flowchart of the methodology
Next, the model was optimized by studying the effect of the neuron size and the training-validation-testing (TVT) split on the developed model. The studied neuron sizes ranged from 6 to 18, while the TVT split ranged from 60-20-20 to 80-10-10. Lastly, a normality test was conducted on the collected data by using SPSS v22 to further improve the model performance. This was carried out by observing the kurtosis and skewness values of the data. The data are considered normally distributed if the kurtosis and skewness values are within ±3. After excluding the non-normally distributed data, the model was developed using the optimum neuron size and TVT split identified in the earlier study.
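The ±3 screening rule can be illustrated with a short sketch. It uses the bias-adjusted sample skewness and excess kurtosis, which is assumed here to be the kind of statistic SPSS reports; the function names and the sample pore sizes are hypothetical and are not taken from the study's data.

```python
import math


def _mean_and_sd(values):
    """Return the mean and the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd


def sample_skewness(values):
    """Bias-adjusted sample skewness (needs at least 3 values)."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    z3 = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (needs at least 4 values)."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    z4 = sum(((v - mean) / sd) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def within_normality_limits(values, limit=3.0):
    """Apply the screening rule: |skewness| and |kurtosis| within the limit."""
    return (abs(sample_skewness(values)) <= limit
            and abs(sample_excess_kurtosis(values)) <= limit)


# Hypothetical pore sizes (nm): one extreme value inflates the kurtosis.
pore_sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13, 150]
pore_sizes_trimmed = pore_sizes[:-1]
```

Dropping the single extreme value brings the hypothetical sample back inside the limits, which mirrors the exclusion of very large pore sizes described in Sect. 3.3.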
3 Results and Discussion
3.1 Effect of Filtration Process
The total data set consists of 77 types of cellulose membranes, of which 53 sets of data were collected from DE filtration, while the remaining 24 sets were collected from CF filtration. Table 1 shows the range of the membrane properties data. All the contact angle data were less than 90°, which is due to the hydrophilic nature of cellulose membranes.
Table 1. The range of the data collected from 77 types of cellulose membranes.

Parameter             Range         Average CA value
Mean pore size (nm)   1–10          38.67°
                      10–172        57.34°
PWF (L/(m2.h))        0–100         55.52°
                      100–347       39.78°
Porosity              0.23–0.50     43.91°
                      0.50–0.853    26.28°
The results shown in Table 2 are the four best-performing network models from multiple trials for the different filtration processes. Each model was built from the same data set, but with different seeding for the generation of the neurons’ initial weights. DE + CF indicates that all the collected membrane properties data were used to develop the prediction model, assuming that the effect of the filtration mode was negligible. For DE and CF, the collected data were separated according to the filtration process. Table 2 shows that the maximum R2 for DE + CF was 0.9456 at the training stage. The R2 value improved significantly when the mode of filtration was considered: the maximum R2 value for DE was 0.9992 at the training stage. This can be explained by the fundamental difference between the filtration processes. In CF, the water flows tangentially across the membrane surface, while in DE the water flows perpendicularly to the membrane surface. According to Amot et al. [29], the flux collected from CF filtration was generally higher than that from DE under the same pressure difference and water velocity for the same membrane. This implies
that the mode of filtration affects the PWF and thus influences the R2 and MSE of the contact angle prediction model. On the other hand, compared to DE, a lower R2 value was obtained for CF due to the small sample size. With a sample size of only 24, the CF model consistently showed large error margins during the testing and validation stages. Furthermore, the R2 value from the training stage of the CF model failed to reach the desired value of 0.99. In comparison, 53 sets of data were collected from the literature for DE, and a maximum R2 value of 0.9992 was obtained at the training stage.

Table 2. Performance of four models developed from DE + CF, DE, and CF membrane data sets, using a TVT split of 70-15-15 and a neuron size of 10

                   DE + CF              DE                  CF
Model  Analysis    MSE        R2        MSE       R2        MSE        R2
1      Train       24.3500    0.9456    3.1576    0.9992    23.3227    0.4213
       Verify      11.1018    0.9382    7.8304    0.9978    5.2629     0.9747
       Test        80.1626    0.8311    13.5341   0.9713    62.7982    0.7002
2      Train       54.8051    0.8512    6.8173    0.9821    23.1988    0.4277
       Verify      70.8633    0.8152    14.2553   0.9839    8.1841     0.9707
       Test        23.1905    0.9443    7.1031    0.9857    11.3628    0.6166
3      Train       32.3205    0.9253    6.1354    0.9865    58.3246    0.1195
       Verify      71.3056    0.8175    14.0482   0.9344    29.8945    0.5546
       Test        154.0834   0.5669    14.5321   0.8673    120.9473   0.5833
4      Train       53.6511    0.8665    0.4987    0.9989    29.6789    0.3441
       Verify      72.2611    0.7161    1.3064    0.9961    9.0283     0.3795
       Test        23.6551    0.9373    3.3995    0.9892    70.9823    -0.4355

3.2 Effect of Neuron Size and TVT Data Split
An attempt was made to identify the optimal neuron size and TVT setting for DE filtration, and the results are shown in Tables 3 and 4. The R2 values were within the range of 0.98 to 1.0 when the neuron size was increased from 6 to 18, and 91% of the data showed an R2 value of at least 0.99. This showed that the neuron size was not a significant contributor to the prediction performance. Notably, the MSE values were high at both low and high neuron sizes. Even though the R2 values at these neuron sizes were high, the high MSE values were undesirable. While errors are unavoidable, they should be kept to a minimum so that the accuracy of the prediction is not compromised. The desired MSE values were obtained when the neuron size was 10 with a TVT data split of 70-15-15, as shown in Table 4: 0.4987, 1.3064 and 3.3995 were obtained for the training, validation and testing stages, respectively. Thus, a neuron size of 10 was considered the optimum value for this case, as it shows a low MSE value across the entire range of models. This value can easily change with a different set of data and under different TVT split conditions. Figures 2 and 3 show the error histogram and the regression plots of the contact angle values for the optimum model (neuron size 10).
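Because the discussion weighs R2 against MSE, a small worked example of both indicators may help. The sketch below computes the mean square error and the coefficient of determination, 1 − SS_res/SS_tot; whether the paper's "regression factor" is exactly this quantity or the correlation reported by MATLAB's regression plot is an assumption. The contact angle values are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination; it can be negative for a very poor fit."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted contact angles (degrees).
actual_ca = [30.0, 40.0, 50.0, 60.0]
predicted_ca = [31.0, 39.0, 52.0, 58.0]

mse = mean_squared_error(actual_ca, predicted_ca)
r2 = r_squared(actual_ca, predicted_ca)
```

For these numbers the MSE is 2.5 and R2 is 0.98. The MSE is in squared degrees, so a high R2 can coexist with an MSE that is still too large for the intended use.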
Fig. 2. Histogram of contact angle errors from the TVT process of the most optimal DE network
Fig. 3. Regression plot of the contact angle errors from the TVT process of the most optimal DE network, (a) Training, (b) Test, (c) Validation
Table 3. R2 for the optimal TVT setting and neuron size analysis for DE membrane data

                      Neuron size
TVT (%)    Analysis   6        8        10       12       14       16       18
60-20-20   Train      0.9879   0.9982   0.9991   0.9992   1.000    0.9923   0.9813
           Verify     0.9724   0.9053   0.9950   0.9695   0.9687   0.9855   0.9281
           Test       0.9639   0.9795   0.9023   0.9822   0.9813   0.9489   0.9771
70-15-15   Train      0.9910   0.9919   0.9989   0.9996   0.9977   1.0000   0.9946
           Verify     0.9676   0.9782   0.9961   0.9979   0.9833   0.9796   0.9743
           Test       0.9806   0.9679   0.9892   0.9929   0.9865   0.9949   0.9602
80-10-10   Train      0.9936   0.9842   0.9998   0.9975   0.9989   0.9995   0.9950
           Verify     0.9832   0.9693   0.9819   0.9994   0.9915   0.9985   0.9980
           Test       0.9987   0.9385   0.9987   0.9868   0.9890   0.9795   0.9954
Table 4. MSE for the optimal TVT setting and neuron size analysis for DE membrane data

                      Neuron size
TVT (%)    Analysis   6         8         10        12        14        16        18
60-20-20   Train      5.1909    0.6171    0.2621    0.3222    0.0038    4.1245    8.0640
           Verify     10.2962   20.2044   9.7183    11.3304   15.9511   6.2275    32.9429
           Test       27.0720   14.6689   20.4325   6.0475    8.1152    15.5020   6.8805
70-15-15   Train      3.4760    3.5643    0.4987    0.3961    0.9558    0.0029    2.1038
           Verify     14.8919   7.5487    1.3064    1.1581    4.8992    5.3336    14.3581
           Test       17.6584   8.6214    3.3995    7.3113    6.2851    4.2906    12.1641
80-10-10   Train      2.1808    6.7577    0.0850    1.0030    0.4937    0.1827    1.8193
           Verify     16.0764   5.8134    5.2753    0.9080    3.5165    13.0305   1.4328
           Test       2.1569    5.4514    0.3552    3.5709    1.6201    10.5707   4.6378

3.3 Optimization via Normality Approach
A normality test was conducted on all the collected membrane properties data for DE filtration, and the results are shown in Table 5. The kurtosis and skewness values for all the data were within the range of ±3, except for the mean pore size data, for which the kurtosis value was 4.376. The normality test was conducted again after excluding six sets of data with extremely large mean pore sizes, which were within 80–100 nm. The result is shown in Table 6, and all the remaining data were normally distributed.

Table 5. Normality results for all the DE membrane data

Membrane properties   Skewness   Kurtosis
Mean pore diameter    2.449      4.376
Pure water flux       0.567      −0.364
Porosity              0.339      −0.278
Contact angle         0.858      0.365

Table 6. Normality results after excluding the extreme DE membrane data

Membrane properties   Skewness   Kurtosis
Mean pore diameter    1.543      2.647
Pure water flux       0.616      −0.385
Porosity              0.285      0.289
Contact angle         0.719      −0.134

Table 7. Performance of the model using only normally distributed DE membrane data

Model   Analysis   MSE     R2
1       Train      3.16    0.9992
        Verify     7.83    0.9978
        Test       13.53   0.9713
2       Train      6.82    0.9821
        Verify     14.26   0.9839
        Test       7.10    0.9857
3       Train      6.14    0.9865
        Verify     14.05   0.9344
        Test       14.53   0.8673
4       Train      0.50    0.9989
        Verify     1.31    0.9961
        Test       3.40    0.9892
The prediction model was developed again using only the normally distributed data, with a TVT split of 70-15-15 and a neuron size of 10. Table 7 shows the performance of the prediction model. The result showed that the normality of the data did not affect the accuracy of the predicted contact angle. This might be because the magnitude of the membrane parameters, i.e., the significance of each data point, contributes more to determining the non-linear relationship between the membrane properties and the contact angle. Furthermore, the range of the data differed between membrane properties. For instance, the mean pore diameter and the pure water flux are not restricted to a specific range of values, while the porosity and the contact angle are essentially restricted to the ranges 0 to 1 and 0° to 180°, respectively.
4 Conclusion
A black-box model for predicting the contact angle of cellulose-based membranes, using properties data such as PWF, porosity, and mean pore size as inputs, was successfully developed using ANN. Developing a single model that combined the CF and DE filtration data sets was not effective, due to the different filtration mechanisms. The model for CF membrane parameters failed to achieve high accuracy, which could be due to the lack of available samples: only 24 samples were used in the model training. A regression factor of >0.99 and a mean square error of up to ∼3 were obtained with the 53 data sets used in the training, validation and testing of the model for predicting the contact angle of DE membranes. A neuron size of 10 and a 70-15-15 TVT split were adopted to develop this model. This indicated that the prediction was done successfully. It is recommended to evaluate the DE model with more data points from cellulosic membranes to improve its flexibility and applicability. Additionally, more data need to be collected to develop the prediction system for the CF model. The experimental results also suggested using the model developed from CF + DE to predict CF.
Acknowledgments. The support from SEGi University is highly appreciated.
References
1. Lin, J., et al.: Tight ultrafiltration membranes for enhanced separation of dyes and Na2SO4 during textile wastewater treatment. J. Membr. Sci. 514, 217–228 (2016)
2. Karakulski, K., Gryta, M.: The application of ultrafiltration for treatment of ships generated oily wastewater. Chem. Pap. 71(6), 1165–1173 (2016). https://doi.org/10.1007/s11696-016-0108-1
3. Chan, M.K., Idris, A.: Permeability performance of different molecular weight cellulose acetate hemodialysis membrane. Sep. Purif. Technol. 75, 102–113 (2010)
4. Nor, M.Z.M., Ramchandran, L., Duke, M., Vasiljevic, T.: Application of membrane-based technology for purification of bromelain. Int. Food Res. J. 24, 1685–1696 (2017)
5. Zhang, W., Shi, Z., Zhang, F., Liu, X., Jin, J.: Superhydrophobic and superoleophilic PVDF membranes for effective separation of water-in-oil emulsions with high flux. Adv. Mater. 25, 2017–2076 (2013)
6. Gong, L., et al.: In situ ultraviolet-light-induced TiO2 nanohybrid superhydrophilic membrane for pervaporation dehydration. Sep. Purif. Technol. 122, 32–40 (2014)
7. Seo, K., Kim, M., Kim, D.H.: Re-derivation of Young’s equation, Wenzel equation, and Cassie-Baxter equation based on energy minimization. Surface Energy (2015)
8. Cassie, A.B.D., Baxter, S.: Wettability of porous surfaces. Trans. Faraday Soc. 40, 546–551 (1944)
9. Luo, B.H., Shum, P.W., Zhou, Z.F., Li, K.Y.: Surface geometrical model modification and contact angle prediction for the laser patterned steel surface. Surf. Coat. Int. 205, 2597–2604 (2010)
10. Bahramian, A., Danesh, A.: Prediction of solid-fluid interfacial tension and contact angle. J. Colloid Interface Sci. 279, 206–212 (2004)
11. Chan, M.K., Ng, S.C., Noorkhaniza, S., Choo, C.M.: Statistical analysis on the relationship between membrane properties and the contact angle. In: Innovation and Analytics Conference and Exhibition, Malaysia, pp. 53–57 (2016)
12. Maheswari, P., Barghava, P., Mohan, D.: Preparation, morphology, hydrophilicity and performance of poly (ether-ether-sulfone) incorporated cellulose acetate ultrafiltration membranes. J. Polym. Res. 20(2) (2013)
13. Kanagaraj, P., Neelakandan, S., Nagendran, A.: Preparation, characterization and performance of cellulose acetate ultrafiltration membranes modified by charged surface modifying macromolecule. Korean J. Chem. Eng. 31(6), 1057–1064 (2014). https://doi.org/10.1007/s11814-014-0018-2
14. Jayalakshmi, A., Rajesh, S., Senthilkumar, S., Mohan, D.: Epoxy functionalized poly(ether-sulfone) incorporated cellulose acetate ultrafiltration membrane for the removal of chromium ions. Sep. Purif. Technol. 90, 120–135 (2012)
15. Dasgupta, J., et al.: The effects of thermally stable titanium silicon oxide nanoparticles on structure and performance of cellulose acetate ultrafiltration membranes. Sep. Purif. Technol. 133, 55–68 (2014)
16. Jayalakshmi, A., Rajesh, S., Senthilkumar, S., Hari Sankar, H., Mohan, D.: Preparation of poly (isophthalamide-graft-methacrylamide) and its utilization in the modification of cellulose acetate ultrafiltration membranes. J. Ind. Eng. Chem. 20(1), 133–144 (2014)
17. Kee, C., Idris, A.: Modification of cellulose acetate membrane using monosodium glutamate additives prepared by microwave heating. J. Ind. Eng. Chem. 18(6), 2115–2123 (2012)
18. Rajesh, S., Shobana, K., Anitharaj, S., Mohan, D.: Preparation, morphology, performance, and hydrophilicity studies of poly(amide-imide) incorporated cellulose acetate ultrafiltration membranes. Ind. Eng. Chem. Res. 50(9), 5550–5564 (2011)
19. Zirehpour, A., Rahimpour, A., Seyedpour, F., Jahanshahi, M.: Developing new CTA/CA-based membrane containing hydrophilic nanoparticles to enhance the forward osmosis desalination. Desalination 371, 46–57 (2015)
20. Kanagaraj, P., Nagendran, A., Rana, D., Matsuura, T.: Separation of macromolecular proteins and removal of humic acid by cellulose acetate modified UF membranes. Int. J. Biol. Macromol. 89, 81–88 (2016)
21. Hossein Razzaghi, M., Safekordi, A., Tavakolmoghadam, M., Rekabdar, F., Hemmati, M.: Morphological and separation performance study of PVDF/CA blend membranes. J. Membr. Sci. 470, 547–557 (2014)
22. Kong, L., et al.: Superior effect of TEMPO-oxidized cellulose nanofibrils (TOCNs) on the performance of cellulose triacetate (CTA) ultrafiltration membrane. Desalination 332(1), 117–125 (2014)
23. Roy, A., De, S.: Extraction of steviol glycosides using novel cellulose acetate pthalate (CAP) – Polyacrylonitrile blend membranes. J. Food Eng. 126, 7–16 (2014)
24. Banerjee, S., De, S.: An analytical solution of Sherwood number in a stirred continuous cell during steady state ultrafiltration. J. Membr. Sci. 389, 188–196 (2012)
25. Ichwan, M., Son, T.: Preparation and characterization of dense cellulose film for membrane application. J. Appl. Polym. Sci. 124(2), 1409–1418 (2011)
26. Mohamed, M., Salleh, W., Jaafar, J., Ismail, A., Abd. Mutalib, M., Jamil, S.: Feasibility of recycled newspaper as cellulose source for regenerated cellulose membrane fabrication. J. Appl. Polym. Sci. 132(43) (2015)
27. Mahdavi, H., Shahalizade, T.: Preparation, characterization and performance study of cellulose acetate membranes modified by aliphatic hyperbranched polyester. J. Membr. Sci. 473, 256–266 (2015)
28. Extrand, C.: Uncertainty in contact angle measurements from the tangent method. J. Adhes. Sci. Technol. 30, 1597–1601 (2016)
29. Amot, T.C., Field, R.W., Koltuniewicz, A.B.: Cross flow and dead end microfiltration of oily-water emulsions Part II. Mechanisms and modelling of flux decline. J. Membr. Sci. 169, 1–15 (2000)
Feature Engineering Based Credit Card Fraud Detection for Risk Minimization in E-Commerce
Md. Moinul Islam1, Rony Chowdhury Ripan1, Saralya Roy1, and Fazle Rahat2(B)
1 Chittagong University of Engineering and Technology, Chittagong, Bangladesh
2 Bangladesh University of Business and Technology, Dhaka, Bangladesh
[email protected]
Abstract. In today’s financial business, financial fraud is a rising concern with far-reaching repercussions, and data mining has a crucial role in identifying fraudulent transactions. However, credit card fraud detection can be challenging for several reasons: the normal and fraudulent behaviours of the profiles change frequently, fraudulent data are scarce, the dataset is highly imbalanced, and so on. Besides, the efficiency of fraud identification in online transactions is greatly affected by the dataset sampling method and by feature selection. Our study investigates the performance of five popular machine learning approaches, namely Logistic Regression (LR), Random Forest (RF), Support Vector Classifier (SVC), Gradient Boosting (GBC), and K-Nearest Neighbors (KNN), in terms of feature selection. Feature selection is done by Sequential Forward Selection, and the models’ performance is further improved by handling imbalanced data using Random Undersampling and by feature scaling using PCA transformation and RobustScaler for both numerical and categorical data. Finally, the performance of the different machine learning techniques is assessed based on accuracy, precision, recall, and F1-measure on a benchmark credit card dataset.
Keywords: Credit card fraud detection · Cyber security · Sequential feature selection · Machine learning · Comparative analysis
1 Introduction
In this modern era of technology, credit and debit card usage has increased significantly in recent years, along with the amount of fraud. Financial fraud in E-Commerce is a pervasive issue with extensive consequences for the financial industry, and data mining has been crucial in identifying fraudulent credit card transactions. Because of the increase in fraudulent transactions, many banks lose billions every year. According to the European Central Bank, overall fraud in the Single Euro Payments Area reached 1.33 billion euros in 2012, an increase of 14.8% over 2011. Additionally, payments made through non-traditional channels such as mobile, internet, and others account for 60% of fraud; in 2008, this figure was 46% [5]. As new fraud patterns arise, the detection system faces new obstacles daily.
A significant number of studies have been conducted in the field of card fraud detection. Data mining is a common approach for detecting credit card fraud since it can address many of the difficulties involved. Identification of credit card fraud involves classifying transactions into two categories: legitimate (valid) transactions and fraudulent ones. Credit card fraud detection relies on tracking a customer’s spending habits. A variety of approaches have been utilized to tackle these challenges, including AI, genetic algorithms, SVMs, decision trees, and naive Bayes [1–3,6,8,16].
Many researchers have conducted credit card fraud detection studies utilizing various machine learning methods to derive information from available credit card fraud data. For instance, Lucas et al. [12] suggested a framework that models a series of credit card transactions using a Hidden Markov Model (HMM) from three distinct binary viewpoints, resulting in eight alternative sets of series from the (training) set of transactions. The HMM then models each of these sequences by assigning a probability to each transaction based on its preceding transaction sequence. Finally, these probabilities are employed as extra features in a Random Forest classifier for fraud detection. Taha et al. [17] proposed an approach for fraud detection in online transactions using an optimized light gradient boosting machine (OLightGBM), in which the parameters of the light gradient boosting machine (LightGBM) are tuned using a Bayesian-based hyperparameter optimization approach. Dubey et al. [7] developed a model using the Artificial Neural Network (ANN) technique and backpropagation. In this model, the customer’s credit card data is collected first; it includes numerous attributes such as name, time, last transaction, transaction history, etc. The data is then separated into two categories, train data (80%) and test data (20%), which are used to forecast whether the transactions are normal or fraudulent. To incorporate transaction sequences, Jurgovsky et al. [10] formulated the fraud detection problem as a sequential classification problem, using Long Short-Term Memory (LSTM) networks. In a further comparison, they found that the LSTM outperforms a baseline random forest (RF) classifier in terms of accuracy for transactions where the consumer is physically present at the merchant location. To train the behaviour features of normal and abnormal transactions, Xuan et al. [21] employed two kinds of random forests and then compared their performance in credit fraud detection. In [15], Fraud-BNC, based on the Bayesian Network Classifier (BNC) algorithm, was introduced for detecting credit card fraud. Fraud-BNC is generated automatically: the information about the algorithms is organized into a classification scheme, and a Hyper-Heuristic Evolutionary Algorithm (HHEA) then searches for the combinations of these components that are most effective at locating credit card fraud in a credit card fraud detection dataset. Yee et al. [22] compared several supervised classification methods, such as Tree Augmented Naive Bayes (TAN), Naive Bayes, K2, logistics, and J48 classifiers, for credit card fraud detection in a laboratory setting. In addition, they demonstrated how data preparation techniques such as standardization and PCA might help the classifiers achieve higher levels of accuracy.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 217–226, 2022. https://doi.org/10.1007/978-3-030-93247-3_22
2 Methodology
Our research study focuses on analyzing credit card fraud data based on Sequential Forward Selection, a feature selection algorithm using several popular machine learning classification techniques to evaluate their performance for detecting credit card fraud data. The overall methodology of our study is illustrated in two steps, (a) data preprocessing module and (b) feature selection module as shown in Fig. 1.
Fig. 1. Architectural overview of our model
2.1 Dataset
Credit card datasets contain numerous bank-related characteristics that are utilized to create a fraud detection framework. In this study, for evaluation purposes, we use a benchmark credit card fraud detection dataset from Kaggle [19]. Our dataset comprises 284807 instances, each having 31 features, where the “amount” feature indicates the amount of money spent during the transaction, the features V1–V28 are obtained from a PCA transformation, and the “class” feature is the binary representation of the target.
2.2 Data Preprocessing Module
Out of over 200K transactions in our dataset, 492 were flagged as fraudulent. According to the dataset’s real transaction class, the fraud rate is 0.172% of all transactions. The data contain only numerical values obtained from a PCA transformation, because users’ transaction confidentiality would be violated if the underlying features, as well as additional details on the data, were made public. Using this data frame as-is for our prediction models and analysis might introduce errors, and our classifiers might overfit, since they would assume that the large majority of transactions are legitimate.
(1) Handling of Imbalanced Data: There are two main approaches to random re-sampling for imbalanced classification, over-sampling and under-sampling. In this study, Random UnderSampling is used to handle the imbalanced class distribution. It essentially consists of deleting data to provide a more balanced dataset, thereby preventing the overfitting of our models. Under-sampling techniques delete instances of the majority class from the training dataset in order to better balance the class distribution, for example reducing the imbalance from 30:1000 to a 30:30 or 30:31 distribution.
(2) Feature Scaling: The features V1–V28 are already scaled due to PCA, but the two remaining features (time and amount) are not, so they need to be scaled. The values in the fraud dataset lie in different ranges that vary from feature to feature. For instance, Fig. 2a and Fig. 2b show the data distributions of two different features, “time” and “amount”, respectively.
Fig. 2. Data distribution of “Time” (a) and “Amount” (b) features
The values are very low for some data instances, while they are much higher for others, as seen in Fig. 2a and Fig. 2b. Therefore, data scaling is used to normalize the range of feature values. To do this, we used a Robust Scaler, which normalizes the “time” and “amount” features by removing the median and scaling the data according to the Interquartile Range (IQR = 75th percentile of the data − 25th percentile of the data). Besides, the Robust Scaler is less sensitive to outliers.
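The scaling rule just described, subtract the median and divide by the IQR, can be shown as a minimal sketch. This is plain Python for illustration, not the library scaler used in the study; the function name and the sample amounts are hypothetical, and the quartiles use linear interpolation between data points.

```python
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    # "inclusive" treats the data as the full population and interpolates
    # linearly between neighbouring points to obtain the quartiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid division by zero for (nearly) constant features
    return [(v - median) / iqr for v in values]


# Hypothetical transaction amounts with one extreme value.
amounts = [1, 2, 3, 4, 100]
scaled_amounts = robust_scale(amounts)
```

The extreme value stays far from the rest after scaling, but it does not distort the centre or the spread of the other points, because the median and the IQR ignore it. That is the sense in which this scaler is less sensitive to outliers than mean and standard deviation scaling.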
2.3 Feature Selection
The sequential forward selection (SFS) method is a bottom-up approach that begins with an empty set of selected features and gradually fills it with the features chosen by an evaluation metric as the search proceeds [13]. In every iteration, one feature is taken from the list of features that have not been included yet and added to the selected set. The feature chosen is the one for which the enlarged feature set generates the least classification error compared with adding any other remaining feature. SFS is commonly used because it is quick and simple.
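The greedy loop behind SFS can be sketched as follows. This is a generic illustration rather than the implementation used in the study: `score` stands for any evaluation metric (for example cross-validated accuracy of a classifier trained on the candidate subset), and the toy gain table and feature ranking are hypothetical.

```python
def sequential_forward_selection(features, score, k):
    """Greedily grow a feature subset, adding the best-scoring feature each round."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try each remaining feature together with the current subset and
        # keep the one whose enlarged subset scores highest.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature gains standing in for a real evaluation metric.
TOY_GAIN = {"V4": 0.30, "V14": 0.25, "Amount": 0.05, "Time": 0.01}


def toy_score(subset):
    """Toy metric: the sum of the made-up gains of the features in the subset."""
    return sum(TOY_GAIN[f] for f in subset)


chosen = sequential_forward_selection(list(TOY_GAIN), toy_score, k=2)
```

With a real metric the scores of different subsets interact, so the loop has to re-evaluate every remaining feature in each round; that costs on the order of k times the number of features in model fits, which is still far cheaper than testing every possible subset.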
2.4 Applying Machine Learning Techniques
We apply a variety of machine learning classification algorithms, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Classifier (GBC), and K-Nearest Neighbor (KNN), to the credit card fraud detection dataset to predict whether or not a transaction is fraudulent. Logistic Regression [18] classifies observations according to an estimated class probability. Instead of a plain linear function, it passes the linear combination of the inputs through the “Sigmoid function” (σ), also called the “logistic function,” which limits the output of the Logistic Regression model to values between 0 and 1. The Random Forest classifier [9] is a supervised algorithm based on an ensemble of decision trees. Using bootstrap sampling, it first draws K training subsets from the dataset. It then trains one decision tree on each subset, giving K trees. Each item in the test dataset is classified by the vote of all the decision trees. The Support Vector Classifier [20] plots each data point in an n-dimensional space, where n is the number of features in the dataset, and finds the hyperplane that best separates the two classes, thereby performing binary classification. We used “GridSearchCV” [14] for hyperparameter tuning to obtain the optimum values for the SVM, with an RBF kernel. The Gradient Boosting Classifier [11] depends on a loss function. The additive aspect of a gradient boosting model comes from the fact that trees are added to the model one at a time; when a tree is added, the existing trees are not changed. The KNN algorithm [4] is based on the idea that similar items occur near each other. It computes the distance between the current data point and every other data point, stores each point’s index and distance in a list sorted by distance, and assigns the current point to the mode of the classes of its K nearest entries.
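The KNN steps described above (compute distances, sort them, take the mode of the K nearest labels) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation used in the study; the function name `knn_predict` and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point, paired with the point's index.
    distances = [(math.dist(p, query), i) for i, p in enumerate(train_points)]
    # Sort by distance and keep the k closest entries.
    nearest = sorted(distances)[:k]
    # The predicted class is the mode of the k neighbours' labels.
    votes = Counter(train_labels[i] for _, i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels = ["normal", "normal", "normal", "fraud", "fraud"]
print(knn_predict(points, labels, (4.8, 5.2), k=3))
```

With K = 3, the query point near (5, 5) has two "fraud" neighbours and one "normal" neighbour, so the vote goes to "fraud".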
3 Performance Evaluation
To present the experimental outcomes, we use the assessment metrics Precision, Accuracy, Recall, and F1-score. In credit card fraud detection, a false positive (FP) means that a non-fraudulent (actually negative) transaction has been categorized as fraud. Precision is the proportion of true positives (TP) among all transactions predicted as positive, i.e., the sum of TP and FP. Recall is the proportion of true positives among all actual positives, i.e., the sum of the true positives (TP) and the false negatives (FN). Accuracy is the sum of the true positives (TP) and true negatives (TN) divided by the total number of instances. The F1-score is the harmonic mean of precision and recall.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
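The four metrics follow directly from the confusion-matrix counts. The sketch below computes them in plain Python; the function name and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts for a balanced test set of 200 transactions.
metrics = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(metrics)
```

For these counts, accuracy is 175/200 and F1 equals 2·TP / (2·TP + FP + FN) = 180/205, which is the same harmonic mean written in terms of counts.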
Handling of Imbalanced Data: As mentioned in Sect. 2, the dataset is highly imbalanced: fraud accounts for only 0.172% of all transactions. The bar plot of the class values in Fig. 3a shows the same thing, with “normal” transactions far outnumbering “fraud” transactions. In this study, the imbalance problem is addressed with the “Random Under-Sampling” method, as illustrated in Fig. 3b.
Fig. 3. Bar plot representation of “class” feature before (a) and after (b) using random under-sampling
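Random under-sampling keeps every sample of the minority class and draws an equally sized random subset from each larger class. A minimal standard-library sketch is given below; the function name, the seed, and the toy data are assumptions for illustration, and the study may have used a library routine instead.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Down-sample every class to the size of the smallest class."""
    rng = random.Random(seed)
    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # The minority class size fixes how many rows each class keeps.
    n_min = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            sampled.append((row, label))
    # Shuffle so that classes are interleaved in the result.
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


# Toy data (hypothetical): 1000 transactions, only 5 of them fraudulent.
rows = list(range(1000))
labels = ["fraud" if i < 5 else "normal" for i in rows]
new_rows, new_labels = random_under_sample(rows, labels)
print(len(new_rows), new_labels.count("fraud"), new_labels.count("normal"))
```

The price of this method is that most majority-class rows are discarded, which is why the balanced set is much smaller than the original one.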
Feature Scaling: After handling the imbalance in our dataset, the features were scaled using the PCA transformation and the RobustScaler method. After scaling, the “Time” (Fig. 4a) and “Amount” (Fig. 4b) features show a much better distribution than before scaling (Fig. 2).
Feature Engineering Based Credit Card Fraud Detection
Fig. 4. Data distribution of “Time” (a) and “Amount” (b) features after scaling
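Robust scaling centers a feature on its median and divides by its interquartile range (IQR), so a few very large amounts do not dominate the scale. The sketch below shows the idea with the standard library; it is an approximation of what a library scaler does, and the quartile convention (`method="inclusive"`) and the toy amounts are assumptions made here.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR, which is insensitive to outliers."""
    median = statistics.median(values)
    # Quartiles with linear interpolation between data points (assumed convention).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the feature is (nearly) constant")
    return [(x - median) / iqr for x in values]


# Toy "Amount" column (hypothetical) with one extreme value.
amounts = [1, 2, 3, 4, 100]
scaled = robust_scale(amounts)
print(scaled)
```

In this toy column the median is 3 and the IQR is 2, so the four typical values land in [-1, 0.5] while the outlier stays clearly separated.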
Comparison Results: We then apply several popular machine learning algorithms: Logistic Regression (LR), Random Forest (RF), Support Vector Classifier (SVC), Gradient Boosting Classifier (GBC), and K-Nearest Neighbors (KNN). Before applying each classification algorithm, feature selection is performed with the Sequential Forward Selection (SFS) algorithm. In this study, SFS returns the top 15 features. To assess the performance of the classification techniques before and after SFS, Table 1 and Table 2 report accuracy, precision, recall, and F1-score.

Table 1. Without feature selection
Classifier  Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
KNN         91.88         93             92          92
RF          92.89         94             93          93
SVC         92.39         93             92          92
LR          91.37         92             91          91
GBC         93.91         94             94          94

Table 2. With feature selection
Classifier  Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
KNN         95.43         96             95          95
RF          95.94         96             96          96
SVC         95.43         96             95          95
LR          90.86         91             91          91
GBC         95.43         96             95          95
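Sequential Forward Selection is a greedy search: it starts from an empty feature set and repeatedly adds the single feature that most improves a score until the requested number of features is reached. The sketch below shows that loop in plain Python. The `score` callback is a placeholder: in practice it would be something like the cross-validated accuracy of a classifier trained on the candidate subset, and the toy gains used here are purely hypothetical.

```python
def sequential_forward_selection(features, score, k):
    """Greedily pick k features, adding the best-scoring candidate each round."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Evaluate every candidate together with the features chosen so far.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature usefulness; a real score would train and
# validate a classifier on each candidate subset.
toy_gain = {"V1": 0.10, "V2": 0.50, "V3": 0.30, "V4": 0.05}


def toy_score(subset):
    return sum(toy_gain[f] for f in subset)


chosen = sequential_forward_selection(toy_gain.keys(), toy_score, k=2)
print(chosen)
```

Because each round re-scores every remaining candidate, the cost grows with both the number of features and the cost of one model fit, which is worth keeping in mind when the score involves cross-validation.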
Table 1 shows that, without SFS, the Gradient Boosting Classifier (GBC) predicts credit card fraud better than the other classifiers, achieving the highest accuracy (93.91%) and 94% in precision, recall, and F1-score. After applying the SFS algorithm, the Random Forest classifier achieves the best result on our dataset (95.94% accuracy), as shown in Table 2. All classification algorithms except Logistic Regression (LR) show a clear improvement in accuracy after SFS.
M. M. Islam et al.
Fig. 5. Comparison of Classification Accuracy before and after Feature Selection
In addition, Fig. 5 shows a bar chart comparing accuracy before and after applying the “SFS” method, and Fig. 6 shows the Receiver Operating Characteristic (ROC) curves of the classifiers without and with Sequential Forward Selection (SFS). Figure 6a shows that RF, SVM, and GBC share the highest AUROC score of 0.98 before SFS. Figure 6b shows that, after feature selection, RF alone has the highest AUROC score, 0.99.
Fig. 6. ROC curve of classifiers before (a) and after (b) using SFS
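The AUROC score summarizes a ROC curve in one number: it equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one. The sketch below computes it from that pairwise definition in plain Python; the function name and the toy scores are assumptions for illustration, and the quadratic loop is only suitable for small samples.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 1 = fraud, 0 = normal.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))
```

Here three of the four positive/negative pairs are ordered correctly, so the score is 0.75; a perfect ranking gives 1.0 and a random one about 0.5.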
4 Conclusion and Future Work
On the fraud detection benchmark dataset, we investigated several machine learning approaches combined with feature selection using Sequential Forward Selection (SFS). To distinguish fraudulent from legitimate transactions, we used a variety of prominent machine learning techniques on our dataset, including Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Classifier (GBC), and K-Nearest Neighbor (KNN). Our experimental results indicate that Gradient Boosting Classifier (GBC) obtains the highest accuracy without feature selection, whereas Random Forest achieves the highest accuracy after feature selection. Additionally, all classification algorithms except Logistic Regression (LR) improved following feature selection. This evaluation may help financial companies prevent fraudulent transactions early by applying this model, and make more informed decisions on how to manage fraudulent transactions, thereby saving people money. We hope that our experimental investigation will aid the development of a control strategy for preventing future fraudulent transactions.
References
1. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2018. AISC, vol. 866. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-00979-3
2. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2019. AISC, vol. 1072. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4
3. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2020. AISC, vol. 1324. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8
4. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)
5. Bahnsen, A.C., Aouada, D., Stojanovic, A., Ottersten, B.: Feature engineering strategies for credit card fraud detection. Expert Syst. Appl. 51, 134–142 (2016)
6. Bahnsen, A.C., Stojanovic, A., Aouada, D., Ottersten, B.: Improving credit card fraud detection with calibrated probabilities. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 677–685. SIAM (2014)
7. Dubey, S.C., Mundhe, K.S., Kadam, A.A.: Credit card fraud detection using artificial neural network and backpropagation. In: 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 268–273. IEEE (2020)
8. Gaikwad, J.R., Deshmane, A.B., Somavanshi, H.V., Patil, S.V., Badgujar, R.A.: Credit card fraud detection using decision tree induction algorithm. Int. J. Innov. Tech. Explori. Eng. (IJITEE) 4(6), 66 (2014)
9. Ho, T.K.: Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1, pp. 278–282. IEEE (1995)
10. Jurgovsky, J., et al.: Sequence classification for credit-card fraud detection. Expert Syst. Appl. 100, 234–245 (2018)
11. Li, C.: A gentle introduction to gradient boosting (2016). http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
12. Lucas, Y., et al.: Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs. Future Gener. Comput. Syst. 102, 393–402 (2020)
13. Marcano-Cedeño, A., Quintanilla-Domínguez, J., Cortina-Januchs, M., Andina, D.: Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network. In: IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society, pp. 2845–2850. IEEE (2010)
14. Paper, D.: Scikit-Learn Classifier Tuning from Complex Training Sets, pp. 165–188. Apress, Berkeley (2020)
15. de Sá, A.G., Pereira, A.C., Pappa, G.L.: A customized classification algorithm for credit card fraud detection. Eng. Appl. Artif. Intell. 72, 21–29 (2018)
16. Singh, G., Gupta, R., Rastogi, A., Chandel, M.D., Ahmad, R.: A machine learning approach for detection of fraud based on SVM. Int. J. Sci. Eng. Technol. 1(3), 192–196 (2012)
17. Taha, A.A., Malebary, S.J.: An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access 8, 25579–25587 (2020)
18. Tolles, J., Meurer, W.J.: Logistic regression: relating patient characteristics to outcomes. JAMA 316(5), 533–534 (2016)
19. ULB, M.L.G.: Credit card fraud detection (2018). https://www.kaggle.com/mlg-ulb/creditcardfraud
20. Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Netw. 10(5), 988–999 (1999)
21. Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., Jiang, C.: Random forest for credit card fraud detection. In: 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), pp. 1–6. IEEE (2018)
22. Yee, O.S., Sagadevan, S., Malim, N.H.A.H.: Credit card fraud detection using machine learning as data mining technique. J. Telecommun. Electron. Comput. Eng. (JTEC) 10(1–4), 23–27 (2018)
DCNN-LSTM Based Audio Classification Combining Multiple Feature Engineering and Data Augmentation Techniques
Md. Moinul Islam1, Monjurul Haque2, Saiful Islam3, Md. Zesun Ahmed Mia4,5(B), and S. M. A. Mohaiminur Rahman1
1 Chittagong University of Engineering and Technology, Chittagong, Bangladesh
2 Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
3 Ahsanullah University of Science and Technology, Dhaka, Bangladesh
4 Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh
5 University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh
Abstract. Everything we know is based on our brain’s ability to process sensory data. Hearing is a crucial sense for our ability to learn. Sound is essential for a wide range of activities, such as exchanging information and interacting with others. Audio signals represent sound electrically. Because of their many essential applications, audio signals and their classification are of great value. However, classifying audio signals remains a difficult task. To classify audio signals more accurately and effectively, we propose a new model. In this study, we apply a new method for audio classification that combines the strengths of Deep Convolutional Neural Network (DCNN) and Long Short-Term Memory (LSTM) models with a unique combination of feature engineering techniques. We integrate data augmentation and feature extraction before fitting the data to the model and evaluating its performance. The experiments show a higher degree of accuracy. To validate the efficacy of our model, we compare it with recent reference works.
Keywords: DCNN-LSTM · Spectrograms · Short Time Fourier Transform · Data augmentation · Spectral feature extraction · MFCC · Melspectrogram · Chroma STFT · Tonnetz
1 Introduction
Digital and analog audio signals both use a varying electrical voltage to represent sound. Our daily lives depend heavily on audio signals of various origins. Audio signals are now required not just by humans, but also by man-made machines. Human-like sound comprehension has several uses, including intelligent machine control and monitoring, acoustic information use, acoustic surveillance, and categorization and information extraction applications such as exploring audio archives and audio-assisted multimedia assets [9]. For many years, categorizing audio or sound has been an important area of research. To achieve this classification, multiple models and features have been tried over the years, all of which have proved helpful in classifying and separating audio and sound. Work in the area of sound detection and classification includes matrix factorization, the categorization of music genres, wavelet filterbanks, automated music tagging, dictionary learning, bird song classification, IoT-embedded automated audio categorization, and emotion recognition [1–3,6,8,12]. Since deep learning was introduced, it has boosted research in various fields and swiftly superseded traditional machine learning algorithms by exhibiting superior performance on numerous tasks. With or without Artificial Intelligence, there are countless possible approaches for developing audio recognition and classification models that use various audio feature extraction procedures. The detection and categorization of ambient sound is a fascinating subject with several applications, ranging from crime detection to environmental context-aware analysis. For audio classification, prominent classifier models include those based on artificial intelligence or linear predictive coding, as well as Deep Neural Networks, Decision Tree classifiers, and Random Forests. A number of contributions have been made to the field of audio categorization. In recent research studies, convolutional neural networks were shown to be very efficient in classifying brief audio samples of ambient noises.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 227–236, 2022. https://doi.org/10.1007/978-3-030-93247-3_23
The author of [11] used the publicly accessible ESC-10, ESC-50, and UrbanSound8K datasets, augmenting them by adding arbitrary temporal delays to the original tracks and applying class-dependent time stretching and pitch shifting to the ESC-10 training set. Log-scaled mel-spectrograms were extracted from all recordings to train a model composed of two convolutional ReLU layers with max-pooling, two fully connected ReLU layers, and a softmax output layer on this low-level audio data representation. The model was trained for 300 epochs for the short-segment variant and 150 epochs for the long-segment variant, and tested using fivefold cross-validation (ESC-10 and ESC-50) and tenfold cross-validation (UrbanSound8K) with a single training fold, showing that the CNN outperformed solutions based on manually engineered features. Palanisamy et al. [10] showed that standard deep CNN models trained on ImageNet can be used as strong baseline networks for audio categorization. They claimed that simply by fine-tuning basic pre-trained ImageNet models with a single set of input features for audio tasks, they could achieve state-of-the-art results on the UrbanSound8K and ESC-50 datasets, as well as good performance on the GTZAN dataset. Using qualitative visualizations, they also showed that CNN models can learn the boundaries of the energy distributions in the spectrograms. Abdoli et al. [4] presented a method for classifying ambient sound that uses a 1D Convolutional Neural Network (CNN) to learn a representation directly from the audio signal in order to capture its fine temporal characteristics. Their proposed end-to-end approach for detecting ambient noises was found to be 89% accurate. The proposed end-to-end 1D design for ambient sound categorization uses fewer parameters than the bulk of previous CNN architectures while reaching a mean accuracy 11.24% to 27.14% higher than equivalent 2D designs.

In our research, we introduce a new audio classification strategy that integrates two separate models: a deep CNN and an LSTM. Before training our proposed model, we use a unique combination of feature engineering methods to obtain the best results. Sound classification has three phases: audio signal preprocessing, spectral feature extraction, and classification of the corresponding audio signal. We use the UrbanSound8K dataset for audio categorization. It contains 8732 labeled audio slices in ten classes of ambient noise: air conditioning, car horns, children playing, dog barks, drilling, engine idling, gunshots, jackhammers, sirens, and street music. Data augmentation is applied first to improve the model’s training results. Three data augmentation methods were investigated: time stretching, noise injection, and pitch shifting. To convert audio data to numerical values, we used NumPy arrays in Python. The audio was then transformed from the time domain to the frequency domain via the Fourier Transform to obtain spectral features. We computed several features: Zero Crossing Rate, Chroma STFT (Short-Time Fourier Transform), MFCC (Mel-Frequency Cepstral Coefficients), Mel spectrogram, RMS, and Tonnetz. These spectral feature extraction approaches are combined as input to the new model. After data augmentation and spectral feature extraction, the resulting 34,928 numerical records, with 5,867,904 fields in total, were assembled before training the model.
We trained with 80% of the data, tested with 10%, and validated with 10%. Finally, we trained our proposed model, a hybrid of a deep CNN and an LSTM, on the data. The deep CNN has three convolutional layers and uses the Adam optimizer. The network combines batch normalization, max pooling, and dropout. ReLU is used as the activation function, and Softmax is employed for the output layer. The output of the deep CNN is passed to the input layer of the LSTM model, which has two layers. As with the deep CNN, we used the Adam optimizer and the ReLU and Softmax activation functions; in this case, however, only dropout was used. With this design, the accuracy of audio classification improved significantly. Finally, our model is compared with models from other recent reference works in order to highlight its value.
2 Methodology
This section describes the overall methodology of our proposed audio classification model. We used the benchmark dataset UrbanSound8K [13] to validate our model. This dataset contains 8732 brief audio samples (with a duration of 4 s or less) taken from a variety of urban recordings: air conditioners, vehicle horns, children playing, barking dogs, drilling, engine idling, gunshots, jackhammers, sirens, and street music. The dataset is divided into these ten classes. It was found that the vehicle horn, gunshot, and siren classes are not represented as evenly as the other classes.

2.1 Data Augmentation
Data augmentation is a simple technique for generating synthetic data with variations from the existing samples. It gives the model more data with more variety, which helps the model avoid overfitting and generalize better. Audio has several augmentation methods, such as noise injection, time shifting, pitch shifting, changing speed, and time stretching. This research adopts three of them: background noise injection, pitch shifting, and time stretching. In noise injection, the sample data was merged with a separate recording that contains external noise from a variety of acoustic environments. Each sample was generated by

$$m = x_i (1 - w) + w\, y_i \tag{1}$$

where $x_i$ is the original audio sample of the dataset, $y_i$ is the injected background noise, and $w$ is a weight chosen randomly for each merge within the range 0.001 to 0.009. During pitch shifting, the pitch of an audio sample is raised or lowered by a given value; each sample was pitch-shifted by the values in [−2, −1, 1, 2]. Time stretching is an audio processing technique that lengthens or shortens the duration of a sample without altering its pitch. The augmentation techniques were applied using the Librosa library. Figures 1a, 1b, and 1c illustrate the data augmentation techniques applied to the dataset.

2.2 Spectral Feature Extraction
In feature extraction, the acoustic signal is transformed into a sequence of acoustic feature vectors that describe the input audio. The goal is to condense the large amount of data in each file into a considerably smaller, fixed-size collection of features. We use spectral features for our classification problem, applying the Short-Time Fourier Transform to convert the augmented audio samples from the time domain to the frequency domain, as displayed in Fig. 2a. Of the numerous spectral features available, we employed six: Zero Crossing Rate, Chroma STFT, MFCC, Mel spectrogram, Tonnetz, and the RMS value computed for each frame. Figures 2b, 2c, and 2d show the spectrogram plots for the feature extraction techniques.
Fig. 1. Data augmentation illustration: (a) after background noise injection; (b) after time stretching; (c) after pitch shifting
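The background noise injection rule, m = x(1 − w) + w·y with a weight w drawn from the range 0.001 to 0.009, can be sketched on plain Python lists as follows. This is only an illustration of the mixing formula under the assumption that the signal and the noise clip have equal length; the function name and toy samples are hypothetical, and the study itself applied its augmentations with Librosa.

```python
import random


def inject_noise(signal, noise, rng=None, w_min=0.001, w_max=0.009):
    """Mix background noise into a signal: m = x * (1 - w) + w * y."""
    if rng is None:
        rng = random.Random()
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # One random weight per merge, drawn from the stated range.
    w = rng.uniform(w_min, w_max)
    mixed = [x * (1.0 - w) + w * y for x, y in zip(signal, noise)]
    return mixed, w


# Toy samples (hypothetical): three signal samples and a matching noise clip.
clean = [1.0, -1.0, 0.5]
background = [0.0, 0.0, 1.0]
mixed, w = inject_noise(clean, background, rng=random.Random(0))
print(w, mixed)
```

Because w is below 1%, the mixed signal stays very close to the original, which keeps the label valid while still adding variety.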
Fig. 2. Audio conversion & spectral feature extraction: (a) audio conversion to frequency domain; (b) Mel-scaled power spectrogram; (c) Mel frequency cepstral coefficients; (d) Chroma short-time Fourier transform
The Zero-Crossing Rate indicates how many times the signal changes sign, from positive to negative and vice versa, divided by the frame length [7]:

$$Z_i = \frac{1}{2 w_L} \sum_{n=1}^{w_L} \left| \operatorname{sgn}[x_i(n)] - \operatorname{sgn}[x_i(n-1)] \right| \tag{2}$$

where sgn is the sign function, $x_i(n)$ is the $n$-th sample of the $i$-th frame, and $w_L$ is the frame length.
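Equation (2) can be evaluated for a single frame with a short loop. The sketch below is a plain Python illustration; it assumes that sgn returns +1 for non-negative samples and −1 otherwise, and that the sum runs over the consecutive sample pairs available inside the frame. Both conventions, like the function names, are assumptions made for this example.

```python
def sign(value):
    """Sign function with the assumed convention sgn(0) = +1."""
    return 1 if value >= 0 else -1


def zero_crossing_rate(frame):
    """Zero-crossing rate of one frame, following Eq. (2)."""
    if not frame:
        raise ValueError("frame must not be empty")
    w_l = len(frame)
    # Each sign change contributes |(+1) - (-1)| = 2 to the sum.
    total = sum(abs(sign(frame[n]) - sign(frame[n - 1]))
                for n in range(1, w_l))
    return total / (2 * w_l)


# Toy frames (hypothetical): an alternating signal and a constant one.
print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
print(zero_crossing_rate([0.5, 0.5, 0.5, 0.5]))
```

Noisy or percussive sounds tend to produce high values, while tonal, low-frequency sounds cross zero less often.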
The chroma features of an audio signal depict the strength of each of the signal’s twelve distinct pitch classes. They can be used to distinguish between the pitch class profiles of audio streams. Chroma STFT carries information about pitch and signal structure and uses the short-time Fourier transform to generate the chroma features. Mel Frequency Cepstral Coefficients (MFCCs) are concise representations of the spectrum. By transforming the conventional frequency scale to the Mel scale, MFCC takes into account the frequency-dependent sensitivity of human perception. The Mel spectrogram is a spectrogram on the Mel scale, where the Mel scale is a nonlinear transformation of the frequency scale. Its y-axis indicates the Mel scale, while the x-axis depicts time. Tonnetz detects harmonic shifts in audio recordings to calculate tonal centroid features. It is an infinite planar representation of pitch relationships in an audio sample. For feature preparation in our proposed method, we used two standard techniques, ‘One Hot Encoding’ and ‘Standard Scaler’. One-hot encoding replaces the label-encoded categorical data with binary indicator vectors. StandardScaler is a standardization technique that brings the independent features into the same range: following the standard normal distribution (SND), it sets the mean to 0 and scales the data to unit variance.

2.3 Deep CNN-LSTM Model Architecture
In the DCNN-LSTM design, CNN layers that extract features from the input data are combined with LSTM layers that perform sequence prediction, resulting in a highly efficient feature extraction system. Both the CNN and the LSTM parts use spectrograms as their input. To build the DCNN-LSTM model, deep CNN layers on the front end are followed by LSTM layers and a Dense layer on the output. The architecture thus has two sub-models: the deep CNN model for feature extraction and the LSTM model for feature interpretation (Fig. 3).

Fig. 3. Overall proposed methodology

The deep CNN consists of three 2D convolutional layers and two MaxPooling2D layers arranged into a stack of the desired depth. The Conv2D layers interpret the spectral characteristics of the spectrograms, and the pooling layers consolidate the interpretation. The first Conv2D layer, which processes the input shape, uses 64 filters, a kernel size of 5, and a stride of 1, and is followed by a MaxPooling layer that decreases the size of the feature map. The kernel-size argument defines the size of the kernel window, so our framework uses a 5 × 5 filter. Because the stride is set to 1 in the first layer, the filter moves one unit at a time across the input volume. With “same” padding, this convolutional layer yields an output with the same height and width as its input. We chose ReLU as the activation function for this layer instead of sigmoid units because of its advantages over more conventional units, including efficient gradient propagation and faster computation. To minimize overfitting, a dropout of 0.3 is applied in the other two Conv2D layers, which use the same padding method and activation function (ReLU).

For the LSTM part, we created two LSTM layers with 128 hidden units each and enabled the return-sequences option so that the layers can be stacked. The LSTM layers take a 3D array as input; their output is followed by Time Distributed Dense layers, with a dropout of 0.2 to avoid overfitting. ReLU is used as the activation function in both layers, with sizes of 64 and 128. The input shape of the first layer is (21, 8), which indicates 20 iterations and tells the LSTM how many time steps to process once the input is applied. The output of the time-distributed dense layer is then passed to the flatten layer. Flattening leaves a vector, which is passed through the final dense layer. Using the Softmax activation function in this dense layer converts its input into a discrete probability distribution over the classes. We used the Adam optimizer, which adapts the learning rate of each parameter. Adam generally outperforms earlier optimizers and provides a well-tuned gradient descent.
Adaptive learning rates are estimated for the individual parameters. In many circumstances, Adam has been shown to favor error surfaces with flat minima, which makes it a good optimizer. The parameters β1 and β2 specify only the periods over which the learning-rate estimates decay, not the learning rate itself. If the estimates decay quickly, the learning rates fluctuate widely; if they decay slowly, it takes a long time to learn them. In all cases, the learning rates are calculated automatically from moving estimates of the per-parameter gradient and squared gradient.
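The role of β1 and β2 is easiest to see in a single Adam update written out for one scalar parameter. The sketch below follows the standard published update rule; the default values shown (step size 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly used ones and are an assumption here, since the text does not state the values used in the study.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    # Moving estimate of the gradient (first moment), decayed by beta1.
    m = beta1 * m + (1.0 - beta1) * grad
    # Moving estimate of the squared gradient (second moment), decayed by beta2.
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # The effective step is scaled per parameter by the second-moment estimate.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (hypothetical): minimise f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.1)
print(x)
```

On the very first step the bias-corrected ratio is close to ±1, so the parameter moves by roughly the step size regardless of the gradient's magnitude; later steps are shaped by how quickly the two moving estimates forget old gradients.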
3 Results
Our research strategy involved identifying attributes that are both effective and accurate for the DCNN-LSTM model. In this section, we assess our model based on the experiments conducted. We also evaluate the effectiveness of our proposed ensemble method and pre-trained weights, and finally compare our model with some of the previous state-of-the-art models.
Fig. 4. Validation loss & accuracy of our proposed model: (a) loss vs no. of epochs; (b) accuracy vs no. of epochs
In the data preprocessing module, we stacked three data augmentation techniques (background noise injection, time stretching, and pitch shifting) to reduce overfitting and evaluate the performance of our model. To extract spectral features, MFCCs, Mel spectrogram, Chroma STFT, and Tonnetz were stacked with one another, together with the zero-crossing rate (ZCR) and Root Mean Square (RMS) value computed for each frame of the audio data, giving 169 features in total. Stacking the techniques enhanced our model’s performance considerably. We then fed the data into the proposed DCNN-LSTM model described in Sect. 2.3, evaluated the performance metrics, and validated the model on the dataset. Stacking these techniques gave a validation accuracy of 93.19% after 26 epochs. We used stratified 10-fold cross-validation to check the robustness of the result; the CNN model (three Conv2D layers, 50 epochs) reached 86.1% and the LSTM model (two layers, 200 epochs) reached 87.75%. Figure 4 plots the loss and accuracy of our proposed DCNN-LSTM model on the y-axis against the number of epochs on the x-axis. As the number of epochs increases, the error of our model decreases exponentially for both the training and testing data. The maximum epoch count was set to 50, but the error stopped improving after 26 epochs, at which point the early-stopping callback ended training and returned the accuracy and loss without spending further computation time. Table 1 compares the accuracy of our proposed model with previous state-of-the-art models.

Table 1. Proposed model vs previous state-of-the-art models
Model                                                  Dataset        Accuracy (%)
logmel-CNN [16]                                        ESC-50         78.3
DCNN + Mix-up [17]                                     UrbanSound8K   83.7
DenseNet (Pretrained Ensemble) [10]                    UrbanSound8K   87.42
Conv1D + Gammatone [4]                                 UrbanSound8K   89
DCNN with Multiple Features + Mix-up [14]              ESC-50         88.5
GoogleNet [5]                                          UrbanSound8K   93
TSCNN-DS [15]                                          UrbanSound8K   97.2
Proposed DCNN-LSTM + Stacked Features & Augmentation   UrbanSound8K   93.19
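The early-stopping behaviour mentioned in the results (training configured for 50 epochs but halted once the error stops improving) can be sketched as a small loop over per-epoch validation losses. This is a generic illustration, not the callback used in the study: the `patience` and `min_delta` parameters, their values, and the toy loss curve are all assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch, best_loss) for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the waiting counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return epoch, best_epoch, best_loss
    # Ran through all epochs without triggering the stop condition.
    return len(val_losses), best_epoch, best_loss


# Toy loss curve (hypothetical): improves for three epochs, then stalls.
losses = [1.0, 0.8, 0.6, 0.61, 0.62, 0.63, 0.5]
print(early_stopping_epoch(losses, patience=3))
```

A larger patience tolerates longer plateaus at the cost of extra epochs; in the toy curve above, the late improvement at epoch 7 is never reached with a patience of 3.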
4 Conclusion
This paper proposes an approach to urban sound classification based on a deep neural network that combines two different neural network models, CNN and LSTM, together with two separate stacks of data augmentation and feature extraction techniques. UrbanSound8K, one of the finest datasets in this domain, was used to train and test our models. With the aforementioned feature engineering, training, validating, and testing the model on this dataset yields an accuracy of 93.19%, which is close to the state-of-the-art result and better than the other previous works compared. Though we have emphasized data augmentation on a single dataset, the comparison would be more relevant if we could also work with other prominent datasets. Our model achieves this accuracy without any pre-trained models or transfer learning, so using them is a direction for future work that could improve on our existing accuracy. Moreover, a simple stack of DCNN and LSTM has been used effectively for urban sound classification and has achieved a high score; whether combinations of more sophisticated recurrent or convolutional neural network models can achieve a better score is a matter for future research.
References

1. Vasant, P., Zelinka, I., Weber, G.-W.: Intelligent Computing and Optimization. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-00979-3. ISBN 978-3-030-00978-6
2. Vasant, P., Zelinka, I., Weber, G.-W.: Intelligent Computing and Optimization. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33585-4. ISBN 978-3-030-33585-4
3. Vasant, P., Zelinka, I., Weber, G.-W.: Intelligent Computing and Optimization. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-68154-8
4. Abdoli, S., Cardinal, P., Koerich, A.L.: End-to-end environmental sound classification using a 1D convolutional neural network. Expert Syst. Appl. 136, 252–263 (2019)
5. Boddapati, V., Petef, A., Rasmusson, J., Lundberg, L.: Classifying environmental sounds using image recognition networks. Procedia Comput. Sci. 112, 2048–2056 (2017). Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 21st International Conference, KES-2017, 6–8 September 2017, Marseille, France
6. Costa, Y.M., Oliveira, L.S., Silla, C.N.: An evaluation of convolutional neural networks for music classification using spectrograms. Appl. Soft Comput. 52, 28–38 (2017)
7. Giannakopoulos, T., Pikrakis, A.: Audio features. In: Giannakopoulos, T., Pikrakis, A. (eds.) Introduction to Audio Analysis, pp. 59–103. Academic Press, Oxford (2014)
8. Hershey, S., et al.: CNN architectures for large-scale audio classification (2017)
9. Li, J., Dai, W., Metze, F., Qu, S., Das, S.: A comparison of deep learning methods for environmental sound detection. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 126–130. IEEE (2017)
10. Palanisamy, K., Singhania, D., Yao, A.: Rethinking CNN models for audio classification (2020)
11. Piczak, K.J.: Environmental sound classification with convolutional neural networks. In: 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE (2015)
12. Salamon, J., Bello, J.P.: Unsupervised feature learning for urban sound classification. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 171–175. IEEE (2015)
13. Salamon, J., Jacoby, C., Bello, J.P.: A dataset and taxonomy for urban sound research. In: Proceedings of the 22nd ACM International Conference on Multimedia, MM 2014, pp. 1041–1044. Association for Computing Machinery, New York (2014)
14. Sharma, J., Granmo, O.C., Goodwin, M.: Environment sound classification using multiple feature channels and attention based deep convolutional neural network. In: INTERSPEECH, pp. 1186–1190 (2020)
15. Su, Y., Zhang, K., Wang, J., Madani, K.: Environment sound classification using a two-stream CNN based on decision-level fusion. Sensors 19(7), 1733 (2019)
16. Tokozume, Y., Harada, T.: Learning environmental sounds with end-to-end convolutional neural network. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2721–2725. IEEE (2017)
17. Zhang, Z., Xu, S., Cao, S., Zhang, S.: Deep convolutional neural network with mixup for environmental sound classification. In: Lai, J.-H., et al. (eds.) PRCV 2018. LNCS, vol. 11257, pp. 356–367. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03335-4_31
Sentiment Analysis: Developing an Efficient Model Based on Machine Learning and Deep Learning Approaches

Said Gadri (✉), Safia Chabira, Sara Ould Mehieddine, and Khadidja Herizi

Laboratory of Informatics and Its Applications of M’sila (LIAM), Department of Computer Science, Faculty of Mathematics and Informatics, Univ. Mohamed Boudiaf of M’sila, M’Sila, Algeria
{said.kadri,safia.chabira,sara.omah,herizikha}@univ-msila.dz
Abstract. Sentiment analysis is a subfield of text mining. It is the process of categorizing the opinions expressed in a piece of text. A simple form of such analysis is to predict whether the opinion about something is positive or negative (polarity). The present paper proposes an efficient sentiment analysis model based on machine learning (ML) and deep learning (DL) approaches. A DNN (Deep Neural Network) model is used to extract the relevant features from customer reviews, train on most of the samples of the dataset, validate the model on a small subset called the test set, and consequently compute the accuracy of sentiment classification. For the programming stage, we benefited from the many facilities offered by the Python language, as well as the TensorFlow and Keras libraries.

Keywords: Machine learning · Deep learning · Natural language processing · Social media · Artificial Neural Networks
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 237–247, 2022. https://doi.org/10.1007/978-3-030-93247-3_24

1 Introduction

Today, social media such as Twitter, Facebook, and Instagram have become an important means for people to share their opinions and sentiments about a product they want to buy, or to express their views about a particular topic, company service, or political event [1]. Many companies need to process these sentiments and opinions and exploit them in applications such as improving the quality of their services, drawing up efficient business strategies, and reaching a larger number of customers [2]. Sentiment analysis (SA) is currently among the hottest research topics in the NLP and text mining fields. It can be defined as the process of automatically extracting the relevant information that expresses the opinion of the user about a given topic [1, 3]. A simple form of such analysis is to predict whether the opinion about something is positive, negative, or neutral (polarity). Other forms of sentiment or opinion analysis include predicting the rating scale of a product review, predicting polarity on different aspects of the product, and detecting subjectivity and objectivity in sentences [1, 2, 4].

SA is useful in a wide range of applications, notably business activities, government services, biomedicine, and recommender systems. For instance, in the domain of e-business, companies can study customers’ feedback on a product in order to provide better products, services, and marketing strategies and to attract more customers [2]. In the field of recommender systems, SA is used to improve recommendations for books, movies, hotels, restaurants, and many other services [5].

There are four approaches to the problem of sentiment analysis: the lexicon-based approach, the machine learning approach, the deep learning approach, and the hybrid approach [1]. The lexicon-based approach was the first to be used by researchers for sentiment analysis. It is based on two principal techniques: the dictionary-based technique, which uses a dictionary of terms such as those in WordNet, and the corpus-based technique, which relies on a statistical analysis of the content of documents combined with statistical algorithms such as hidden Markov models (HMMs) [6] and Conditional Random Fields (CRFs) [7]. The machine learning approach [8] has been proposed by many researchers for SA and is based on classical ML algorithms such as Naïve Bayes (NB) [9] and Support Vector Machine (SVM) [10]. The deep learning approach has been proposed more recently and has had great success in many fields, such as computer vision [11–13], image processing [14, 15], object detection [16, 17], network optimization [18], sensor networks [19, 20], and system security [21]. It gives better results in terms of accuracy but needs massive data. Many models are currently used, including DNN, CNN, RNN, and LSTM.

Our main objective in this work is to classify the opinions expressed by customers in short reviews, to determine whether each review’s sentiment towards the movie is positive or negative. For this purpose, we used the traditional machine learning (ML) approach and the deep learning (DL) approach. For the first approach, we applied several ML algorithms, including LR, NB, and SVM. For the DL approach, we built a DNN model to perform the same task. We finished our work by establishing a comparison between the two approaches.
2 Related Work

In 2016, Gao et al. [22] developed a sentiment analysis system using the AdaBoost algorithm combined with a CNN model. They performed their experimental work on the IMDB movie reviews dataset. Their main objective was to study the contribution of different filter lengths and to exploit their potential in the final polarity of the sentence. Singhal et al. [23] presented a survey of the sentiment analysis and deep learning areas. The survey covers well-known models such as CNN, RNTN, RNN, and LSTM, with experiments on many datasets, including the sentiment treebank, movie reviews, MPQA, and customer reviews. Al-Sallab et al. [24] performed opinion mining in Arabic using an RNN model. They used several datasets, namely online comments from QALB, Twitter, and newswire articles written in MSA, providing complete and comprehensive input features for the autoencoder. Preethi et al. [5] performed sentiment analysis for recommender systems in the cloud. They used RNN and NB classifiers and the Amazon dataset. The main task was to recommend places that are new for the user’s current location by analyzing the different reviews and computing a score based on them.

Jangid et al. [25] developed a financial sentiment analysis system using several models (CNN, LSTM, RNN) and a dataset of financial tweets. Their main contribution was aspect-based sentiment analysis. Zhang et al. [26] presented a detailed survey of deep learning for the sentiment analysis field. They used CNN, DNN, RNN, and LSTM models and performed experiments on data from social network sites. The principal tasks covered in this work are sentiment analysis with word embedding, sarcasm analysis, emotion analysis, and multimodal data for sentiment analysis. Wu et al. [27] performed sentiment analysis with a variational autoencoder using LSTM and Bi-LSTM algorithms and the StockTwits dataset. Several interesting tasks were performed in this work, such as encoding and decoding, and sentiment prediction. Wang et al. [28] proposed a hybrid method that uses sentiment analysis of movie reviews to improve a preliminary recommendation list obtained from the combination of CF and content-based methods. Gupta et al. [29] combined sentiment and semantic features in an LSTM model for emotion detection. Salas-Zárate et al. [30] developed an ontology-based model to analyze sentiments in tweets related to diabetes. Sharef et al. [31] discussed the use of sentiment analysis in the field of big data. Several other studies applied deep learning-based sentiment analysis in different domains, notably finance [32] and recommender systems for cloud services [5]. Pham et al. [11] used multiple layers of knowledge representation to analyze travel reviews and detect sentiment for five parameters: rating value, room, location, deadlines, and services.
3 Sentiment Analysis Process

In the present work, we developed a sentiment analysis system based on the ML and DL approaches, which are considered the best-performing approaches of the last decade. As explained in Sect. 1, the main objective is to classify the opinions expressed by customers in short reviews, to determine whether each review’s sentiment towards the movie is positive or negative. Our project can be divided into the following steps:

1. Downloading the dataset; in our case, the IMDB movie reviews dataset.
2. Selecting the most important columns.
3. Performing preprocessing tasks, including: cleaning spaces; removing punctuation, stopwords, links, and non-alphabetic characters; splitting texts and representing them as individual word vectors; reducing words to their base form by stemming and lemmatization; and converting all term vectors into numerical vectors using a collection of codings, including binary coding, TF coding, TF-IDF coding, n-grams, and word embeddings.
4. For the ML approach, applying the following algorithms: LR, SVM, NB.
5. For the DL approach, proposing a new model based on several hidden layers, each composed of simple neurons (detailed in Sect. 4).
6. Running a training task on 80% of the samples of our dataset (the train set) to fit the selected ML algorithms as well as the new DNN model.
Fig. 1. Sentiment analysis process. [Diagram blocks: reviews dataset; selecting the relevant values; preprocessing stage (cleaning spaces, removing punctuation, removing stop-words, converting texts into lowercase letters, other tasks); train set (80% of samples) and test set (20% of samples); train vectors and test vectors; training stage and test stage (ML algorithms KNN, NB, LR, SVM, ...; DNN model); k-fold cross-validation technique; model accuracy]
7. Validating the ML algorithms and the DNN model on 20% of the samples of our dataset (the test set). For this purpose, we also used the k-fold cross-validation technique (usually k = 10) to determine the performance of the different ML algorithms and the DNN model.
8. For the programming stage, using Python combined with TensorFlow and Keras, which offer convenient APIs and many rich libraries for ML and DL.

Figure 1 presents a detailed diagram representing the SA process in our system.
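The preprocessing, coding, and splitting steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the stop-word list, the toy reviews, and the helper names (`preprocess`, `build_vocabulary`, `encode`, `train_test_split`) are assumptions made for illustration, and the TF-IDF weighting shown (relative term frequency times log(N / document frequency)) is one common variant that the paper does not specify.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a full list).
STOPWORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}

def preprocess(text):
    """Lowercase, strip links and non-letters, tokenize, and drop stop-words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[^a-z\s]", " ", text)       # remove punctuation, digits, symbols
    return [tok for tok in text.split() if tok not in STOPWORDS]

def build_vocabulary(docs):
    """Map each distinct token to a column index (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def encode(docs, vocab, scheme="tfidf"):
    """Return one numerical vector per document using binary, TF, or TF-IDF coding."""
    n_docs = len(docs)
    doc_freq = Counter(tok for d in docs for tok in set(d))
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(d).items():
            if tok not in vocab:
                continue
            if scheme == "binary":
                vec[vocab[tok]] = 1.0
            elif scheme == "tf":
                vec[vocab[tok]] = count / len(d)
            else:  # TF-IDF: relative term frequency times log(N / document frequency)
                vec[vocab[tok]] = (count / len(d)) * math.log(n_docs / doc_freq[tok])
        vectors.append(vec)
    return vectors

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows and hold out test_ratio of them as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]

# Toy reviews standing in for the (text, label) columns of the dataset.
reviews = [
    ("A great movie, I loved it!", 1),
    ("Terrible plot and bad acting.", 0),
    ("Great acting, great story: see http://example.com", 1),
    ("Bad, boring, and far too long...", 0),
    ("I loved the story.", 1),
]
docs = [preprocess(text) for text, _ in reviews]
labels = [label for _, label in reviews]
vocab = build_vocabulary(docs)
X_bin = encode(docs, vocab, "binary")
X_tfidf = encode(docs, vocab, "tfidf")
train_rows, test_rows = train_test_split(zip(X_tfidf, labels))
```

Here the whole corpus is encoded before splitting, which keeps the sketch short; in practice the vocabulary and document frequencies would normally be computed on the train set only.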
4 The Proposed Sentiment Analysis Model

After applying the preprocessing tasks to prepare the texts, we developed our sentiment analysis model as follows:

1. First, we applied several ML algorithms, including Logistic Regression (LR), Gaussian Naïve Bayes (NB), and Support Vector Machine (SVM). For this purpose, we used the scikit-learn library of Python, which contains the best-known ML algorithms.
2. Designing a DNN (Deep Neural Network) model: we proposed a DNN model composed of six (06) fully connected layers, described as follows: layer 1 (750 neurons, expecting 2 input variables: text, label), layer 2 (512 neurons), layer 3 (128 neurons), layer 4 (64 neurons), layer 5 (16 neurons), and layer 6, the output layer (2 neurons), which predicts the class (1: positive polarity, 0: negative polarity).

• The six (06) fully connected layers are defined using the Dense class of Keras, which takes the number of neurons in the layer as the first argument, the initialization method through the kernel_initializer argument, and the activation function through the activation argument.
• We initialize the network weights to small random numbers generated from a uniform distribution (‘uniform’) or from a normal distribution (‘normal’). We use the rectifier (‘relu’) activation on most layers and the sigmoid function in the output layer.
• We use a sigmoid function on the output layer to ensure that the network output is between 0 and 1 and is easy to map to a probability of class 1 or class 0.
• We compile the model using the efficient numerical libraries that Keras relies on under the covers (the so-called backend), such as TensorFlow. The backend automatically chooses the best way to represent the network for training and for making predictions on the available hardware (we used a CPU in our application).
• When compiling, we must specify some additional properties required for training the network. We note that training a network means finding the best set of weights to make predictions for this problem.
• When training the model, we must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights of the network, and any optional metrics we would like to collect and report during training. Since our problem is a binary classification, we used a logarithmic loss, which is defined in Keras as “binary_crossentropy”.
• We also use the gradient descent algorithm “adam” because it is an efficient default.
• Finally, since it is a classification problem, we report the classification accuracy as the performance metric.
• Execute the model on some data.
• We train (fit) our model on the loaded data by calling the fit() function on the model. The training process runs for a fixed number of iterations through the dataset, called epochs, which we must specify using the epochs argument. We can also set the number of instances that are evaluated before a weight update is performed in the network, called the batch size, using the batch_size argument. In our case, we fixed the following values: epochs = 15, batch_size = 32. These were chosen experimentally, by trial and by reducing the error.
• We trained our DNN on the entire dataset (training set) and evaluated its performance on a part of the same dataset (test set) using the evaluate() function. This generates a prediction for each input and output pair and collects scores, including the average loss and any configured metrics, such as accuracy.

Figure 2 shows the architecture and the characteristics of the proposed DNN model.
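A minimal Keras sketch of the network described above follows. It is a hedged reconstruction, not the authors' code: the layer widths, loss, optimizer, number of epochs, and batch size come from the text, while the helper names, the `input_dim` parameter, the "random_uniform" initializer string, and the single sigmoid output unit are assumptions (the text lists 2 output neurons; one unit is the usual pairing with binary cross-entropy and 0/1 labels).

```python
# Hidden-layer widths as listed in the text. The single sigmoid output unit is an
# assumption of this sketch (the text lists 2 output neurons).
HIDDEN_UNITS = [750, 512, 128, 64, 16]
OUTPUT_UNITS = 1

def count_parameters(input_dim):
    """Weights plus biases of the fully connected stack, computed without Keras."""
    sizes = [input_dim] + HIDDEN_UNITS + [OUTPUT_UNITS]
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))

def build_dnn(input_dim):
    """Build and compile the dense network with Keras (TensorFlow backend)."""
    # Imported here so the helper above stays usable without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for units in HIDDEN_UNITS:
        model.add(keras.layers.Dense(units, kernel_initializer="random_uniform",
                                     activation="relu"))
    model.add(keras.layers.Dense(OUTPUT_UNITS, kernel_initializer="random_uniform",
                                 activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def train_and_evaluate(x_train, y_train, x_test, y_test):
    """Fit for 15 epochs with batch size 32, then return the test loss and accuracy."""
    model = build_dnn(input_dim=x_train.shape[1])
    model.fit(x_train, y_train, epochs=15, batch_size=32,
              validation_data=(x_test, y_test))
    return model.evaluate(x_test, y_test)
```

With a hypothetical 1,000-dimensional input vector, `count_parameters(1000)` gives 1,210,239 trainable parameters, most of them in the first layer, so the vocabulary size largely determines the size of the model.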
Fig. 2. Architecture of the proposed DNN model. [Diagram labels: input layer, hidden layers 2 to 6]
5 Experimental Work

5.1 Used Dataset

In our experiments, we used the IMDB Movie Reviews dataset, one of the most popular movie datasets for sentiment classification. It contains 50,000 highly polar reviews and is divided into two subsets: a train set of 40,000 movie reviews and a test set of 10,000 movie reviews. The two subsets are provided in CSV format. The IMDB data is available on many websites, such as Kaggle. Each CSV file contains two principal columns, described in Table 1.

Table 1. Description of the used dataset

Field   Signification
Text    The text of the posted review
Label   The target variable or class (positive/negative): positive expresses a positive review about the movie, negative expresses a negative review

5.2 Programming Tools

Python: Python is currently one of the most popular languages for scientific applications. Its high-level interactive nature and rich collection of scientific libraries make it a good choice for algorithmic development and exploratory data analysis. It is increasingly used in academic establishments and in industry. Its ecosystem includes the well-known scikit-learn module, which integrates a large number of ML algorithms for supervised and unsupervised problems.

TensorFlow: TensorFlow is a multipurpose open-source library for numerical computation using data flow graphs. It offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud. TensorFlow can be used from many programming languages, such as Python, C++, Java, Scala, and R, and runs on a variety of platforms, including Unix, Windows, iOS, and Android.

Keras: Keras is the official high-level API of TensorFlow. It is a minimalist, highly modular neural network library written in Python, capable of running on top of either TensorFlow or Theano. It is widely adopted in industry and the research community, makes models easy to produce, supports convolutional networks, recurrent networks, and combinations of the two, and runs seamlessly on CPU and GPU.

5.3 Evaluation
To validate the different ML algorithms and obtain the best model, we used the cross-validation method, which consists of splitting our dataset into 10 parts, training on 9 and testing on 1, and repeating for all combinations of train/test splits. For the DNN model, we used two measures: the loss value and the accuracy metric.

1. Accuracy metric: the ratio of the number of correctly predicted instances to the total number of instances in the dataset, multiplied by 100 to give a percentage (e.g., 90% accurate).
2. Loss value: used to optimize an ML algorithm or DL model. It must be calculated on the training and validation datasets. Its simple interpretation is how well the ML algorithm or the DL model is doing on these two datasets. It gives the sum of the errors made for each example in the training or validation set.
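The evaluation protocol above can be sketched in a few lines of plain Python. The function names are hypothetical, and two details are assumptions rather than statements from the paper: the folds are contiguous (no shuffling), and the loss is averaged over the examples, as Keras reports it, instead of summed.

```python
import math

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def accuracy(y_true, y_pred):
    """Percentage of correctly predicted instances."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

def binary_log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# 10-fold split of 50 samples: each fold tests on 5 samples and trains on 45.
folds = list(k_fold_indices(50, k=10))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 75.0
print(binary_log_loss([1, 0], [0.5, 0.5]))    # log(2), about 0.693
```

In a scikit-learn workflow, the same protocol is available through `sklearn.model_selection.cross_val_score` with `cv=10`.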
5.4 The Obtained Results

As explained in Sect. 3, after applying the preprocessing tasks to prepare the texts, we developed our sentiment analysis model using the classical ML and the DL approaches. In this section, we present the results obtained when executing the training and testing steps for the two approaches (Table 2).

Table 2. The average accuracy after applying different ML algorithms

Algorithm   Accuracy (BIN coding)   Accuracy (TF-IDF coding)
LR          54.53%                  59.74%
NB          54.36%                  57.34%
SVM         54.34%                  60.40%
Fig. 3. Binary coding vs. TF-IDF coding. [Bar chart titled “BIN Vs TF-IDF”; vertical axis from 0 to 100; horizontal axis: LR, LDA, KNN, CART, NB, SVM; legend: TF coding, TF-IDF coding]
Table 3 summarizes the results obtained when applying the DNN model.

Table 3. Loss and accuracy values obtained when applying the proposed DNN model

               BIN coding                  TF-IDF coding
Training set   Loss: 0.048; Acc: 98.69%    Loss: 0.043; Acc: 98.84%
Test set       Loss: 0.3827; Acc: 100%     Loss: 0.1082; Acc: 100%
Fig. 4. Train.loss vs. Val.loss of the DNN model. a. Binary coding. b. TF-IDF coding

Fig. 5. Train.Accuracy vs. Val.Accuracy of the DNN model. a. Binary coding. b. TF-IDF coding
6 Discussion

The main advantage of our designed SA model is that it combines the classical ML and the DL approaches. In the first stage, after downloading the IMDB dataset and performing some preprocessing tasks on it, we applied several ML algorithms, including LR, NB, and SVM. Table 2 summarizes the results obtained when applying the ML algorithms for the two coding models, the binary model (Col. 2) and the TF-IDF model (Col. 3). We observe that the obtained accuracy is not high for any of the ML algorithms (relatively low: 54.34% to 60.40% in Table 2). We also observe that there is no clear improvement when changing the coding model from binary to TF-IDF.

Figures 4a and 4b show the evolution of the training loss and the validation loss over the epochs (15 epochs; batch size = 32) for the two coding models. The value of the loss function is similar for both coding models, with no clear improvement. Similarly, Figs. 5a and 5b plot the evolution of the training accuracy and the validation accuracy over the epochs. For the two coding models, contrary to the loss function, the accuracy starts relatively low and ends very high (>98%). The accuracy for the binary model and the TF-IDF model is approximately the same, with no clear improvement; i.e., the coding model does not influence the performance of the DNN model.
7 Comparison Between ML and DL Approaches

We concluded our study by establishing a comparison between the ML and DL approaches. This comparison shows that the DNN model reaches a higher accuracy than every ML algorithm tested, whatever the coding model, as shown in Table 4.

Table 4. Comparison between ML and DL approaches

Algorithm   Accuracy (BIN coding)   Accuracy (TF-IDF coding)
LR          54.53%                  59.74%
NB          54.36%                  57.34%
SVM         54.34%                  60.40%
DNN model   100%                    100%
8 Conclusion and Future Suggestions

In the present paper, we presented the different approaches used in the sentiment analysis field, especially the DL approach. We reviewed the well-known studies and research done in this area. In the experimental part, we used two coding models, binary and TF-IDF, to transform the input data into numerical values before providing it to the models, together with the well-known IMDB dataset. We also presented the detailed architecture of the proposed DNN, giving the different layers and the number of neurons in each layer. We conducted many experiments to evaluate the different ML algorithms as well as the proposed DNN model on the dataset. These experiments show that our model gives high accuracy for sentiment analysis, which is not the case for the ML algorithms. As perspectives for this work, we will focus our future studies on the following:

• Using other coding models, such as Word2Vec word embeddings and n-grams.
• Exploiting other DL models, notably CNN, RNN, LSTM, and other hybrid models, combined with TF, TF-IDF, word embedding, and n-gram techniques to improve the obtained accuracy.
• Using other datasets and establishing a wide comparison between them.
References

1. Dang, N.C., Moreno-García, M.N., De la Prieta, F.: Sentiment analysis based on deep learning: a comparative study. Electronics 9, 483 (2020). https://doi.org/10.3390/electronics9030483. www.mdpi.com/journal/electronics
2. Yang, L., Li, Y., Wang, J., Sherratt, R.S.: Sentiment analysis for e-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 8 (2020). https://doi.org/10.1109/ACCESS.2020.2969854
3. Intelligent Computing & Optimization, Conference Proceedings ICO 2018, Springer, Cham. https://doi.org/10.1007/978-3-030-00979-3. https://www.springer.com/gp/book/9783030009786. ISBN 978-3-030-00978-6
4. Intelligent Computing and Optimization, Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019), Springer International Publishing, ISBN 978-3-030-33585-4. https://www.springer.com/gp/book/9783030335847
5. Preethi, G., Krishna, P.V., Obaidat, M.S., Saritha, V., Yenduri, S.: Application of deep learning to sentiment analysis for recommender system on cloud. In: Proceedings of the 2017 International Conference on Computer, Information and Telecommunication Systems (CITS), Dalian, China, 21–23, 2017, pp. 93–97 (2017)
6. Soni, S., Shara, A.: Sentiment analysis of customer reviews based on hidden Markov model. In: Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering & Technology (ICARCSET 2015), Unnao, India, 6 March 2015, pp. 1–5 (2015)
7. Pinto, D., McCallum, A., Wei, X., Croft, W.B.: Table extraction using conditional random fields. In: Proceedings of the 26th International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, 28 July–1 August 2003, pp. 235–242 (2003)
8. Zhang, X., Zheng, X.: Comparison of text sentiment analysis based on machine learning. In: Proceedings of the 15th International Symposium on Parallel and Distributed Computing (ISPDC), Fuzhou, China, 8–10 July 2016, pp. 230–233 (2016)
9. Malik, V., Kumar, A.: Sentiment analysis of Twitter data using Naive Bayes algorithm. Int. J. Recent Innov. Trends Comput. Commun. 6, 120–125 (2018)
10. Firmino Alves, A.L., Baptista, C.d.S., Firmino, A.A., Oliveira, M.G.d., Paiva, A.C.D.: A comparison of SVM versus Naive-Bayes techniques for sentiment analysis in tweets: a case study with the 2013 FIFA confederations cup. In: Proceedings of the 20th Brazilian Symposium on Multimedia and the Web, João Pessoa, Brazil, 18–21 November 2014, pp. 123–130 (2014)
11. Szegedy, C., et al.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition, pp. 1–9 (2015)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition, pp. 770–778 (2016)
13. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
15. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
16. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
17. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks (2016)
18. Tu, Y., Lin, Y., Wang, J., Kim, J.-U.: Semi-supervised learning with generative adversarial networks on digital signal modulation classification. Comput. Mater. Continua 55(2), 243–254 (2018)
19. Wang, J., Gao, Y., Liu, W., Sangaiah, A.K., Kim, H.-J.: Energy efficient routing algorithm with mobile sink support for wireless sensor networks. Sensors 19(7), 1494 (2019)
20. Wang, J., Gao, Y., Wang, K., Sangaiah, A.K., Lim, S.-J.: An affinity propagation-based self-adaptive clustering method for wireless sensor networks. Sensors 19(11), 2579 (2019)
21. Tang, Z., Ding, X., Zhong, Y., Yang, L., Li, K.: A self-adaptive Bell-LaPadula model based on model training with historical access logs. IEEE Trans. Inf. Forensics Security 13(8), 2047–2061 (2018)
22. Gao, Y., Rong, W., Shen, Y., Xiong, Z.: Convolutional neural network based sentiment analysis using Adaboost combination. In: Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016, pp. 1333–1338 (2016)
23. Singhal, P., Bhattacharyya, P.: Sentiment analysis and deep learning: a survey. Center for Indian Language Technology, Indian Institute of Technology, Bombay, India (2016)
24. Al-Sallab, A., Baly, R., Hajj, H., Shaban, K.B., El-Hajj, W., Badaro, G.: Aroma: a recursive deep learning model for opinion mining in Arabic as a low resource language. ACM Trans. Asian Low-Resour. Lang. Inf. Process. TALLIP 16, 1–20 (2017)
25. Jangid, H., Singhal, S., Shah, R.R., Zimmermann, R.: Aspect-based financial sentiment analysis using deep learning. In: Proceedings of the Companion of the Web Conference 2018 on The Web Conference, Lyon, France, 23–27 April 2018, pp. 1961–1966 (2018)
26. Zhang, L., Wang, S., Liu, B.: Deep learning for sentiment analysis: a survey. WIREs Data Min. Knowl. Discov. 8, e1253 (2018)
27. Wu, C., Wu, F., Wu, S., Yuan, Z., Liu, J., Huang, Y.: Semi-supervised dimensional sentiment analysis with variational autoencoder. Knowl. Based Syst. 165, 30–39 (2019)
28. Wang, Y., Wang, M., Xu, W.: A sentiment-enhanced hybrid recommender system for movie recommendation: a big data analytics framework. Wire. Commun. Mob. Comput. (2018)
29. Gupta, U., Chatterjee, A., Srikanth, R., Agrawal, P.: A sentiment-and-semantics-based approach for emotion detection in textual conversations. arXiv 2017, arXiv:1707.06996
30. Salas-Zárate, M.P., Medina-Moreira, J., Lagos-Ortiz, K., Luna-Aveiga, H., Rodriguez-Garcia, M.A., Valencia-García, R.J.C.: Sentiment analysis on tweets about diabetes: an aspect-level approach. Comput. Math. Methods Med. 2017 (2017)
31. Sharef, N.M., Zin, H.M., Nadali, S.: Overview and future opportunities of sentiment analysis approaches for big data. JCS 12, 153–168 (2016)
32. Sohangir, S., Wang, D., Pomeranets, A., Khoshgoftaar, T.M.: Big data: deep learning for financial sentiment analysis. J. Big Data 5(1), 1–25 (2018). https://doi.org/10.1186/s40537-017-0111-6
Improved Face Detection System

Ratna Chakma, Juel Sikder (✉), and Utpol Kanti Das

Department of Computer Science and Engineering, Rangamati Science and Technology University, Rangamati, Bangladesh

Abstract. Biometric applications use face detection approaches for security purposes such as human–crowd surveillance, many other security-related areas, and computer interaction. Face detection is a crucial area of recent research because there is no fixed system for finding the faces in a test image. It is challenging due to varying illumination conditions, pose variations, the complexity of noise, and image backgrounds. In this research, we present a system that detects and recognizes faces using different pre-processing techniques: the Viola-Jones process, combined with Haar Cascade, GLCM and Gabor Filter features, and a Support Vector Machine (SVM), is proposed to reach a better accuracy level in the detection of facial portions and the recognition of faces. The proposed system performs better than other face detection and recognition systems. The experiments were done in a MATLAB environment on different images of the FEI Face, Georgia Tech faces, Faces95, Faces96, and MIT-CBCL databases. The experimental results show face detection with reasonable accuracy, averaging 98.32%.

Keywords: Face detection · Face recognition · Viola-Jones · SVM
1 Introduction
Digital image processing treats face-related problems as multidimensional problems in order to reach better solutions, particularly for the many pictures hosted on websites such as Picasa, Facebook, Twitter, and Photo Bucket. Supervised learning-based detection is a widely used technique today [1]. Existing systems have attracted interest from different fields for use in real-world conditions, for example building safety, checking people's identities, and apprehending criminals. In the databases we use [2–6], several factors cause face detection and face recognition systems to fail: head scale and outlook variation, decoration variation, changing impulse, brightness variation, pose changes, environment variation, degree variation, expression variation, and accuracy and error problems. The Viola-Jones object detection algorithm, combined with techniques such as the integral image, cascade, contrast stretching, GLCM, Gabor Filter, and an SVM classifier, showed better performance than previous face recognition techniques. In this study, we propose Viola-Jones facial part detection with Haar Cascade, combined with dimension-reduced GLCM & Gabor Filter features and a Support Vector Machine (SVM), to overcome the limitations of previous work. This paper is divided into five sections. Section 1 gives a short introduction. Section 2 reviews past work on face detection and recognition. Section 3 describes the proposed improved face detection and recognition process. The system result is presented and analyzed in Sect. 4. Finally, Sect. 5 offers conclusions and future research plans.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 248–257, 2022. https://doi.org/10.1007/978-3-030-93247-3_25
2 Literature Review
The authors of [7] proposed a face identification process that combines the Viola and Jones method with Principal Component Analysis (PCA), Gabor Filters, and Artificial Neural Networks (ANN). On the CMU image database, which contains different styles and exposure, it reduced false acceptance errors and obtained an accuracy of 90.31%, an improvement over earlier work. The work in [8] introduced a training process aimed at fast execution and validity on the FDDB dataset: the training images are divided so that facial features are detected by two separate networks, using a Convolutional Neural Network (CNN). The authors of [9] proposed another face detection system using Viola-Jones, which detected facial parts under various expressions and achieved 92% accuracy on the Bao database. In [10], the researchers suggested a face recognition system using machine learning methods on the ORL database, where PCA and LDA yielded 97% and 100% accuracy in the two settings studied. In [11], PCA and 2DPCA were applied to face recognition on the ORL and YALE databases; 2DPCA proved more powerful than PCA at extracting face characteristics while also requiring less computation time. A hash-type system for human face recognition based on the quintet triple binary pattern (QTBP) was proposed in [12]. Using SVM and KNN, it achieved adequate recognition performance on the AT&T, Face94, CIE, AR, and LFW databases while reducing the high computational complexity of earlier work.
3 Methodology
The block diagram of the proposed system is shown in Fig. 1.

3.1 Input Image
First, facial test images are taken from the FEI Face, Georgia Tech faces, Faces95, Faces96, and MIT-CBCL databases. The image size is 640 × 480 pixels for the FEI face database and the Georgia Tech face database, 180 × 200 pixels for the Faces95 database, 196 × 196 pixels for the Faces96 database, and varies for the MIT-CBCL face database. The proposed approach is evaluated on these databases.
Fig. 1. Block diagram of the proposed system (blocks: Input Image, Contrast Stretching, Extract Facial Part, Extract Features, SVM Classifier, Feature Database, Identified Face)
3.2 Contrast Stretching
Input face images may contain noise and may be blurred in some regions because of technical errors during capture. To remove this unwanted noise and blur, the system uses partial contrast stretching [13]. By making optimal use of the available colors and other characteristics, partial contrast stretching enhances the input image as much as possible. It improves the visual appearance, improves the signal-to-noise ratio, smooths the interior of each region while preserving its boundaries, and removes noise and undesired parts unrelated to the region of interest [14].
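The stretching step can be illustrated with a minimal sketch. The paper's experiments were run in MATLAB and its exact thresholds are not given, so the function name `partial_contrast_stretch`, the `low` and `high` limits, and the sample patch below are assumptions chosen only for illustration. Grey levels between the two limits are mapped linearly onto the full output range, and values outside them are clipped.

```python
def partial_contrast_stretch(image, low, high, out_min=0, out_max=255):
    """Linearly stretch grey levels in [low, high] to [out_min, out_max].

    `image` is a list of rows of integer grey levels. Values below `low`
    map to `out_min` and values above `high` map to `out_max`.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = (out_max - out_min) / (high - low)
    stretched = []
    for row in image:
        new_row = []
        for value in row:
            clipped = min(max(value, low), high)  # clip to the chosen band
            new_row.append(int(round(out_min + (clipped - low) * scale)))
        stretched.append(new_row)
    return stretched


# Hypothetical low-contrast 2 x 3 patch with grey levels between 100 and 150.
patch = [[100, 110, 120], [130, 140, 150]]
result = partial_contrast_stretch(patch, low=100, high=150)
```

In practice the two limits would probably be derived from the image histogram rather than fixed by hand.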
3.3 Extract Facial Part
Three strategies are used to detect human face parts with the Viola-Jones algorithm [9]: (1) rectangular Haar-like features are computed on an integral image to extract the facial features; (2) the AdaBoost machine learning algorithm selects a subset of features from all available features; (3) a cascade classifier efficiently combines numerous features based on the results of the various filters. The facial parts detection process is shown in Fig. 2. The original image, taken from any of the databases, is first processed by the face detection algorithm to detect human faces in various poses. The object detection algorithm then searches for face parts in the detected image. Nose detection encodes the image with Haar features in weak classifiers; the detected nose is cropped and marked with a bounding box. Mouth detection [15] likewise uses weak classifiers with Haar features over the mouth region. Eye detection is carried out by left-eye and right-eye detectors using the Viola-Jones process, which search the eye areas of a face to detect the left eye and the right eye [9].

Fig. 2. Detection of facial parts process

The cascade classifier plays an important role in face detection [9]. For every feature, the number of black pixels is divided by the number of white pixels. Haar features, which are rectangle features, allow human faces to be detected quickly:

\[ q(z) = \frac{\text{Number of black rectangle}}{\text{Number of white rectangle}} \tag{1} \]

The AdaBoost algorithm builds a robust classifier as a linear combination of less powerful classifiers, as shown in Eq. (2):

\[ S(z) = n_1 S_1(z) + n_2 S_2(z) + \cdots \tag{2} \]

The cascade process scans every location of each database image and either accepts the region as a face or rejects it as a non-face area.
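The integral image and the weighted combination of Eq. (2) can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB implementation: the function names, the assignment of the right half of the window to the "black" rectangle, and the sample patch and weights are all assumptions. Note also that the classic Viola-Jones feature value is the difference between the two rectangle sums; the function below returns both sums so that either the difference or a ratio in the spirit of Eq. (1) can be formed.

```python
def integral_image(image):
    """Return the summed-area table with one extra zero row and column."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_sums(table, top, left, height, width):
    """Pixel sums under the right (black) and left (white) halves of a
    horizontal two-rectangle Haar-like feature (assumed layout)."""
    half = width // 2
    white = rect_sum(table, top, left, height, half)
    black = rect_sum(table, top, left + half, height, half)
    return black, white


def strong_classifier(weights, weak_outputs):
    """Weighted sum of weak classifier outputs, in the form of Eq. (2)."""
    return sum(n * s for n, s in zip(weights, weak_outputs))


# Hypothetical 4 x 4 patch: dark left half, bright right half.
patch = [[10, 10, 200, 200] for _ in range(4)]
table = integral_image(patch)
black, white = two_rect_sums(table, 0, 0, 4, 4)
difference = black - white  # classic Viola-Jones feature value
score = strong_classifier([0.6, 0.4], [1, -1])  # hypothetical weights
```

Because every rectangle sum costs four lookups regardless of its size, many features can be evaluated per window, which is what makes the cascade fast.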
3.4 Extract Features
GLCM & Gabor Filter are used to extract feature vectors from the detected facial parts. The GLCM functions represent texture features of an image. After the facial parts are detected, the significant features are extracted and then used to classify a given test sample. The system uses the following methods for feature extraction. Texture features describe the individual pixels that make up an image. The proposed method divides texture features into two main types: signal processing and statistical. The statistical type contains GLCM, grey-level histogram, run-length matrices, and auto-correlation features for texture extraction. GLCM procedures extract second-order statistical texture features [17]. The textural features include entropy, correlation, contrast, homogeneity, and energy. Texture is particularly appropriate for this study because of its properties. The system also uses Gabor filter features. The Gabor filters are characterized by standard deviation, orientation, and radial center frequency [18]. The method combines the GLCM and Gabor filter features into one feature set. This combination produces a better outcome on the face datasets.
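A small sketch may help make the GLCM features concrete. It assumes a single pixel offset, a non-symmetric matrix, and one common set of definitions (homogeneity with a 1 + |i − j| denominator, energy as the sum of squared entries); the paper does not state which variants were used, and correlation and the Gabor features are omitted for brevity. The function names and the 4 × 4 sample image are illustrative.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("image is too small for this offset")
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity and entropy of a normalized GLCM."""
    contrast = energy = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            contrast += (i - j) ** 2 * value
            energy += value ** 2
            homogeneity += value / (1 + abs(i - j))
            if value > 0:
                entropy -= value * math.log2(value)
    return {"contrast": contrast, "energy": energy,
            "homogeneity": homogeneity, "entropy": entropy}


# Hypothetical 4 x 4 image quantized to 4 grey levels.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image, levels=4)
features = glcm_features(p)
```

Features computed for several offsets could be concatenated with the Gabor responses to form the combined feature set described above.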
3.5 SVM Classifier
The proposed system uses an SVM to identify the face. The extracted features of all database faces are stored in the feature database. The SVM classifier takes the feature values of the database images and the feature values of the input facial parts and, based on these values, matches the input image against the feature database. The Support Vector Machine acts as a separator. It compares the feature set of the test sample with all training samples and selects the closest match. Support vector machines were originally designed for binary classification. The SVM is trained on the features given as input to its training procedure. During training, it identifies the appropriate boundaries in the feature database. Using the received features, the SVM then performs classification. The SVM classifier classifies the input test face by comparing its extracted features with the feature database and identifies the desired person from the face database [19, 20].
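As a rough illustration of how an SVM finds a separating boundary, the following sketch trains a binary linear SVM by sub-gradient descent on the regularized hinge loss. It is not the classifier configuration used in the paper, which is not specified: the learning rate, regularization strength, epoch count, and the two-dimensional toy feature vectors are assumptions, and identifying several persons would additionally need a multi-class scheme such as one-vs-rest.

```python
def train_linear_svm(samples, labels, rate=0.1, reg=0.01, epochs=100):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss.

    `labels` must contain +1 or -1 for each sample.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - rate * reg) for wi in w]
            if margin < 1.0:
                # Sample is inside the margin: move the boundary away from it.
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                b += rate * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical feature vectors for two classes (for example, two persons).
samples = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
           (-1.0, -1.0), (-2.0, -1.0), (-1.0, -2.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

In a real pipeline the inputs would be the combined GLCM and Gabor feature vectors rather than hand-picked points.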
4 Result and Analysis
The FEI face database [2] has 200 subjects with 14 poses per subject, 2800 images in total. Each image has a resolution of 640 × 480 pixels and shows a bright, upright, frontal face against a uniform white background, with rotation of up to 180 degrees, scale variation of 10%, outlook variation, changing hairstyle, and decoration variation. The Georgia Tech face database [3] has 50 individuals with 15 RGB images per subject. Its images vary in brightness, facial expression, outlook, changing impulse, and scale. The Faces95 database [4] has 72 persons with an image resolution of 180 × 200 pixels. Its images have a red background, and the position of the subject changes toward the front, with accompanying lighting changes. The Faces96 database [5] has 152 persons with an image resolution of 196 × 196 pixels. The MIT-CBCL face database [6] contains ten persons. Its characteristics are frontal face position, high-quality images, brightness variation, pose changes, environment variation, and rotation of up to 30 degrees. We used 120, 40, 30, 41, and 10 testing images, with 11, 11, 11, 11, and 4 poses per person, and 1320, 440, 330, 451, and 40 training images for the FEI face database, Georgia Tech face database, Faces95 database, Faces96 database, and MIT-CBCL face database, respectively. Detected and recognized images in different poses for the FEI face database, the Georgia Tech face database, and the MIT-CBCL face database are shown in Figs. 3, 4, and 5, respectively. In this research, facial parts detection and recognition were carried out on the FEI Face, Georgia Tech face, Faces95, Faces96, and MIT-CBCL face databases described above. Table 1 compares the proposed method with other methods.

Table 1. Comparative analysis of other methods to our proposed method
Ref. | Methodological approach | Database | Performance (%)
[7] | Viola and Jones Method; Artificial Neural Networks (ANN); Gabor Filters; Principal Component Analysis (PCA) | CMU (Carnegie Mellon University) database | 90.31
[8] | Convolutional Neural Network (CNN) | FDDB database | 88.9
[9] | Viola-Jones Algorithm | Bao database | 92
[10] | Machine learning; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA); Support vector machine; Naïve Bayes; Multilayer Perceptron; PCA + LDA (Configuration B) | ORL database | 97
[21], [22] | Principal Component Analysis (PCA); Eigenface; Support Vector Machine (SVM); Multiclass SVM; Principal Component Analysis (PCA) | ORL database; Face94 database; YALE database | 92.5; 92.10; 84
[23] | Deep learning; Convolutional Neural Network (CNN); SIAMESE network | ORL database; LFW database | 91; 81
Proposed Method | Viola-Jones Algorithm with Haar Cascade, GLCM & Gabor Filter and Support Vector Machine (SVM) | FEI Face, Georgia Tech face, Faces95, Faces96 and MIT-CBCL | 98.32

Fig. 3. Experimental Result-1 (FEI Face Database)

Fig. 4. Experimental Result-2 (Georgia Tech Face Database)

Fig. 5. Experimental Result-3 (MIT-CBCL Face Database)

The accuracy rate for detection and recognition is computed from the false accept rate (FAR) and the false reject rate (FRR) with the simple equation below; the results are shown in Table 2:

\[ \text{Accuracy Rate} = 100 - \frac{\mathrm{FAR} + \mathrm{FRR}}{2} \tag{3} \]
Table 2. Detection and recognition performance
Database | False reject rate (%) | False accept rate (%) | Accuracy rate (%)
FEI Face | 2.14 | 1.13 | 98.37
Georgia Tech face | 2.18 | 2.11 | 97.86
Faces95 | 3.41 | 0.00 | 98.30
Faces96 | 4.72 | 0.00 | 97.64
MIT-CBCL face | 1.11 | 0.00 | 99.45
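The accuracy rates in Table 2 can be checked against the accuracy-rate equation with a few lines of code. The sketch below takes the equation to be 100 minus the mean of the false accept and false reject rates, which reproduces the reported figures; the function and variable names are illustrative, and the numbers are the ones reported in Table 2.

```python
def accuracy_rate(far, frr):
    """Accuracy rate in percent: 100 minus the mean of FAR and FRR."""
    return 100.0 - (far + frr) / 2.0


# (false reject rate %, false accept rate %) as reported in Table 2.
table2 = {
    "FEI Face": (2.14, 1.13),
    "Georgia Tech face": (2.18, 2.11),
    "Faces95": (3.41, 0.00),
    "Faces96": (4.72, 0.00),
    "MIT-CBCL face": (1.11, 0.00),
}

rates = {name: accuracy_rate(far, frr) for name, (frr, far) in table2.items()}
average = sum(rates.values()) / len(rates)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
print(f"Average: {average:.2f}%")
```

The mean of the five per-database rates agrees with the average of 98.32% quoted in the abstract.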
The detection and recognition rate for each database is shown in Fig. 6.
Fig. 6. Accuracy graph
5 Conclusion and Future Works
In this research, the proposed improved face detection system has been tested on the FEI Face, Georgia Tech face, Faces95, Faces96, and MIT-CBCL databases using pre-processing techniques, the Viola-Jones algorithm with the Haar Cascade machine learning method, GLCM & Gabor Filter, and a Support Vector Machine (SVM). The system detects facial parts and recognizes faces under various human expressions, with accuracy rates of 98.37%, 97.86%, 98.30%, 97.64%, and 99.45% on the FEI Face, Georgia Tech face, Faces95, Faces96, and MIT-CBCL databases, respectively. In future work, we plan to introduce better segmentation techniques, better feature extraction methods, and better classification algorithms. We also plan to apply our approach in other domains of interest, such as detecting facial parts in video streams with complex backgrounds, medical image analysis, and satellite image analysis.
References
1. Sikder, J., Das, U.K., Chakma, R.J.: Supervised learning-based cancer detection. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 12(5) (2021). https://doi.org/10.14569/IJACSA.2021.01205101
2. https://fei.edu.br/cet/facedatabase.html
3. http://www.anefian.com/research/facereco.htm
4. Libor Spacek’s Facial Images Databases. http://cmp.felk.cvut.cz/spacelib/faces/faces95.html
5. Libor Spacek’s Facial Images Databases. http://cmp.felk.cvut.cz/spacelib/faces/faces96.html
6. http://cbcl.mit.edu/software-datasets/heisele/facerecognition-database.html
7. Da'San, M., Alqudah, A., Debeir, O.: Face detection using viola and jones method and neural networks. In: 2015 International Conference on Information and Communication Technology Research (ICTRC 2015). IEEE (2015)
8. Triantafyllidou, D., Tefas, A.: Face detection based on deep convolutional neural networks exploiting incremental facial part learning. In: 23rd International Conference on Pattern Recognition (ICPR), Cancun Center, Cancun, Mexico, 4–8 December 2016 (2016)
9. Vikram, K., Padmavathi, S.: Facial parts detection using Viola Jones algorithm. In: 2017 International Conference on Advanced Computing and Communication Systems (ICACCS 2015), Coimbatore, India, 06–07 January 2017 (2017)
10. Sharma, S., Bhatt, M., Sharma, P.: Face recognition system using machine learning algorithm. In: Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020), IEEE Conference Record # 48766; IEEE Xplore (2020). ISBN 978-1-7281-5371-1
11. Dandpat, S.K., Meher, S.: Performance improvement for face recognition using PCA and two-dimensional PCA. In: 2013 International Conference on Computer Communication and Informatics (ICCCI 2013), Coimbatore, India, 04–06 January 2013 (2013)
12. Tuncer, T., Dogan, S., Abdar, M., Pławiak, P.: A novel facial image recognition method based on perceptual hash using quintet triple binary pattern. Multimedia Tools Appl. 79(39), 29573–29593 (2020)
13. Das, U.K., Sikder, J., Salma, U., Anwar, A.S.: Intelligent cancer detection system. In: 2021 International Conference on Intelligent Technologies (CONIT), pp. 1–6. IEEE, June 2021
14. Sikder, J., Das, U.K., Anwar, A.M.S.: Cancer cell segmentation based on unsupervised clustering and deep learning. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 607–620. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_53
15. Sikder, J., Chakma, R., Chakma, R.J., Das, U.K.: Intelligent face detection and recognition system. In: 2021 International Conference on Intelligent Technologies (CONIT), pp. 1–5. IEEE, June 2021
16. El Maghraby, A., Abdalla, M., Enany, O., El, M.Y.: Detect and analyze face parts information using Viola-Jones and geometric approaches. Int. J. Comput. Appl. 101(3), 23–28 (2014)
17. Mohanaiah, P., Sathyanarayana, P., GuruKumar, L.: Image texture feature extraction using GLCM approach. Int. J. Sci. Res. Publ. 3(5), 1–5 (2013)
18. Li, W., Qian, D.: Gabor-filtering-based nearest regularized subspace for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7(4), 1012–1022 (2014)
19. Mahmud, T., Sikder, J., Chakma, R.J., Fardoush, J.: Fabric defect detection system. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 788–800. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_68
20. Sikder, J., Sarek, K.I., Das, U.K.: Fish disease detection system: a case study of freshwater fishes of Bangladesh. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 12(6) (2021). https://doi.org/10.14569/IJACSA.2021.01206100
21. Matin, A., Mahmud, F.: Recognition of an individual using the unique features of human face. In: 2016 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), AISSMS, Pune, India, 19–21 December 2016 (2016)
22. Sani, M.M., Ishak, K.A., Samad, S.A.: Evaluation of face recognition system using support vector machine. In: Proceedings of 2019 IEEE Student Conference on Research and Development (SCOReD 2009), UPM Serdang, Malaysia, 16–18 November 2009 (2009)
23. Wang, W., Yang, J., Xiao, J., Li, S., Zhou, D.: Face Recognition based on deep learning. In: Zu, Q., Hu, Bo., Gu, N., Seng, S. (eds.) HCC 2014. LNCS, vol. 8944, pp. 812–820. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15554-8_73
Paddy Price Prediction in the South-Western Region of Bangladesh
Juliet Polok Sarkar1(✉), M. Raihan1, Avijit Biswas2, Khandkar Asif Hossain1, Keya Sarder1, Nilanjana Majumder1, Suriya Sultana1, and Kajal Sana1
1 North Western University, Khulna 9100, Bangladesh
2 Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
Abstract. In the current scenario, farmers are losing a lot of profit due to price fluctuations caused by climatic change and other price-influencing factors. Farmers are affected emotionally and financially as a result. Price forecasting may aid the agriculture supply chain in making critical decisions to reduce and manage the risk of price fluctuations. Predictive analysis is expected to help with the problems that arise from reduced agricultural productivity due to uncertain climatic conditions, global warming, and other factors. This research focuses on identifying appropriate data models that help achieve high price forecast accuracy and generality. We applied SMOTE to reduce the class imbalance in our dataset. However, SMOTE was not particularly beneficial because our dataset only comprised data from three districts in the Khulna division. We used Linear Regression, a machine learning regression method, to predict the price of the crop. To compare the prediction results, we used a Neural Network. The data for forecasting paddy prices was collected by visiting local farmers in Bangladesh's Khulna Division. There are 154 instances in this dataset, each with 10 attributes: division name, district name, sub-district name, market name, pricing type, crop price, crop quantity, date, crop category, and crop name. The Linear Regression model has an RMSE value of 114.48 and an MAE value of 80.08, while for the Neural Network the lowest RMSE value was 338.2241 and the lowest MAE value was 293.1295. We conclude that the Linear Regression model performed better and that there is still potential for improvement.
Keywords: Machine learning · Predictive analysis · Neural networks · Price forecasting · Price prediction · Linear regression · Prediction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 258–267, 2022. https://doi.org/10.1007/978-3-030-93247-3_26
1 Introduction
More than half of Bangladesh's 15 million hectares of land is used for agriculture. Agriculture is the most important pillar of our national economy and the primary source of income for the majority of families. It accounts for the majority of the country's gross growth. To meet the needs of the country's people, 60% of the land is used for agriculture. Modernization of agricultural practices is expected to help meet these requirements, so that the farmers' incomes and the country's economy can expand. Prices have a big influence on productivity, consumption, and government policies. They influence farmers' production decisions and consumers' purchasing decisions to a significant degree. Many variables, such as market support and stabilization steps and product export and import, influence the price of a commodity. Agricultural product prices are much more volatile than those of non-agricultural products and services. Since food accounts for about 66% of consumer spending, 45% of which is spent on rice (BBS, 1991), and rice covers 70% of cropped land (BBS, 1986), the analysis of rice prices is crucial for farmers, merchants, buyers, and the government. Rice prices, which are influenced by a variety of variables, are extremely difficult to predict. Aus, Aman, and Boro are three of the most common crop varieties. Crops in these groups are grown on different schedules during the year. For Aus, Aman, and Boro, the cultivation weather is hot and humid (March to June), moist and rainy (July to October), and cold and dry (November to December), respectively. Different districts in Bangladesh have different climates, so environmental factors specific to these areas must be considered. This will aid in the selection of the best districts for the cultivation of various crops.

This paper attempts to evaluate the changes in crop prices over time, as well as their growth in location, yield, and output, to calculate the magnitude of annual price fluctuation, and to measure the degree of volatility in order to identify the riskiness of selected crops in comparison to other crops. The resulting data could help farmers decide how best to distribute their limited resources among less risky crops. Many researchers have focused on supply response (price-supply relationship) studies, i.e. actual price shifts and relationships in corresponding areas, but there has been no work on price flexibility to understand how much the current quantity harvested affects post-harvest prices in the markets. The principle of price flexibility is especially important for agricultural products. Machine learning has become a vital prediction technique in recent years. Thanks to the rising trend toward Big Data, it can forecast prices more accurately based on their features, independent of the previous year's data. In this study, we use machine learning and mathematical methods to predict paddy prices, to help farmers determine the allowable cost of their crop production and plan accordingly. The data was acquired from local farmers in the south-western part of Bangladesh, specifically the Khulna division; as a result, the dataset is skewed. The majority of real-world datasets are skewed. Researchers have suggested strategies for managing unbalanced data at both the data and algorithmic levels; among them, the SMOTE methodology has been reported to perform better in the literature. We first tried a logistic regression model based on the SMOTE dataset rebalancing method, along with other approaches. SMOTE was employed to correct for class imbalance in our dataset. Because our dataset only included data from three districts in the Khulna division, SMOTE proved ineffective, and the logistic regression model could not achieve high accuracy. So, instead of utilizing SMOTE, we used Linear Regression and a Neural Network model. Our model gives a low mean absolute error but a high root mean squared error, so the accuracy of the price prediction is somewhat mediocre, and we plan to improve the model in the future. We obtained the lowest Root Mean Squared Error value of 338.2241 and the lowest Mean Absolute Error value of 293.1295 using the Neural Network, and a Root Mean Squared Error value of 114.48, Mean Absolute Error value of 80.08, Median Absolute Error value of 41.87, explained variance score of 0.92, and R2 score of 0.92 using Linear Regression. The remainder of the paper is laid out as follows. A brief review of recent applications of analytics and paddy price forecasting is provided in Sect. 2. The proposed techniques are discussed in Sect. 3. The experimental setup and findings are presented in Sect. 4, and the paper is concluded in Sect. 5.
2 Related Works
Machine learning and prediction algorithms such as Logistic Regression, Decision Trees, XGBoost, Neural Nets, and Clustering were used to identify and process patterns in data and predict a crop's target price. Compared with all the other algorithms, XGBoost was found to predict the target best [1]. Rachana et al. proposed a forecasting model based on machine learning techniques that predicts crop price using the Naive Bayes algorithm and crop benefit using the K Nearest Neighbour technique. The assumptions are treated as separate from and unrelated to other factors that can be used to forecast prices [2]. R. Manjula et al. suggested using machine learning to forecast crop prices. They briefly discussed four algorithms, SVM (Support Vector Machine), MLR (Multiple Linear Regression), Neural Network, and Bayesian Network, with examples of how they have been used in the past. The dataset consisted of 21,000 homes, which were split into training and testing data in an 80:20 ratio. They found that a linear model has high bias (underfits), while a model with high complexity has high variance (overfits). The desired result can be obtained by integrating the above models [3]. A group of academics created a hybrid model that combines multiple linear regression (MLR), auto-regressive integrated moving average (ARIMA), and Holt-Winters models for better forecasts. The approach was tested on Iberian power market data by forecasting the hourly day-ahead spot price with dataset periods of 7, 14, 30, 90, and 180 days. The results reveal that the hybrid model beats the benchmark models and delivers promising outcomes in the vast majority of research settings [4]. Similarly, another research group proposed a predictive model using three types of machine learning models, namely Random Forest, XGBoost, and LightGBM, as well as two machine learning techniques, Hybrid Regression and Stacked Generalization Regression, to find the best solutions. They used the “Housing Price in Beijing” dataset, which contains over 300,000 data points and 26 variables that reflect housing prices exchanged between 2009 and 2018. These variables, which acted as dataset features, were then used to forecast each house's average price per square meter [5]. Rohith used machine learning techniques and the support vector regression algorithm to determine crop price. A decision tree regression technique was also implemented, in which features of an object are observed and a model is trained in the structure of a tree to predict future data and generate meaningful continuous output [6]. Furthermore, a group of researchers proposed prediction models based on time-series and machine learning architectures such as SARIMA, Holt-Winter's Seasonal method, and LSTM and analyzed their performance using the RMSE value. They found that the LSTM model achieved the best performance [7]. Another research team proposed a prediction model that integrates fuzzy information granulation, MEA, and SVM, and concluded that the MEA-SVM model obtained greater prediction accuracy [8]. Helin Yin et al. proposed a hybrid STL-ATTLSTM model that combines two benchmark models, STL and the LSTM mechanism, and compared their prediction performances. They found that the STL-ATTLSTM model gave the best performance [9]. Likewise, Gregory D. Merkel et al. proposed a deep neural network method and evaluated it against a linear regression model and a traditional artificial neural network. They observed that the proposed DNN surpassed the ANN by 9.83% WMAPE [10].
3 Methodology
Figure 1 depicts the overall workflow of our research. Our research is divided into four parts:
– Information gathering
– Synthetic Minority Oversampling Technique
– Data Mining for Exploratory Purposes
– Relationships: Linear vs. Non-Linear
3.1 Information Gathering
We visited local farmers in Bangladesh's Khulna Division to collect paddy price data for the forecast. There are a total of 154 instances in this dataset, each with its own set of 10 unique features such as division name, district name, sub-district name, market name, price type, crop price, date, and crop id. Our dataset contains basic information on various types of paddy prices in the different sub-districts and districts of the Khulna Division.
Fig. 1. Work-flow of the study (steps: Start, Import Dataset with 8 Features, Data Processing, Dataset Training, Neural Network and Linear Regression, Calculate MSE for each, Compare Performance, End)
3.2 Synthetic Minority Oversampling Technique (SMOTE)
SMOTE is an oversampling approach introduced by [11] to avoid a drop in classifier performance due to dataset class imbalance. In contrast to typical over-sampling approaches, SMOTE produces new instances of the minority classes “synthetically,” rather than duplicating existing ones. It works in feature space rather than data space, assuming that a minority class instance and its nearest vector have the same class value. Each instance is treated as a vector in SMOTE, and synthetic samples are created at random along the line joining the minority sample and its nearest neighbor. To make the produced instances comparable to the original minority class instances, they are assigned based on the characteristics of the original dataset [12].
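The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration rather than the implementation used in the study, which is not specified: the function name `smote`, the neighbour count `k`, the fixed random seed, and the sample points are all assumptions. Each synthetic sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic samples by interpolating between a
    minority sample and one of its `k` nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 1.0], [1.2, 1.8]]
new_samples = smote(minority, n_new=5)
```

Because every synthetic point lies between two existing minority points, oversampling cannot add information that a very small or narrow dataset lacks, which fits the limited benefit reported in this study.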
3.3 Data Mining for Exploratory Purposes
We first plotted our data, looking for linear relationships and considering dimensionality reduction. In particular, we looked for multicollinearity, which can undermine both the explainability of our model and its overall robustness. Then, to gain more insight, we built a correlation heatmap. The heatmap let us see right away whether there were any linear relationships between the features in the data.
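The numbers behind a correlation heatmap are pairwise Pearson coefficients, which can be computed as in the sketch below. The column names and values are hypothetical and are not taken from the study's dataset; the second quantity column is a rescaled copy of the first, so the pair is perfectly collinear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical numeric columns.
columns = {
    "quantity": [10.0, 20.0, 30.0, 40.0],
    "quantity_kg": [20.0, 40.0, 60.0, 80.0],
    "price": [105.0, 195.0, 310.0, 400.0],
}
matrix = correlation_matrix(columns)
```

An off-diagonal entry close to 1 or −1, such as the one between the two quantity columns here, signals a candidate for removal before fitting a linear model.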
3.4 Relationships: Linear vs. Non-linear
A linear regression model is built on the assumption of linear relationships, but a neural network can detect non-linear relationships. A positive linear relationship, a negative linear relationship, and a non-linear relationship are all depicted in the graph below.

Linear Regression: Regression analysis is used to discover the relationship between the independent and dependent variables. In Linear Regression, the outcome variable is a numerical value. In Logistic Regression, the outcome variable is a categorical value. To get the smallest error, we fitted the best straight line. In our regression model, we weighted every feature in every observation and calculated the error against the observed output. We used Python to create a linear regression and examined the findings on this dataset. The root mean squared error was used to compare the two models.

Neural Network: When the data is spread out so that a straight line cannot be used to separate it, neural networks are used to group similar data using curved boundaries such as circles and ellipses. We approached the same task with a simple sequential neural network. As a consequence of matrix operations, a sequential neural network is simply a series of linear combinations. However, a non-linear element in the form of an activation function allows non-linear relationships to be modeled. We use ReLU as our activation function. ReLU is piecewise linear, and because we have not normalized or standardized our data, a saturating function such as tanh would be of little use to us. (Based on our results, this was another aspect that had to be chosen on a case-by-case basis.)
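The contrast between the two model families can be sketched without any library. The first function below fits a straight line by ordinary least squares for a single feature; the second evaluates a small sequential network with ReLU activations. The data points are hypothetical, and the network weights are set by hand (not trained) so that the network computes the absolute value of its input, a relationship that no single straight line can represent.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def dense(inputs, weights, biases):
    """One fully connected layer: a linear combination plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Run a sequential network; ReLU follows every layer but the last."""
    out = inputs
    for index, (weights, biases) in enumerate(layers):
        out = dense(out, weights, biases)
        if index < len(layers) - 1:
            out = relu(out)
    return out


# Hypothetical linear data: y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Hand-set weights: hidden units relu(x) and relu(-x), summed to give |x|.
layers = [([[1.0], [-1.0]], [0.0, 0.0]),
          ([[1.0, 1.0]], [0.0])]
positive = forward([3.0], layers)
negative = forward([-3.0], layers)
```

Removing the `relu` call would collapse the two layers into a single linear map, which is why the activation function is what lets the network capture non-linear relationships.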
4
Experimented Results and Discussions
The SMOTE method was applied to our dataset to address the imbalance between classes. However, as Table 1 shows, SMOTE was not particularly beneficial because our dataset only comprised data from three districts in the Khulna division. Therefore, we used Linear Regression, a machine learning regression method, to predict the price of the crop. We also prepared a comparison table to evaluate our work against existing systems (Table 2).
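SMOTE balances a dataset by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that interpolation step in plain Python; the function name `smote_sample`, the parameter values, and the sample points are illustrative assumptions, not the implementation or data used in this study.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority-class samples (not data from the study).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sample(minority)
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority samples, SMOTE cannot add information that the original samples do not already contain, which is one reason it may bring little benefit on a small, geographically narrow dataset.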
Table 1. Accuracy comparison with and without SMOTE
Attribute    Without SMOTE   With SMOTE
Accuracy     0.99            0.82
f1-Score     0.99            0.90
Precision    0.99            0.99
Recall       0.82            1.00
Table 2. Comparison with other existing systems
Reference number     Algorithm name                                RMSE
[1]                  XGBoost                                       56.50
[3]                  Simple, Polynomial, Multivariate Regression   16,545,470
[7]                  LSTM                                          7.27
[9]                  STL-ATTLSTM                                   380
Our research study   Linear Regression                             114.48
Our research study   Neural Network                                338.2241
4.1
Neural Network
Figure (a) depicts the RMSE values of several neural network models, all trained with the Adam optimizer. The RMSE was 790.7305 with 2 hidden layers, 344.6472 with 3 hidden layers, 338.2241 with 4 hidden layers, and 350.9243 with 5 hidden layers. Figure (b) depicts the MAE values of the same models. The MAE was 645.6945 with 2 hidden layers, 307.1933 with 3 hidden layers, 473.4105 with 4 hidden layers, and 293.1295 with 5 hidden layers.
(a) Neural Network RMSE
(b) Neural Network MAE
4.2
Linear Regression
Figure (a) depicts the RMSE and MAE values of the linear regression model. We obtained an RMSE of 114.48, a Mean Absolute Error of 80.08, a median absolute error of 41.87, an explained variance score of 0.92, and an R2 score of 0.92. Figure (b) depicts the heatmap of the features used for training our model. Figure (c) depicts a partial pair plot of the features used in the model.
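The five scores reported above follow from standard definitions. The following is a minimal sketch of those definitions in plain Python; the function name `regression_metrics` and the four price values are hypothetical and serve only as a worked example, not as data from this study.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, median absolute error, explained variance and R2."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mean_true = statistics.fmean(y_true)
    ss_res = sum(r ** 2 for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "median_ae": statistics.median([abs(r) for r in residuals]),
        "explained_variance": 1
        - statistics.pvariance(residuals) / statistics.pvariance(y_true),
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical prices and predictions, for illustration only.
y_true = [1000.0, 1100.0, 1200.0, 1300.0]
y_pred = [1010.0, 1090.0, 1230.0, 1270.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

The explained variance score and R2 coincide when the residuals have zero mean, as in this example. On any single set of predictions the RMSE is at least as large as the MAE, which gives a quick consistency check on reported values.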
(c) Pair Plot
5
Conclusion
This study was undertaken because Bangladesh is a significant paddy producer. As paddy prices fluctuate, the farmers, traders, and consumers involved in the production, sale, and consumption of paddy are exposed to risk. It is therefore necessary to predict the price of paddy. We surveyed several districts, sub-districts, and unions to predict the actual price of paddy, and we also included local markets in the price predictions. SMOTE was used to balance the classes in our dataset, but it was not very beneficial because our dataset only comprised data from three districts in the Khulna division. Using the Neural Network, the lowest RMSE we obtained was 338.2241 and the lowest MAE was 293.1295. Using Linear Regression, we obtained an RMSE of 114.48, an MAE of 80.08, a median absolute error of 41.87, an explained variance score of 0.92, and an R2 score of 0.92. These results show that, for this dataset, a Neural Network is not the better option, as Linear Regression gives a lower error and is faster. Because the attributes are not equally balanced across regions, predicting the crop price is challenging. The dataset we acquired from local farmers includes only a few features, such as location, date, and crop amount, which is insufficient for higher accuracy. More efficient models could be built in the future by incorporating more parameters, such as weather and profit and loss statements, for potentially better outcomes.
References
1. Samuel, P., Sahithi, B., Saheli, T., Ramanika, D., Kumar, N.A.: Crop price prediction system using machine learning algorithms. Quest J. Softw. Eng. Simul. 6(1), 14–20 (2020)
2. Rachana, P., Rashmi, G., Shravani, D., Shruthi, N., Kousar, R.S.: Crop price forecasting system using supervised machine learning algorithms. Int. Res. J. Eng. Technol. (IRJET) 6, 4805–4807 (2019)
3. Manjula, R., Jain, S., Srivastava, S., Kher, P.R.: Real estate value prediction using multivariate regression models. In: IOP Conference Series: Materials Science and Engineering, vol. 263, p. 042098. IOP Publishing (2017)
4. Bissing, D., Klein, M.T., Chinnathambi, R.A., Selvaraj, D.F., Ranganathan, P.: A hybrid regression model for day-ahead energy price forecasting. IEEE Access 7, 36833–36842 (2019)
5. Truong, Q., Nguyen, M., Dang, H., Mei, B.: Housing price prediction via improved machine learning techniques. Procedia Comput. Sci. 174, 433–442 (2020)
6. Rohith, R., Vishnu, R., Kishore, A., Deeban, C.: Crop price prediction and forecasting system using supervised machine learning algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 9(3), 27–29 (2020). https://doi.org/10.17148/IJARCCE.2020.9306
7. Sabu, K.M., Kumar, T.M.: Predictive analytics in agriculture: forecasting prices of Arecanuts in Kerala. Procedia Comput. Sci. 171, 699–708 (2020)
8. Zhang, Y., Na, S.: A novel agricultural commodity price forecasting model based on fuzzy information granulation and MEA-SVM model. Math. Probl. Eng. 2018 (2018). https://doi.org/10.1155/2018/2540681
9. Yin, H., Jin, D., Gu, Y.H., Park, C.J., Han, S.K., Yoo, S.J.: STL-ATTLSTM: vegetable price forecasting using STL and attention mechanism-based LSTM. Agriculture 10(12), 612 (2020)
10. Merkel, G.D., Povinelli, R.J., Brown, R.H.: Short-term load forecasting of natural gas with deep neural network regression. Energies 11(8), 2008 (2018)
11. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
12. Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 875–886. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-09823-4_45
Paddy Disease Prediction Using Convolutional Neural Network
Khandkar Asif Hossain1(✉), M. Raihan1, Avijit Biswas2, Juliet Polok Sarkar1, Suriya Sultana1, Kajal Sana1, Keya Sarder1, and Nilanjana Majumder1
1 North Western University, Khulna 9100, Bangladesh
2 Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
Abstract. The economy of Bangladesh depends on the agricultural development of the country, and crop losses in any year affect the national economy. Farmers therefore need to monitor the growth of their crops during cultivation. Even so, crops become infected with diseases such as Blight and Spot, so it is necessary to detect the diseases and take appropriate measures as soon as possible. Disease detection involves image capture, pre-processing, segmentation, feature extraction, and classification. This paper describes methods for detecting plant diseases, mostly using photographs of their leaves. A Convolutional Neural Network (CNN) with the SGD optimizer, a learning rate of 0.00004, and two hidden layers predicts paddy disease from disease images with 73.33% accuracy.
Keywords: Machine learning · Supervised learning · CNN · Classification · Image processing · Crop disease · Prediction
1
Introduction
Bangladesh is an agricultural nation. According to a National Agricultural Census report and a World Bank collection of growth indicators for 2020, 16.5 million farmer families reside in Bangladesh and 37.75% of the population works in agriculture. Rice is Bangladesh’s key food crop; it is estimated that 75% of agricultural land is used for rice cultivation, accounting for 28% of GDP in 2020. Bangladesh, India, China, Pakistan, Thailand, Burma, Japan, and Indonesia are among the best places to grow rice. Rice comes in a wide variety in our region; Aus, Aman, and Boro are the main types. In Bangladesh, an additional variety known as “IRRI” also grows well. Rice provides us with puffed rice, popcorn, rice cake, cloths, paper, wine, and other products. We use straw as a fuel source and to build straw huts. Above all, we must ensure that rice is produced properly.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 268–276, 2022. https://doi.org/10.1007/978-3-030-93247-3_27
However, rice leaf diseases reduce yield. There are a variety of rice leaf diseases; because of their prevalence in Bangladesh, we consider two of them, blight and spot, in this paper. Bacterial blight, the result of a bacterial infection, causes elongated lesions near the leaf tips and edges that turn from white to yellow and then grey. Plant disease studies examine plant patterns and the symptoms that are visible when a plant is infected. Farmers typically rely on agricultural experts, who inspect the plants with their own eyes and conclude which disease the plants have based on basic tests. The weakness of this method is that a large number of farmers depend on a limited number of agriculture experts. As a result, by the time a specialist can inspect a plant, it is severely afflicted and the disease has spread to other plants. Furthermore, in poor countries like Bangladesh, most farmers do not know when to seek professional help when a disease strikes. The method we discuss is a machine learning technique that employs a set of specific algorithms. We collected 140 photos of diseased paddy leaves through Google searches for the prediction task. Our dataset contains images of two common paddy diseases, Blight and Spot, and covers a variety of leaf textures, including several types of blight and spot. We tried to build a model that can identify blight and spot disease, and it did so with 73.33% accuracy. The accuracy is low because of the lack of proper sample images. The model can only identify general blight and spot disease, although those diseases have subcategories such as bacterial blight and brown spot. We plan to work on a system that can identify any crop disease with higher accuracy and might also suggest an immediate remedy without help from an agriculture officer.
2
Related Works
A deep convolutional neural network model, inspired by the classical AlexNet and GoogLeNet and their performance improvements, was presented to identify apple leaf disease. A collection of 13,689 photos of diseased apple leaves was used to identify the four most frequent apple leaf diseases. The suggested model had a 97.62% overall accuracy [1]. Instead of considering the entire leaf, Jayme and his collaborators suggested a method that used individual lesions and spots to diagnose disease. They applied transfer learning to a pre-trained GoogLeNet CNN. The identification of moderately diseased images was found to be the least successful, although success rates in the other cases were much higher [2]. Ritesh et al. specified a CNN-based predictive model for disease classification and prediction in paddy crops. They used disease images from the UCI Machine Learning Repository, which included three forms of diseases, and reported 90.32% accuracy on the test set and 93.58% accuracy on the training set [3]. Similarly, a group of researchers proposed a prediction model based on various CNN architectures and compared it to a previous model that used feature extraction before classifying with KNN and SVM. Transfer learning was used to achieve the highest accuracy of 94%, with training accuracy of 92% and testing accuracy of 90% [4]. Mr. V Suresh et al. proposed a CNN-based predictive model to identify paddy crop disease. They used a dataset of 54,305 photos that covered 14 different crop species, and the highest accuracy reported for a particular disease was 96.5% [5]. Likewise, SVM, Bayesian Network, Neural Network, and Multiple Linear Regression methods were evaluated by Yun Hwan Kim et al. [6]. Besides, to establish video inspection methods for crop diseases, a group of researchers suggested a customized deep learning-based architecture built on Faster R-CNN. The proposed approach was shown to be more effective for video inspection than VGG16, ResNet-50, ResNet-101, and YOLOv3 [7]. Shima Ramesh et al. proposed a machine learning model that was trained on a database of 160 papaya leaf images and had a 70% overall accuracy [8]. Furthermore, Sharada P. et al. suggested a deep convolutional neural network for species classification and disease prediction in crops. They used 54,306 images of diseased as well as healthy leaves and reported 99.35% accuracy on the test set [9].
3
Methodology
Figure 1 depicts the overall workflow of our research, which has been divided into four stages:
– Information gathering
– Data pre-processing
– Data conditioning
– Machine Learning Algorithms in Action
3.1
Information Gathering
We collected 140 photos of diseased paddy leaves through Google searches for the prediction task. Our dataset contains images of two common paddy diseases, Blight and Spot, and covers a variety of leaf textures, including several types of blight and spot. We tried to build a model that can identify blight and spot disease, and it did so with 73.33% accuracy. The accuracy is low because of the lack of proper sample images. The model can only identify general blight and spot disease, although those diseases have subcategories such as bacterial blight and brown spot. We plan to work on a system that can identify any crop disease with higher accuracy and might also suggest an immediate remedy without help from an agriculture officer.
Fig. 1. Work-flow of the study (Start; Organize the Data; Visualize and Process Datasets; Build CNN using Keras Sequential Model; Train CNN; Plot Predictions with Confusion Matrix; Compare Performance; End)
3.2
Data Preprocessing
Our data has been divided into three sets: training, validation, and testing. We accomplished this by placing the data for each set in its own sub-directory. There are 140 pictures in total, half of which are Blight and half of which are Spot. We do not have nearly as much data as the task calls for, so we use only a portion of it in order to have a similar number of images in both classes. The script adds 40 samples to the training set, 15 to the validation set, and 15 to the test set. There is an equal amount of Blight and Spot in each set.
3.3
Data Conditioning
We used Keras’ ImageDataGenerator class to generate batches of data from the train, valid, and test directories to train the model. We used ImageDataGenerator.flow_from_directory() to construct a DirectoryIterator that produces batches of normalized tensor image data from the data directories.
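The train, valid, and test directories read by flow_from_directory() each need one sub-directory per class. The following is a minimal sketch of a split script that produces such a layout using only the Python standard library. It assumes that the counts of 40, 15, and 15 are per class, which is consistent with 140 images in total; the folder names `blight` and `spot`, the function name `split_dataset`, and the placeholder files are hypothetical.

```python
import random
import shutil
import tempfile
from pathlib import Path

# Counts taken from the text; treating them as per-class is an assumption.
SPLIT_COUNTS = {"train": 40, "valid": 15, "test": 15}
CLASSES = ("blight", "spot")  # hypothetical folder names


def split_dataset(source_dir, target_dir, seed=0):
    """Copy images from source_dir/<class>/ into target_dir/<split>/<class>/."""
    rng = random.Random(seed)
    for label in CLASSES:
        images = sorted((Path(source_dir) / label).glob("*.jpg"))
        rng.shuffle(images)
        start = 0
        for split, count in SPLIT_COUNTS.items():
            destination = Path(target_dir) / split / label
            destination.mkdir(parents=True, exist_ok=True)
            for image in images[start:start + count]:
                shutil.copy(image, destination / image.name)
            start += count


# Demonstration on empty placeholder files standing in for the 140 images.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    for label in CLASSES:
        (raw / label).mkdir(parents=True)
        for i in range(70):
            (raw / label / f"{label}_{i:03d}.jpg").touch()
    split_dataset(raw, Path(tmp) / "data")
    counts = {
        split: len(list((Path(tmp) / "data" / split).rglob("*.jpg")))
        for split in SPLIT_COUNTS
    }

print(counts)
```

Shuffling with a fixed seed before slicing keeps the three sets disjoint and makes the split reproducible.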
3.4
Machine Learning in Action
We used a Keras Sequential model to construct the CNN once the dataset had been pre-processed and conditioned.
Convolutional Neural Network (CNN): A convolutional neural network (CNN, or ConvNet) is a type of deep neural network for analyzing visual images. It is based on the shared-weight architecture of convolution kernels that slide across the input features, which gives it translation invariance properties.
Construction: The model’s first layer is a 2-dimensional convolutional layer with 32 output filters, each with a kernel size of 3 × 3, and the ReLU activation function. The input shape, which is the shape of our data, is defined only on this first layer. Our images are 224 pixels high and 224 pixels wide and have three color channels (red, green, and blue), so the input shape is (224, 224, 3). A max-pooling layer was then added to reduce the dimensionality of the data. The output was then flattened and passed to a Dense layer. Since this Dense layer is the network’s output layer, it has two nodes, one for blight and one for spot. The Softmax activation function was applied to the output, giving a probability distribution over the blight and spot outputs for each sample. The model was compiled with the Adam, SGD, or RMSprop optimizer, a learning rate between 0.0001 and 0.00001, categorical cross-entropy as the loss, and accuracy as the metric. The verbose parameter was set to 2, which only controls the verbosity of the log output printed to the console during training. We set the number of epochs to 10.
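The layer sizes that result from this construction can be worked out by hand. The following is a minimal sketch of that arithmetic in plain Python. The padding and pooling settings are not stated in the text, so no padding and a 2 × 2 pooling window with stride 2 are assumed here; the helper names and the two example scores passed to `softmax` are hypothetical.

```python
import math


def conv2d_output(size, kernel, stride=1):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def softmax(logits):
    """Turn raw output scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


height = width = 224
channels = 3
filters = 32
kernel = 3

# Convolution: 32 filters of size 3 x 3 over 3 input channels, one bias each.
conv_side = conv2d_output(height, kernel)
conv_params = kernel * kernel * channels * filters + filters

# Max pooling with an assumed 2 x 2 window and stride 2 halves each side.
pool_side = conv_side // 2

# Flatten, then a Dense output layer with two nodes (blight, spot).
flat_units = pool_side * pool_side * filters
dense_params = flat_units * 2 + 2

probabilities = softmax([1.2, 0.3])  # hypothetical scores for (blight, spot)

print(conv_side, conv_params, pool_side, flat_units, dense_params)
print(probabilities)
```

Under these assumptions almost all trainable weights sit in the final Dense layer, because the flattened feature map is large compared with the single convolutional layer.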
4
Experimented Results and Discussions
Using the SGD optimizer, a learning rate of 0.00004, 2 hidden layers, and the ReLU and Softmax activation functions, we achieved 73.33% accuracy. Using the RMSProp optimizer, a learning rate of 0.00001, 2 hidden layers, and the ReLU and Softmax activation functions, we achieved 70.00% accuracy. Using the Adam optimizer, a learning rate of 0.00002, 2 hidden layers, and the ReLU and Softmax activation functions, we achieved 56.67% accuracy (Figs. 2, 3, 4, 5 and 6).
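The three accuracy values can be related to counts of correctly classified images. The following is a minimal sketch of that arithmetic; it assumes a test set of 30 images (15 Blight and 15 Spot), which is an inference from the split described earlier and not a figure stated in the text, and the function name `implied_correct` is hypothetical.

```python
TEST_IMAGES = 30  # assumption: 15 Blight + 15 Spot images in the test set

# Accuracy values as reported in the text, in percent.
reported = {"SGD": 73.33, "RMSProp": 70.00, "Adam": 56.67}


def implied_correct(accuracy_percent, total=TEST_IMAGES):
    """Number of correct predictions implied by a rounded accuracy value."""
    return round(accuracy_percent / 100 * total)


correct = {name: implied_correct(acc) for name, acc in reported.items()}

# Recompute the percentages from the implied counts as a consistency check.
recomputed = {
    name: round(100 * n / TEST_IMAGES, 2) for name, n in correct.items()
}

print(correct)
print(recomputed)
```

With so few test images, one additional correct prediction changes the accuracy by more than three percentage points, so the differences between the optimizers should be read with caution.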
Fig. 2. Matrices and Plot 1: (a) Plot; (b) SGD, 0.00004, hl2, Own Dataset
Fig. 3. Matrices and Plot 2: (a) RMSProp, 0.00001, hl2, Own Dataset
Fig. 4. Matrices and Plot 3: (a) Adam, 0.00002, hl2, Own Dataset
Fig. 5. Matrices UCI Dataset: (a) SGD, 0.00004, hl2, UCI Dataset
Fig. 6. Matrices and Plot 2 UCI Dataset: (a) Plot
Comparison: To get a better grasp of our model’s performance, we also evaluated it on a publicly available dataset from UCI. Although our model achieved 73.33% accuracy on our own dataset, it reached 95% accuracy on the UCI dataset. We also compared our results with similar work by other researchers (Table 1).
Table 1. Comparison with other existing systems
Reference number     Sample size   Accuracy
[8]                  160           70%
UCI Dataset          80            95%
Our Research Study   140           73.33%
5
Conclusion
The present work demonstrates how to use a sequential CNN model to predict paddy disease. We proposed a system that predicts paddy disease from sample images of the leaves. Because sample datasets of paddy disease were scarce, a digital archive was created to help agricultural officers learn about crop diseases. Using the SGD optimizer, a learning rate of 0.00004, 2 hidden layers, and the ReLU and Softmax activation functions, the model achieved a prediction accuracy of 73.33% in identifying the paddy diseases Blight and Spot on our dataset, and 95% on the UCI dataset. A Flask web application was proposed to integrate with this model, and it demonstrated successful prediction of Blight and Spot. This web application cannot identify paddy diseases other than Blight and Spot. In this work, one website for collecting disease image data and a separate website for predicting the disease were built, which is not user-friendly. The system can be fine-tuned further, and more diseases can be added for identification. The two websites can be merged using the Django framework to give the user a seamless experience. The data acquisition system can then be automated using robots or drones.
References
1. Liu, B., Zhang, Y., He, D., Li, Y.: Identification of apple leaf diseases based on deep convolutional neural networks. Symmetry 10(1), 11 (2018)
2. Barbedo, J.G.A.: Plant disease identification from individual lesions and spots using deep learning. Biosys. Eng. 180, 96–107 (2019)
3. Sharma, R., Das, S., Gourisaria, M.K., Rautaray, S.S., Pandey, M.: A model for prediction of paddy crop disease using CNN. In: Das, H., Pattnaik, P.K., Rautaray, S.S., Li, K.-C. (eds.) Progress in Computing, Analytics and Networking. AISC, vol. 1119, pp. 533–543. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-2414-1_54
4. Sharma, P., Hans, P., Gupta, S.C.: Classification of plant leaf diseases using machine learning and image preprocessing techniques. In: 2020 10th International Conference on Cloud Computing, Data Science and Engineering (Confluence), pp. 480–484. IEEE (2020)
5. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018)
6. Kim, Y.H., Yoo, S.J., Gu, Y.H., Lim, J.H., Han, D., Baik, S.W.: Crop pests prediction method using regression and machine learning technology: survey. IERI Procedia 6, 52–56 (2014)
7. Li, D., et al.: A recognition method for rice plant diseases and pests video detection based on deep convolutional neural network. Sensors 20(3), 578 (2020)
8. Ramesh, S., Hebbar, R., Niveditha, M., Pooja, R., Shashank, N., Vinod, P., et al.: Plant disease detection using machine learning. In: 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C), pp. 41–45. IEEE (2018)
9. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
Android Malware Detection System: A Machine Learning and Deep Learning Based Multilayered Approach
Md Shariar Hossain(✉) and Md Hasnat Riaz(✉)
Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Chittagong, Bangladesh
Abstract. In our growing world of technology, mobile phones have become one of the most used devices, and Android has become the most popular mobile operating system. This popularity has naturally invited cybercriminals to attack the OS with malware applications that steal or access important user data. It is critical to detect whether an app is malware during installation or at run time. This paper proposes a malware detection system for the Android operating system that combines static and dynamic analysis with both machine learning and deep learning classifiers. We evaluate a multilayer detection process that works on user permission data for static analysis and on network traffic data for dynamic analysis. User permissions, taken from the AndroidManifest.xml file, help the model detect malware before it is installed, and network traffic data help the model detect malware at run time. We applied these datasets to both machine learning and deep learning classifiers to make the model more accurate and efficient. The features were extracted from real Android applications, and a Deep Auto Encoder was used to pre-process and clean the dataset. Because this is a combined multilayer model, we obtained a range of accuracy rates; we then propose the most accurate classifier to give the final verdict about malware, so that the model works with a high accuracy rate for both static and dynamic analysis.
Keywords: Android malware detection · Machine learning · Static and dynamic analysis · Combined multilayer model · Deep auto encoder
1 Introduction
Mobile malware targets devices and attacks them, causing loss or leakage of the user’s confidential information. Mobile malware has progressed at an alarming rate and is far more prevalent on Android because Android is an open platform and is more popular than the other mobile OS platforms on the market. As of March 2021, Android held 71.81% of the global mobile operating system market, while iOS held 27.43%, Samsung 0.38%, and KaiOS 0.14% [1]. Android-powered devices were estimated at 1.2 billion in 2018, and the forecast of worldwide shipments indicates that there will be 1.4 billion Android devices in 2022 [2]. Furthermore, the Google Play Store, the official Android marketplace, contains more than 3 million Android applications. By 2016, the Google Play Store was adding about 1300 new apps per day [3]. Because Android has long been the fastest-growing operating system, it has created an enormous space for hackers and cybercriminals to operate. The production rate of Android malware reached 9411 per day in 2018, so a new malware sample is produced roughly every 10 s [2]. Nowadays, Android-based applications have become a major source of services for various smart city applications [21, 22]. Any infiltration of these service-based applications can have catastrophic results. More than 350,000 new malware samples are registered by the AV-TEST Institute in a single day, and they are classified on the basis of their characteristics [4]. Researchers have recorded the most dangerous malware of 2020 for the Android platform: BLACKROCK, FAKESKY, the EventBot malware, AGENT SMITH Malware, ExoBot, NotCompatible, Lastacloud, Android Ransomware, Android Police virus, Svpeng virus, Ghost Push virus, Mazar Malware, Gooligan Malware, HummingBad virus, HummingWhale virus, GhostCtrl virus, Lockdroid ransomware, Invisible Man, LeakerLocker ransomware, DoubleLocker ransomware, Lokibot virus, Tizi Android, Anubiscrypt ransomware, and Opt-Out virus [5].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 277–287, 2022. https://doi.org/10.1007/978-3-030-93247-3_28
Fig. 1. Basic software stack of Android framework (layers shown: Applications, Application Framework, Libraries, Android Runtime (ART), Linux Kernel).
Figure 1 shows the basic software stack of the Android framework. On top of the Linux kernel are the libraries, the application framework, and the application software. Application software runs on the framework, which includes Java-compatible libraries. This is the target zone of most hackers and cybercriminals, who use malicious libraries to make an application malicious. The application framework includes the Activity Manager, Content Providers, various services, and the user permission section. Usually, users grant permissions to an application during installation without noticing what the permissions are, whether they are necessary for that app, and, most importantly, what risks may arise from granting them. Even the security checks of Google’s official app marketplace, the Google Play Store, whose apps are considered safer than those from third-party sources, can be bypassed. Security experts reported a list of 75 applications from the Google Play Store that were infected with the Xavier Android virus [5]. This is not the only route of infection. Even after installing an app that has been statically verified and scanned for its requested permissions, there is still a considerable chance of infection through network traffic at run time.
Malware detection is a core part of cyber and device security. There are two known analytical approaches to malware detection: static analysis and dynamic analysis. In static analysis, detection is performed on the source code, the Manifest file, and the API calls before the program is executed. Static analysis is faster at detecting a malicious app and can stop malware before it is installed; however, it can be evaded by obfuscation or encryption techniques. Dynamic analysis, on the other hand, works on the execution behavior of the application: it monitors the execution characteristics of the application and the activity of Android malware at run time. Researchers have shown both static and dynamic ways of detecting Android malware with machine learning classifiers. Different classifiers achieve different accuracy rates; some of them worked with permission-based datasets for static analysis [6], while network traffic data were used for the dynamic approach. There is, however, an interesting relation in accuracy when the dataset is pre-processed with a Deep Auto Encoder (DAE) and then applied to both machine learning and deep learning classifiers for static and dynamic analysis. In our research, we evaluated both static and dynamic analysis methods. We propose a combined analysis system with a good accuracy rate on both machine learning and deep learning classifiers, where the permission data were extracted from the AndroidManifest.xml file and the network traffic data from the APK and Java files. We used a Deep Auto Encoder (DAE) to clean the data [7] and show the change in accuracy before and after pre-processing with the DAE. Information for the dynamic analysis, such as system calls, network traffic, file accesses, and memory modifications, is collected from the operating system at run time. For static analysis, we extracted permission features from the AndroidManifest.xml file using various feature selection techniques [8].
2 Related Works
A basic behavior of mobile malware is sending sensitive data of the phone user to malicious remote servers. In [9], Suleiman Y. Yerima proposed and investigated a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a hybrid classification model was created from the parallel combination of heterogeneous classifiers. To find the best combination of feature selection and classifier, different feature selection methods were applied to different machine learning classifiers [10]. There are two methods of malware detection: static analysis and dynamic analysis. Static analysis is a strategy that assesses malicious behavior in the source code, the data, or the binary files without directly executing the application, whereas dynamic analysis is a set of strategies that studies the behavior of the malware during execution through signal simulations [11]. For static analysis, such a system monitors various permission-based features and events obtained from Android applications and analyzes them with machine learning classifiers to decide whether an application is goodware or malware [6, 12, 13]. For dynamic analysis, Android malware is identified by first analyzing its network traffic features and then building a rule-based classifier for detection [14, 15].
Ankur Singh Bist [16] presented an overview of deep learning strategies for malware detection, such as the convolutional neural network, deep belief network, auto-encoder, restricted Boltzmann machine, and recurrent neural network. The data can be pre-processed by an auto-encoder to produce a clean dataset that yields high accuracy [2]. Other research examined the behavior of mobile malware through a hybrid approach, which correlates and reconstructs the results of static and dynamic malware analysis to produce a trace of malicious events [17]. As an example, Crowdroid is a machine learning-based system that recognizes Trojan-like malware on Android devices by analyzing the number of times each system call is issued by an application during the execution of an activity that requires user interaction [18].
3 Data and Features
Our system combines static and dynamic analysis. In the static part, we take data from the permissions, intents, uses-feature entries and APIs extracted from the AndroidManifest.xml file. In the dynamic part, we take data from network traffic based on DNS, TCP and UDP. Both datasets were obtained by feature extraction and stored in comma-separated values (CSV) format [11]. For the static analysis, we use a dataset of dimension 398 × 331, where the training share starts at 20% and increases step by step. For the dynamic analysis, we use a dataset of dimension 7845 × 17, comprising 4704 benign and 3141 malicious samples. Here the training share also starts at 20% and increases step by step, with random state = 45.

Table 1. Features from permissions and network traffic datasets

Top 10 permission features for static analysis:
1. android.permission.INTERNET
2. android.permission.READ_PHONE_STATE
3. android.permission.ACCESS_NETWORK_STATE
4. android.permission.WRITE_EXTERNAL_STORAGE
5. android.permission.ACCESS_WIFI_STATE
6. android.permission.READ_SMS
7. android.permission.WRITE_SMS
8. android.permission.RECEIVE_BOOT_COMPLETED
9. android.permission.ACCESS_COARSE_LOCATION
10. android.permission.CHANGE_WIFI_STATE

Features from network traffic for dynamic analysis:
name, tcp_packets, dist_port_tcp, external_ips, vulume_bytes, udp_packets, tcp_urg_packet, source_app_packets, remote_app_packets, source_app_bytes, remote_app_bytes, duracion, avg_local_pkt_rate, avg_remote_pkt_rate, source_app_packets.1, dns_query_times, type (dtype = object)
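The split procedure described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `split_rows`, the list of training shares and the use of the value 45 as a shuffle seed are assumptions. The phrase "random state" suggests a fixed seed such as the `random_state` argument of scikit-learn's `train_test_split`, but the source does not name the library, so only the Python standard library is used here.

```python
import random


def split_rows(rows, train_fraction, seed=45):
    """Shuffle a copy of rows with a fixed seed and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 7845 rows of the dynamic dataset (row indices only).
rows = list(range(7845))

# Illustrative training shares; the source only says the share starts at 20% and grows.
for train_fraction in (0.2, 0.4, 0.6, 0.8):
    train, test = split_rows(rows, train_fraction)
    print(f"train share {train_fraction:.0%}: {len(train)} train rows, {len(test)} test rows")
```

Fixing the seed makes every run produce the same partition, which keeps accuracy figures comparable across classifiers.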
Android Malware Detection System: A Machine Learning and Deep Learning
3.1 Analysis of Android Manifest and Permissions
Every Android app must have a manifest file, AndroidManifest.xml, at the root of the project source set. It describes essential information about the app to the Android build tools, the operating system and Google Play.
The manifest is where an application declares its permissions before it is installed. The whole file is enclosed in the <manifest> block, which contains many sub-blocks. The permissions an app requests are declared in <uses-permission> elements inside <manifest>, alongside the <application> sub-block; they are chosen by the developer, not by the users. It is up to the developer whether the app requests a permission from the user at runtime. In our study, we used 398 permissions from 331 different applications.
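Extracting permission features from a manifest can be sketched as below. This is an illustration under assumptions: the manifest is taken to be already decoded to text XML (inside an APK it is stored in a binary form), the sample manifest and the helper names `extract_permissions` and `to_feature_vector` are hypothetical, and the vocabulary is a subset of the permissions listed in Table 1.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for "android:" attributes in the manifest.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, for illustration only.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_SMS" />
    <application android:label="Example" />
</manifest>
"""


def extract_permissions(manifest_xml):
    """Return the names declared in <uses-permission> elements, in document order."""
    root = ET.fromstring(manifest_xml)
    key = "{%s}name" % ANDROID_NS
    return [el.get(key) for el in root.iter("uses-permission") if el.get(key)]


def to_feature_vector(permissions, vocabulary):
    """Encode permissions as a 0/1 vector over a fixed permission vocabulary."""
    present = set(permissions)
    return [1 if name in present else 0 for name in vocabulary]


vocabulary = [
    "android.permission.INTERNET",
    "android.permission.READ_PHONE_STATE",
    "android.permission.READ_SMS",
]
permissions = extract_permissions(SAMPLE_MANIFEST)
print(permissions)
print(to_feature_vector(permissions, vocabulary))  # [1, 0, 1]
```

One row of this kind per application, written to CSV, would give a binary permission matrix of the sort the static analysis consumes.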
4 Research Methodology
4.1 Deep Auto Encoder (DAE)
An auto encoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. Its aim is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal noise. In its simplest form, an auto encoder is a feedforward, non-recurrent neural network similar to the single-layer perceptrons that make up a multilayer perceptron (MLP): it has an input layer, one or more hidden layers and an output layer, connected in sequence. The output layer has the same number of nodes (neurons) as the input layer. Its purpose is to reconstruct its inputs (minimizing the difference between the input and the output) rather than to predict a target value Y given inputs X. Hence, auto encoders are unsupervised learning models. An auto encoder consists of two parts, the encoder and the decoder, which can be defined as transitions U and W such that:

$$U : X \rightarrow F, \qquad W : F \rightarrow X$$
$$U, W = \underset{U,\,W}{\arg\min}\ \lVert X - (W \circ U)X \rVert^{2}$$
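The objective above can be illustrated with a very small linear auto encoder trained by gradient descent. This is a sketch under assumptions, not the deep auto encoder used in the study: it has a single hidden unit, no activation function, a constant initial weight of 0.1, and toy data lying on the line x2 = 2·x1. All function names are hypothetical.

```python
def encode(U, x):
    """Encoder U: X -> F, here a plain matrix-vector product."""
    return [sum(u * xj for u, xj in zip(row, x)) for row in U]


def decode(W, h):
    """Decoder W: F -> X, again a matrix-vector product."""
    return [sum(w * hk for w, hk in zip(row, h)) for row in W]


def mean_loss(U, W, data):
    """Mean squared reconstruction error ||x - W(U(x))||^2 over the data."""
    total = 0.0
    for x in data:
        x_hat = decode(W, encode(U, x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(data)


def train(data, n_hidden=1, lr=0.05, epochs=2000):
    """Full-batch gradient descent on the mean reconstruction error."""
    n_in = len(data[0])
    # Constant initial weights; adequate for the single hidden unit used here.
    U = [[0.1] * n_in for _ in range(n_hidden)]   # n_hidden x n_in
    W = [[0.1] * n_hidden for _ in range(n_in)]   # n_in x n_hidden
    for _ in range(epochs):
        grad_U = [[0.0] * n_in for _ in range(n_hidden)]
        grad_W = [[0.0] * n_hidden for _ in range(n_in)]
        for x in data:
            h = encode(U, x)
            x_hat = decode(W, h)
            err = [xh - xj for xh, xj in zip(x_hat, x)]
            for j in range(n_in):
                for k in range(n_hidden):
                    grad_W[j][k] += 2 * err[j] * h[k]
            for k in range(n_hidden):
                back = 2 * sum(err[j] * W[j][k] for j in range(n_in))
                for j in range(n_in):
                    grad_U[k][j] += back * x[j]
        for j in range(n_in):
            for k in range(n_hidden):
                W[j][k] -= lr * grad_W[j][k] / len(data)
                U[k][j] -= lr * grad_U[k][j] / len(data)
    return U, W


# Two-dimensional points that really have one degree of freedom.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
U, W = train(data)
print(mean_loss(U, W, data))
print(decode(W, encode(U, [0.5, 1.0])))
```

Because the toy data is one-dimensional in disguise, a single hidden unit is enough to reconstruct it; real feature matrices need wider and deeper networks with nonlinear activations.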
Fig. 2. Structure of deep auto encoder (DAE)
[Diagram: input layer, hidden layer, output layer.]
4.2 Research Approach
As shown in Fig. 3, our model has two analysis layers, static and dynamic. Static analysis examines the user permissions (e.g. Internet, read phone state, access network state, write external storage, read SMS, access coarse location), activity intents, uses-features and the application programming interface (API). Dynamic analysis examines network traffic features such as DNS, TCP and UDP traffic, TCP packets, source app packets and remote app packets. We extracted datasets from these features as comma-separated values (CSV) and applied them to machine learning and deep learning classifiers. We selected popular classifiers of both kinds: Naïve Bayes (NB), K Neighbors, Decision Tree and Random Forest for machine learning, and multilayer perceptron (MLP) and convolutional neural network (CNN) for deep learning. Table 2 and Table 3 list the accuracy percentages for the static and dynamic analysis. Table 2 shows the accuracy of the classifiers before the Deep Auto Encoder (DAE) was applied for data preprocessing. Among the machine learning classifiers, Decision Tree achieved the highest accuracy for static analysis (94%) and Random Forest the highest for dynamic analysis (92%). Among the deep learning classifiers, MLP achieved 90% for static analysis and 75% for dynamic analysis. After measuring the performance in Table 2, we applied the DAE to the datasets.
Fig. 3. Methodology of combined multilayer android malware detection system
[Diagram elements: permission-based training and testing samples; static analysis (Permission, Intent, API, Uses-Features); network-based training and testing data; dynamic analysis (Network Traffic: DNS, TCP, UDP); data pre-processing by auto-encoder; machine learning classifiers (Naive Bayes, K-neighbors, Decision Tree, Random Forest); deep learning classifiers (MLP, CNN); most accurate classifier; decision "Is app malicious or not?" leading to a malicious or benign alert.]
Table 3 shows the performance of the classifiers after applying the DAE to the datasets for pre-processing. Among the machine learning classifiers, K Neighbors was the most accurate, with 88% for static and 94% for dynamic analysis. The deep learning classifiers improved markedly on the dynamic analysis after applying the DAE, where both reached 90% accuracy; MLP had the highest rates, with 88% for static and 90% for dynamic analysis. As Fig. 3 indicates, our study takes the most accurate classifier among the machine learning and deep learning algorithms for each of the static and dynamic analyses, and these classifiers are used in our final proposed model. We considered precision, recall, F1-score and support as the evaluation parameters of the classifiers.
Table 2. Accuracy of static and dynamic analysis according to algorithms before preprocessing

Algorithms and classifiers                                      Static analysis   Dynamic analysis
                                                                (accuracy)        (accuracy)
Machine learning           Naïve Bayes (NB)                     84%               45%
                           K Neighbors                          89%               89%
                           Decision Tree                        94%               88%
                           Random forest                        91%               92%
Deep learning algorithms   Multilayer perceptron (MLP)          90%               75%
                           Convolutional neural network (CNN)   89%               71%
Table 3. Accuracy of static and dynamic analysis according to algorithms after preprocessing

Algorithms and classifiers                                      Static analysis   Dynamic analysis
                                                                (accuracy)        (accuracy)
Machine learning           Naïve Bayes (NB)                     82%               80%
                           K Neighbors                          88%               94%
                           Decision Tree                        81%               89%
                           Random forest                        84%               92%
Deep learning algorithms   Multilayer perceptron (MLP)          88%               90%
                           Convolutional neural network (CNN)   86%               90%
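The selection of the most accurate classifier per analysis layer can be reproduced from the figures in Tables 2 and 3. The sketch below only re-reads the tabulated accuracies; the dictionary layout and the helper name `best_classifier` are assumptions made for illustration.

```python
# Accuracy in percent, copied from Table 2 (before DAE) and Table 3 (after DAE).
ACCURACY = {
    ("static", "before DAE"): {
        "Naive Bayes": 84, "K Neighbors": 89, "Decision Tree": 94,
        "Random Forest": 91, "MLP": 90, "CNN": 89,
    },
    ("dynamic", "before DAE"): {
        "Naive Bayes": 45, "K Neighbors": 89, "Decision Tree": 88,
        "Random Forest": 92, "MLP": 75, "CNN": 71,
    },
    ("static", "after DAE"): {
        "Naive Bayes": 82, "K Neighbors": 88, "Decision Tree": 81,
        "Random Forest": 84, "MLP": 88, "CNN": 86,
    },
    ("dynamic", "after DAE"): {
        "Naive Bayes": 80, "K Neighbors": 94, "Decision Tree": 89,
        "Random Forest": 92, "MLP": 90, "CNN": 90,
    },
}


def best_classifier(analysis):
    """Return (classifier, stage, accuracy) with the highest accuracy for one layer."""
    candidates = [
        (accuracy, name, stage)
        for (layer, stage), scores in ACCURACY.items()
        if layer == analysis
        for name, accuracy in scores.items()
    ]
    accuracy, name, stage = max(candidates, key=lambda item: item[0])
    return name, stage, accuracy


print(best_classifier("static"))   # ('Decision Tree', 'before DAE', 94)
print(best_classifier("dynamic"))  # ('K Neighbors', 'after DAE', 94)
```

Both winners reach 94%, one without and one with DAE preprocessing, which is why the final model applies the auto encoder only on the dynamic branch.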
5 Results and Discussion
Fig. 4 and Fig. 5 plot accuracy against the train-test split of our datasets. For both static and dynamic analysis, we started with a 50%-50% train-test split and gradually moved to 80%-20%. The accuracy curves for the dynamic and static analysis are the same. Table 4 and Table 5 report the evaluation parameters for the decision tree and K-Neighbors classifiers. Kang et al. compared the average accuracy of two popular Android malware detection systems, Crowdroid and Andro-profiler, on malware families such as Adwo, AirPush, Boxer, FakeBattScar, FakeNotify and GinMaster together with 350 benign applications; they found 99% accuracy for Andro-profiler and 35% for Crowdroid [19]. TinyDroid, a lightweight detection system, has an accuracy rate of 97% [20]. With our combined multilayered process, we obtained an average accuracy rate of 94% for both dynamic and static analysis.
Fig. 4. Accuracy and F1-score of static analysis (Decision Tree) according to train-test split.
Fig. 5. Accuracy and F1-score of dynamic analysis (K-Neighbors) according to train-test split.
[Both plots: x-axis, train-test split (80%-20%, 70%-30%, 60%-40%, 50%-50%); y-axis, accuracy and F1-score (0.84 to 0.96); series, f1-score and a polynomial trend line for accuracy; plotted values range from 0.88 to 0.94.]
Table 4. Parameters of the decision tree classifier.

Decision tree   Precision   Recall   F1-score   Support
0               0.97        0.89     0.93       37
1               0.91        0.98     0.94       43
Accuracy                             0.94       80
Macro avg       0.94        0.93     0.94       80
Weighted avg    0.94        0.94     0.94       80

Table 5. Parameters of the K-Neighbors classifier

K neighbors 3   Precision   Recall   F1-score   Support
Benign          0.96        0.86     0.91       955
Malicious       0.85        0.96     0.90       612
Accuracy                             0.94       1567
Macro avg       0.91        0.91     0.91       1567
Weighted avg    0.91        0.91     0.91       1567
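The F1-scores and averages in Tables 4 and 5 follow from precision, recall and support. As a worked check on the decision tree figures, the standard definitions can be written out as below; the helper names are hypothetical, and the inputs are the rounded values from Table 4, so results agree only to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes, weighted by the number of samples in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Decision tree, classes 0 and 1 (Table 4).
precision = [0.97, 0.91]
recall = [0.89, 0.98]
support = [37, 43]

print([round(f1_score(p, r), 2) for p, r in zip(precision, recall)])  # [0.93, 0.94]
print(round(macro_average(precision), 2))                             # 0.94
print(round(weighted_average(recall, support), 2))                    # 0.94
```

The weighted average of recall equals overall accuracy when every sample belongs to exactly one class, which gives a quick consistency check on a reported accuracy row.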
According to our proposed model, the two best classifiers are:
1) Static Analysis: Decision Tree (DAE not applied), 94% accuracy
2) Dynamic Analysis: K Neighbors (DAE applied), 94% accuracy
Among all the classifiers we used, Tables 2 and 3 identify the best classifier for each of the dynamic and static analyses. The goal of this study was to propose a malware detection system for the Android operating system that combines static and dynamic analysis, using both machine learning and deep learning classifiers, to evaluate a multilayered detection process. We built our final model by combining the static and dynamic layers and picking the most accurate classifier for each according to our datasets. The resulting model is shown in Fig. 6. Our accuracy rate is not the best compared to other systems, but the proposed model, with an average accuracy rate of 94%, contributes to Android malware detection by covering both installation-time (static) and runtime (dynamic) detection.
Fig. 6. Final proposed model
[Diagram elements: Static Analysis input, Decision Tree; Dynamic Analysis input, Data Preprocessing by Deep Auto Encoder, K Neighbors; decision "Malicious?" with outcomes Malicious and Benign.]
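The layered decision in the proposed model can be sketched as a short function. The exact control flow of the figure is not recoverable from the text, so this assumes the reading given in the conclusion: an app is checked by the static layer first and, if it passes, by the dynamic layer. The three toy functions are hypothetical stand-ins for the trained Decision Tree, the Deep Auto Encoder and the trained K Neighbors model; their rules and thresholds are invented for illustration.

```python
def detect(static_features, dynamic_features, static_model, dynamic_model, preprocess):
    """Return 'malicious' if either layer flags the app, otherwise 'benign'."""
    # Layer 1: static check on manifest features, before installation.
    if static_model(static_features) == "malicious":
        return "malicious"
    # Layer 2: dynamic check on preprocessed network traffic, at runtime.
    cleaned = preprocess(dynamic_features)
    if dynamic_model(cleaned) == "malicious":
        return "malicious"
    return "benign"


def toy_static_model(permissions):
    """Stand-in for the Decision Tree: flags apps that both read and write SMS."""
    risky = {"android.permission.READ_SMS", "android.permission.WRITE_SMS"}
    return "malicious" if risky <= set(permissions) else "benign"


def toy_preprocess(traffic):
    """Stand-in for the Deep Auto Encoder step: passes features through unchanged."""
    return dict(traffic)


def toy_dynamic_model(traffic):
    """Stand-in for K Neighbors: flags unusually many DNS queries."""
    return "malicious" if traffic["dns_query_times"] > 100 else "benign"


verdict = detect(
    ["android.permission.INTERNET"],
    {"dns_query_times": 500},
    toy_static_model,
    toy_dynamic_model,
    toy_preprocess,
)
print(verdict)  # malicious
```

Passing the models in as arguments keeps the layering logic separate from whichever classifier turns out to be most accurate on a given dataset.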
6 Conclusion
This paper presented a combination of machine learning and deep learning classification approaches for multilayer Android malware detection using static and dynamic analysis, with an Auto Encoder used to preprocess the datasets. The proposed model uses a wide range of features, including permission features from the Android manifest file and network traffic features. The growing malware community is a severe threat, since existing detection tools, and even Google's official marketplace, can be bypassed. This calls for alternative approaches that cover more features and achieve better accuracy. The combined detection model proposed in this paper is a multilayer scheme that works both before installation and at runtime. If Android malware passes the static detection layer, it can still be detected by the dynamic layer; each layer uses the classifier that was most accurate on its dataset and features. We expect the proposed model to be effective in classifying a new application because 1) dynamic app features are preprocessed by the Deep Auto Encoder and 2) permission features are taken from one of the root files of an application before installation. As future work, we aim to develop and evaluate an Android malware detection engine using more neural network algorithms with normalized and more specific preprocessing of the dataset. We also plan to analyze the AndroidManifest.xml file to extract features from the Services, Activity and Broadcast Manager, giving us more specific features for dynamic analysis at runtime.
References
1. Statcounter. https://gs.statcounter.com/os-market-share/mobile/worldwide. Accessed 05 Apr 2021
2. Naway, A., Li, Y.: Android malware detection using autoencoder. arXiv preprint arXiv:1901.07315 (2019)
3. CleverTap. https://clevertap.com/blog/mobile-growth-statistics/. Accessed 13 Apr 2021
4. AVTEST.Org. https://www.avtest.org/en/statistics/malware. Accessed 10 Apr 2021
5. GIZMEEK. https://gizmeek.com/researchers-listed-the-most-dangerous-malware-virusesandroid-viruses-of-2020. Accessed 11 Apr 2021
6. Tchakounté, F.: Permission-based malware detection mechanisms on android: analysis and perspectives. J. Comput. Sci. 1(2) (2014)
7. Wang, W., Zhao, M., Wang, J.: Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient. Intell. Humaniz. Comput. 10(8), 3035–3043 (2018). https://doi.org/10.1007/s12652-018-0803-6
8. Mahindru, A., Sangal, A.L.: FSDroid: a feature selection technique to detect malware from android using machine learning techniques. Multimedia Tools Appl. 1–53
9. Yerima, S.Y., Sezer, S., Muttik, I.: Android malware detection using parallel machine learning classifiers. In: 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies, pp. 37–42. IEEE, September 2014
10. Mas’ud, M.Z., Sahib, S., Abdollah, M.F., Selamat, S.R., Yusof, R.: Analysis of features selection and machine learning classifier in android malware detection. In: 2014 International Conference on Information Science & Applications (ICISA), pp. 1–5. IEEE, May 2014
11. López, C.C.U., Cadavid, A.N.: Framework for malware analysis in Android
12. Zarni Aung, W.Z.: Permission-based android malware detection. Int. J. Sci. Technol. Res. 2(3), 228–234 (2013)
13. Tchakounté, F.: A malware detection system for android (2015)
14. Arora, A., Garg, S., Peddoju, S.K.: Malware detection using network traffic analysis in android based mobile devices. In: 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies, pp. 66–71. IEEE, September 2014
15. Zaman, M., Siddiqui, T., Amin, M.R., Hossain, M.S.: Malware detection in Android by network traffic analysis. In: 2015 International Conference on Networking Systems and Security (NSysS), pp. 1–5. IEEE, January 2015
16. Bist, A.S.: A survey of deep learning algorithms for malware detection. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 16(3) (2018)
17. Mas’ud, M.Z., Sahib, S., Abdollah, M.F., Selamat, S.R., Yusof, R., Ahmad, R.: Profiling mobile malware behaviour through hybrid malware analysis approach. In: 2013 9th International Conference on Information Assurance and Security (IAS), pp. 78–84. IEEE, December 2013
18. Burguera, I., Zurutuza, U., Nadjm-Tehrani, S.: Crowdroid: behavior-based malware detection system for android. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 15–26, October 2011
19. Kang, H., Jang, J.W., Mohaisen, A., Kim, H.K.: Detecting and classifying android malware using static analysis along with creator information. Int. J. Distribut. Sensor Netw. 11(6), 479174 (2015)
20. Chen, T., Mao, Q., Yang, Y., Lv, M., Zhu, J.: TinyDroid: a lightweight and efficient model for Android malware detection and classification. Mobile Inf. Syst. (2018)
21. Haque, A.K., Bhushan, B., Dhiman, G.: Conceptualizing smart city applications: requirements, architecture, security issues, and emerging trends. Expert. Syst. (2021). https://doi.org/10.1111/exsy.12753
22. Haque, B., Shurid, S., Juha, A.T., Sadique, M.S., Asaduzzaman, A.S.M.: A novel design of gesture and voice controlled solar-powered smart wheel chair with obstacle detection. In: 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), pp. 23–28 (2020). https://doi.org/10.1109/ICIoT48696.2020.9089652
IOTs, Big Data, Block Chain and Health Care
Blockchain as a Secure and Reliable Technology in Business and Communication Systems
Vedran Juričić1(&), Danijel Kučak2, and Goran Đambić2
1 Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia [email protected]
2 Algebra University College, Zagreb, Croatia {danijel.kucak,goran.dambic}@algebra.hr
Abstract. Today we are witnessing constant growth of data from different sources and in different fields, such as government, health, travel, business and entertainment. Traditionally, this data is stored in one central place on a company’s or organization’s server, which makes control, management, analysis and business intelligence easier. However, this architecture has disadvantages such as a single point of failure, vulnerability and privacy issues. Distributed ledger technology such as Blockchain presents a new architecture that does not store data in one location and has no central authority. Although Blockchain technology offers improvements in security, transparency and traceability, it is criticized for its resource consumption, its complexity and its lack of clear usefulness. Because of its complexity and distributed architecture, users are still suspicious about its security and the privacy of its data. This paper analyses various aspects of applying Blockchain technology in modern systems, focusing on data storage, processing and protection. By enumerating common threats and security issues, this paper tries to show that Blockchain is trustworthy and reliable and that it can be a potential solution, or part of one, for current security problems.
Keywords: Data protection · Data privacy · Security · Distributed system · Smart contracts · Anonymity · Vulnerability
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 291–301, 2022. https://doi.org/10.1007/978-3-030-93247-3_29

1 Introduction
The constant growth of available government and commercial services, cloud computing, social networks and the Internet of Things leads to a constant increase in the amount of generated and stored data. About 90% of all the data in the world was generated from 2011 until 2013 [34], and there are justified predictions that data will continue to increase exponentially. For example, IDC has predicted that global data volume will grow from 4.4 zettabytes to 44 zettabytes in 7 years [11]. Some authors claim that data will grow by 40% per year in the next decade [15].
Data gathering, processing and analysis in traditional business, government and public organizations was performed on computers in their local network, and organizations attempted to make their services available to users through the Internet. Organizations commonly had shared data storage available through the network infrastructure [26], which allowed several users or computers to access data simultaneously without unnecessary redundancy, and allowed storage of and operations on data from different sources. Processing, i.e. computing, in modern information systems is physically decoupled from storage, because this architecture provides hardware and software independence, reduces fragmentation and unused space, and scales more easily [8]. This allows data to be located remotely, in an external network or in a cloud, without affecting the services or applications provided by organizations. Personal data, bank accounts, asset information etc. are therefore scattered across multiple locations unknown to their users or owners. In addition, some organizations are required to back up their entire data daily to different locations, and applications from different vendors need to communicate with one another to perform certain tasks. Every application needs to follow security policies, granting or restricting access to its resources [6], which means that user credentials are shared between applications over the network. Each layer, application or communication channel introduces a certain security or privacy risk.
The foundation of the relationship between users and an organization is trust [1], which can be defined as “the belief that the vendor will act cooperatively to fulfil the customer’s expectations without exploiting their vulnerabilities” [30]. A user’s perception of trust in a technology or application has an important impact on whether they adopt and use that technology [17]. This is especially relevant when organizations deal with sensitive user data, as financial and medical institutions do, where users expect their data to be completely secure and their privacy completely protected.
Blockchain, a distributed ledger technology, was introduced about 10 years ago [25]. It belongs to the class of disruptive technologies, which means it significantly impacts and alters traditional businesses, industries and organizations. It was successfully applied in the cryptocurrency Bitcoin [4] in 2008 and is rapidly being integrated into various aspects of modern applications and processes because of its architecture, design and specific characteristics. The technology does not require a central authorization authority but relies on a principle of equality and justice, and it promises numerous benefits and innovations in modern information and communication systems. Marc Andreessen, one of the leading Silicon Valley investors, claims that “blockchain is the most important invention since the internet itself”. Johann Palychata said that blockchain should be considered an invention like the steam or combustion engine [9].
This paper presents the most relevant characteristics of Blockchain technology, focusing on data, security, protection and privacy. Because of its advantages, the technology has very wide application in various sectors. The paper presents and analyses the most recent or important applications in finance, public and social services and education. However, users are often not aware that the technology is not completely safe or free of risk, primarily because it relies on existing vulnerable technologies, but also because of Blockchain's own mechanisms and operations. The paper enumerates possible threats and issues with data security and protection, and provides Blockchain-based solutions or arguments for the suitability of Blockchain.
2 Blockchain Technology
The blockchain is a sequence of blocks that contains a complete list of transactions [20]. Each block consists of a block header and a block body. The body contains a transaction counter and a number of transactions that depends on the block size and the size of each transaction. The header contains a timestamp (the number of seconds since 1/1/1970), a nonce (a random number) and the hash value of the previous block, its parent. The first block in the chain is called the genesis block and has no parent. Each block also contains a hash value of all the transactions in the block (the Merkle tree root hash).
The hash function used is SHA-256, a cryptographic hash function chosen by Satoshi Nakamoto for the implementation of the cryptocurrency Bitcoin [25]. Hash functions are one-way: for a given input the output is fast and easy to calculate, but it is practically infeasible to find an input for a given output. They are also deterministic and collision resistant: a given input always produces the same hash value, and it is practically infeasible to find two different inputs with equal hash values, regardless of how few bits differ between them. As a result of these characteristics, anyone can easily recalculate the hash value in a block header from the transaction data in the block body. If any bit of the transaction data is modified, the calculated hash value will differ from the value found in the block header, which means that someone has tampered with the transaction data (or the hash value). As already mentioned, a block also contains the hash of its parent block, which enables validation of all previous blocks and all transactions in the whole chain.
Blockchain is a distributed database of records, and all transactions that have been created are available to all participants or, from the technical aspect, to all nodes in its network [9].
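The block structure and tamper check described above can be sketched in a few lines. This is a simplified model, not Bitcoin's actual format: the header is serialized as JSON, transactions are plain strings, a single round of SHA-256 is used, and all function names and sample transactions are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions):
    """Hash transactions pairwise until a single root hash remains."""
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("utf-8"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


def make_block(transactions, prev_hash, timestamp, nonce=0):
    """Build a block whose header commits to its parent and to its body."""
    header = {
        "timestamp": timestamp,
        "nonce": nonce,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root(transactions),
    }
    return {"header": header, "transactions": list(transactions)}


def header_hash(block):
    """Hash the header in a canonical (sorted-key) JSON encoding."""
    encoded = json.dumps(block["header"], sort_keys=True).encode("utf-8")
    return sha256_hex(encoded)


def is_valid_chain(chain):
    """Check every body against its header and every header against its parent."""
    for i, block in enumerate(chain):
        if block["header"]["merkle_root"] != merkle_root(block["transactions"]):
            return False
        if i > 0 and block["header"]["prev_hash"] != header_hash(chain[i - 1]):
            return False
    return True


# Illustrative Unix timestamps; the genesis block has no parent.
genesis = make_block(["coinbase -> alice: 50"], prev_hash=None, timestamp=1609459200)
block_1 = make_block(["alice -> bob: 10", "alice -> carol: 5"], header_hash(genesis), 1609459800)
chain = [genesis, block_1]
print(is_valid_chain(chain))  # True

genesis["transactions"][0] = "coinbase -> mallory: 50"  # tamper with one transaction
print(is_valid_chain(chain))  # False
```

Changing a transaction breaks the Merkle root check of its own block; changing a header instead would break the parent-hash check of the next block, so an edit anywhere in the history is detectable.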
When the distributed database changes, for example when a node creates a new transaction, the change is propagated and the database is updated throughout the network. All nodes can verify all transactions from the current state back to the first transaction in the genesis block. The problem this technology has successfully solved is whom to trust in a completely untrusted environment, a node that has calculated one hash value or a node with another hash value, i.e. which user may publish the next block in the chain. Blockchain solves this by implementing one of many possible consensus models, such as Proof of Work, Proof of Stake, Round Robin, Proof of Authority, Proof of Elapsed Time and Byzantine fault-tolerant variants [14, 39]. Each node in the network has a chance to publish an entire block, but competes with other nodes to receive a prize, most likely a financial reward [39]. The competing nodes are called miners and currently receive a reward of 12.5 Bitcoins, or about 69 000 US dollars [5]. No consensus model is perfect, and each model is suitable only for a small number of usage scenarios. One of the most popular models is the Proof of Work (PoW) consensus model, in which nodes solve a computationally intensive puzzle and the solution is the proof that they have performed the work. The puzzle difficulty is variable and is adapted by the network itself so that a solution is found approximately every 10 min. In Proof of Stake (PoS), all users that create blocks are obliged to stake a certain amount of money (cryptocurrency). If a created block can be validated, the network returns the whole amount to them.
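The Proof of Work puzzle can be illustrated with a toy miner. This sketch makes simplifying assumptions: difficulty is expressed as a number of leading zero hex digits rather than as the numeric target real networks use, the header is an arbitrary string, and the function names are hypothetical.

```python
import hashlib


def pow_hash(header: str, nonce: int) -> str:
    """SHA-256 of the header combined with a candidate nonce."""
    return hashlib.sha256(f"{header}|{nonce}".encode("utf-8")).hexdigest()


def mine(header: str, difficulty: int):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_hash(header, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(header: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution takes a single hash computation."""
    return pow_hash(header, nonce).startswith("0" * difficulty)


nonce, digest = mine("example block header", difficulty=3)
print(nonce, digest)
print(verify("example block header", nonce, 3))  # True
```

Each extra zero digit multiplies the expected number of attempts by 16 while verification stays at one hash, which is the asymmetry that lets a network tune difficulty toward a target block interval.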
Proof of Work is often criticized for its enormous resource (power) consumption. Miners’ processing power is six to eight times greater than that of today’s 500 most powerful supercomputers [31]. These resources are spent on solving artificial mathematical problems of no practical importance. Current trends in blockchain development aim at finding real-life scientific problems that can replace the original puzzle and thereby increase the usefulness of a blockchain. For example, the Folding@home project simulates biomedical processes, SETI@home processes data from telescopes, and there are attempts to implement the solving of NP-complete problems [7].
Current blockchain systems can be classified as public, private and consortium blockchains, which differ in consensus model, efficiency, etc. [7]. A public blockchain is completely open, accepts users from the whole world as participants and therefore has the greatest number of active systems and nodes. Private blockchains are mostly limited to one organization, and consortium blockchains to a selected set of nodes. Wang et al. [37] compared these classes based on permissions, immutability, efficiency, centralization and consensus process. The comparison shows that a public blockchain has the most suitable characteristics for security, authentication and equality, but very low efficiency. Efficiency is a measure of block propagation, i.e. the time required to create a new block of transactions. Private and consortium blockchains have high efficiency, but sacrifice integrity and increase the risk of data tampering.
One extension of blockchain functionality is the smart contract, which marks the beginning of the Blockchain 2.0 era. A smart contract contains functions and state data and is executed within the blockchain network. In this way, a blockchain can perform additional operations, such as storing data, performing calculations, sending funds to accounts and exposing data to the public [39], and a network can be developed or adapted to support different procedures or applications.
3 Applications of Blockchain Technology
Blockchain applications can be observed by following the evolution of the technology. At first, blockchain was known only as the technology behind digital currencies. Because of its characteristics, it was then applied across the whole financial sector. In recent years, blockchain technology can be found in various areas of everyday life, such as education, culture and healthcare [38]. Most blockchain applications and models are still developed in the financial sector, but the benefits of the technology are also noticed elsewhere, for example in transportation, where it helps track and record the paths of products, and in healthcare, where it makes health records easier to manage [3].
One sector where blockchain is seen not only as a technology that should be applied, but as one that must be applied soon because of the large increase in use, is online learning and education in general. Online education has become more popular over the past years, but it has many negative aspects. As everything is online, students’ privacy is questionable, mostly with regard to their academic achievements and available student works. Sharing learning materials among students and teaching materials among teachers can also be challenging with regard to security. Perhaps the most important part of online education is knowledge and ability tests, and there are many doubts about the credibility of the scores students achieve when examined online. Data sharing and privacy challenges are the main reason why new data storage and sharing methods should be included in online education systems, where data should not be available to the public or to other interested parties. Blockchain technology is expected to decrease the possibility of the public gaining access to students’ work, profiles or any other relevant data. Another important feature that blockchain technology makes available in online education is evidence of finished online courses. In other words, it is possible to issue valid certificates proving that a specific student has finished an online course. An online education platform saves all the relevant data, such as information about the course, teachers, students, students’ grades and the date of the exam, and then encrypts the data so that a digital certificate containing it can be given to the student. The student can then give the digital certificate to an employer or any other interested institution, which can obtain all the information about the person (in this case, a former student) by using the public key [35]. Not only is sharing information easier, but so is obtaining a replacement certificate if one is lost.
Using blockchain in healthcare also has great potential. Hölbl et al. [16] surveyed currently active examples of blockchain implementation in healthcare systems and prototypes made for academic research. They outlined that blockchain technology could contribute to healthcare through the exchange of medical records, drug prescriptions, risk data management concerning patients and medicines, etc. It could also make patient data more secure, which matters because these data are extremely important and sensitive. Nowadays, patient-driven interoperability is the desirable approach in healthcare, instead of institution-driven interoperability.
However, patient-driven interoperability brings out many questions, for example privacy, security, patient engagement, legal procedures and regulations etc. Blockchain technology could solve some of problems that are mentioned, regarding data transaction, privacy and security of medical and patients’ data [13]. Since it is already recognized among researchers that blockchain technology could improve healthcare system, there is increasing number of research connected to it. There are many descriptive works on the subject, but there are not many prototype implementations [16]. One of the main reasons is that blockchain technology itself should be upgraded so that it could respond to all the challenging needs of health systems. Many researchers claim that blockchain technology can help decentralizing standard voting systems [36]. Although blockchain technology has many advantages regarding security and privacy, taking in consideration previous research, it is concluded that blockchain technology still has many weaknesses to be implemented in voting systems. At first glance, implementing blockchain technology could make voting easier, it could reduce costs and people engaged in the whole process. On the other hand, it could have serious impact on privacy and data security. As it can be seen in financial sector, education or healthcare, personal data security is often a problem in modern technology. Blockchain was first used as a technology in cryptocurrency and it is proven that it can be trustworthy. Zyskind et al. [43] developed a platform that makes users able to own and control their data, but also to let companies provide customized services for them. It enables users to be familiar with all the
situations in which their personal or other important data is collected, but the users are always recognized as the real owners of their data. On the other hand, it makes work easier for companies, because they do not have to fear violating privacy policy. The decentralized platform they developed using blockchain technology makes working with sensitive data easier with regard to legal rights and other regulations. Zikratov et al. [42] explained how blockchain technology can help provide data integrity: transactions and authentication are used to strengthen integrity protection. They also mentioned disadvantages of using blockchain: the system is hard to maintain because there is not enough computing power for it, and the key that is used to encrypt the content could be cracked using a brute-force approach. Feng et al. [12] also recognized privacy protection as one of the greatest challenges when using blockchain technology. They introduced several methodologies intended to protect privacy and highlighted their disadvantages together with the practical implementation of the methods. The security objectives in most of the methodologies are blurring transactional relationships and hiding origin and/or relationship. The main disadvantages observed in the methodologies are waiting delay, no protection of data and/or transaction target, storage overhead and insufficient anonymity. Many advantages of blockchain technology are known, but the authors pointed out that it will not be possible to make the most of the technology until the privacy and security issues are completely resolved.

Among all applications of blockchain, it is also argued that blockchain technology could help improve the vehicular ecosystem. Smart vehicles are connected with vehicle owners, car manufacturers, road infrastructure etc.; in other words, they are connected to the Internet.
As they are highly connected, it is hard to secure them, mostly because of the large amount of data. The security of the vehicle is then endangered, and so is the safety of the passengers. Centralization, lack of privacy and safety threats are the main reasons why the security and privacy methods currently used in smart vehicles tend to be ineffective. Dorri et al. [10] suggested implementing a security architecture based on blockchain technology. Public keys provide security, and since decentralization is one of the most common properties of blockchain technology, the problem of centralized control is also solved in that case. Blockchain technology is compared with conventional technologies for insurance, electric vehicles, wireless remote software updates and car-sharing services. The advantages of using blockchain technology are distributed data exchange, secured payment, user data privacy, and distributed authorization.
4 Data Protection and Privacy

Blockchain is one of the core technologies in the fields of finance, business and science. Perceptions of its security and privacy differ: blockchain is often adopted because of the improvements it brings in security and privacy, yet there are scenarios in which its use is not yet recommended. There are many reports of security incidents in blockchain systems. The technology relies on existing infrastructure and computer equipment, making it vulnerable to all “ordinary” or “classical” attacks. Sengupta et al. [33] identified the most relevant
attacks and grouped them into four categories: physical, network, software and data attacks. Examples include Man-in-the-Middle, Sybil, spyware and worm attacks. There are known, more or less successful, countermeasures for them, but they are not directly related to blockchain technology.

4.1 Vulnerabilities and Attacks
Some vulnerabilities arise from the blockchain architecture, consensus models and protocols. These vulnerabilities are used to attack the blockchain network and often result in financial damage to users. For example, in 2013, hackers stole over 4000 bitcoins from a company that stores cryptocurrencies in wallets [24]. In 2017, over seven million dollars in Ethereum cryptocurrency were stolen from the startup CoinDash [41]. In the same year a crucial system component was destroyed, leaving over 500,000 Ether in almost 600 wallets inaccessible [28].

Some of the most fundamental attacks in the blockchain network are the double spending and eclipse attacks. Double spending happens when an attacker buys goods from a merchant. He creates a transaction and waits until it appears in the next block. When it appears, he takes the purchased goods. Then he releases two more blocks, one of which contains a transaction that transfers the same funds to a second address controlled by the attacker, so that the resulting longer chain displaces the block containing the payment to the merchant. An eclipse attack exploits bandwidth limitations in the network, because of which devices are not directly connected to all, or even many, other devices. The attacker exploits this characteristic and floods the target with his own IP addresses, making the victim's device connect only to the attacker's devices. That way, the attacker can send the victim incorrect data.

Zamani et al. [40] reviewed 38 security incidents with the aim of identifying their platform and root cause. Their results show that only a small number of incidents are caused by protocols, insider threats and social engineering. Application vulnerability is found in 7 and server or infrastructure breach in 12, making them causes for more than half of the observed incidents. Application vulnerability refers to flaws and errors in functionality, most of which result from using smart contracts, i.e. a programming or logical error allows access to other people's wallets and data, theft and unauthorized money generation.
For example, Zerocoin had a programming error that allowed an attacker to generate multiple spends, which were then sold and withdrawn [18]. An example of an infrastructure breach is the CoinDash website hack, in which the Ethereum address shown to users was changed to the hacker's [21]. Li et al. [22] made a taxonomy of blockchain risks and their causes, separately analysing systems with and without smart contracts. They enumerated five risks that both kinds of system have in common (blockchain 1.0 and 2.0): 51% vulnerability, private key security, criminal activity, double spending and transaction privacy leakage. Another four risks they found are related to smart contracts (exclusive to blockchain 2.0). The 51% vulnerability is a vulnerability in the consensus mechanism used to confirm new transaction blocks. In a Proof of Work consensus mechanism, when a user or a pool of users controls more than half of the computing resources in a network, they gain control of the entire blockchain. An attacker is then able to present his own chain as the genuine one, which enables the double spending attack [32]. Users are usually grouped into mining pools (AntPool.com, BW.com, NiceHash, GHash.io etc.) because a pool is more likely to solve a puzzle than an individual. The prize for solving
a puzzle is then distributed to all users in the pool. Proof of Stake and Delegated Proof of Stake are also vulnerable to this attack, even if a pool has less than 51% computing power. The loss from this attack in a Bitcoin network is 18 million US$ [32]. A private key in a blockchain represents the user's identity and security credential. It is generated not by a third-party agency but by the users themselves. The ECDSA signature scheme has been found to be vulnerable when it does not generate the required randomness during signing, which allows an attacker to recover the user's private key. Because no centralized trusted third-party organization exists, it is difficult to track this kind of activity.

4.2 Improving Blockchain Security
Although blockchain security weaknesses have been identified, the scientific and technical community constantly works on solving different issues and proposing new approaches. This section enumerates those that successfully solve the most critical security issues or that have an important impact on other solutions. Karame et al. [19] analysed double spending attacks and discovered that the techniques recommended by Bitcoin developers are not always effective. They performed a detailed analysis of the requirements for this attack to occur and showed that proposed techniques such as introducing listening periods or promoting regular users to observers have difficulty detecting double-spent transactions in certain scenarios. They argue that it is crucial to implement an effective detection technique against this attack and proposed their own modification to the current Bitcoin implementation. Their evaluation and tests appear to show great success. Wang et al. [37] claim that blockchain can even be used to improve security in distributed networks, because the existing centralized solutions for malware detection are also vulnerable to attacks. Noyes [27] proposed an anti-malware solution that enables users in a blockchain to distribute virus patterns among themselves. The solution is called BitAV, and testing showed that it can improve scanning speed and enhance tolerance to denial-of-service attacks. One of the approaches dealing with private key risk is multisig. A multisignature scheme means that a group of signers can together produce a compact signature of a single transaction [23]. When using multisig, a transaction is allowed only when it is signed multiple times, which can serve as additional wallet protection. For example, a transaction is not accepted unless it is signed with both the user's personal key and the key of the online wallet site [29]. Park and Park [29] have proposed a combination of blockchain and cloud computing.
While the security of storing and transmitting data in cloud computing has already been studied in detail, it lacks privacy protection and anonymity. Because blockchain shows good characteristics in this area, it could be combined with cloud computing, which would thereby be upgraded to a more secure service. This has great potential in healthcare for storing patient data in off-chain storage. Blockchain transactions are relatively small and linear and are not designed for storing large files, especially images and scans. These are therefore stored in a distributed database or in a cloud (off-chain), and immutable hashes of this data are stored on-chain, guaranteeing their authenticity.
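The off-chain pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of [29]: a large record is kept in ordinary storage while only its SHA-256 digest is recorded on-chain, so later tampering with the off-chain copy can be detected. The names `off_chain_store`, `on_chain_ledger`, `store_record` and `verify_record` are hypothetical, and a Python dictionary and list merely stand in for the cloud store and the blockchain.

```python
import hashlib

# Hypothetical stand-ins: a dict plays the cloud (off-chain) store,
# an append-only list plays the blockchain ledger.
off_chain_store = {}
on_chain_ledger = []


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(data).hexdigest()


def store_record(record_id: str, data: bytes) -> None:
    """Keep the bulky data off-chain and anchor only its hash on-chain."""
    off_chain_store[record_id] = data
    on_chain_ledger.append({"record_id": record_id, "sha256": fingerprint(data)})


def verify_record(record_id: str) -> bool:
    """True if the off-chain bytes still match the most recent anchored hash."""
    data = off_chain_store.get(record_id)
    if data is None:
        return False
    anchored = [e["sha256"] for e in on_chain_ledger if e["record_id"] == record_id]
    return bool(anchored) and fingerprint(data) == anchored[-1]
```

Calling `store_record` with the bytes of a scan and later `verify_record` with the same identifier returns `True` only while the stored bytes are unchanged; the ledger itself never holds the image.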
Mechanisms that ensure users' privacy are also being researched. The Self-Sovereign Identity (SSI) model is a successor to the user-centric approach that provides users with the possibility to share their identity across different services. The SSI model preserves the right to partial disclosure of one's data and identity and enables users to have control over their personal data. Personal data in this model is no longer stored in raw format across different online services, enabling information about users' actions, transactions and general communication to be anonymized. The next step in identity model development is implementing self-sovereign identity in blockchain [2], which, in addition to the self-sovereign mechanism, enables controlled access to users' identities and data for anyone in a blockchain network. This includes the exchange of digital assets between users, such as documents, attributes and claims, without the need for a central, third-party authority.
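The idea of partial disclosure can be illustrated with salted hash commitments. This is a simplified sketch under stated assumptions, not the mechanism reviewed in [2]: real SSI systems rely on signed credentials and often on zero-knowledge proofs. All function names below are hypothetical.

```python
import hashlib
import json
import secrets


def commit(name: str, value: str, salt: str) -> str:
    """Salted SHA-256 commitment to a single attribute."""
    payload = json.dumps([salt, name, value]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def issue(attributes: dict) -> tuple:
    """Return (salts, commitments); only the commitments would be published."""
    salts = {name: secrets.token_hex(16) for name in attributes}
    commitments = {
        name: commit(name, value, salts[name]) for name, value in attributes.items()
    }
    return salts, commitments


def disclose(attributes: dict, salts: dict, name: str) -> tuple:
    """The holder reveals one attribute together with its salt, nothing else."""
    return name, attributes[name], salts[name]


def verify(commitments: dict, disclosure: tuple) -> bool:
    """A verifier checks the revealed attribute against its published commitment."""
    name, value, salt = disclosure
    return commitments.get(name) == commit(name, value, salt)
```

A holder who calls `disclose` for one attribute proves that value to a verifier, while the remaining attributes stay hidden behind their commitments.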
5 Conclusion

This paper describes blockchain technology and its most common usage scenarios in different fields, such as science, education and business. Because of its architecture and the technologies and protocols it uses, blockchain shows some very good characteristics that are desirable in modern information systems, and it guarantees data immutability, trust and transparency in an untrusted distributed environment. It makes no special requirements of its users because its implementation is based on common protocols and technology. This also makes it vulnerable to the known security issues of those technologies, and many authors have additionally identified vulnerabilities in the blockchain itself, caused by issues in consensus mechanisms, cryptographic algorithms and smart contracts. Each security issue has a more or less successful solution, and some of them are enumerated in this paper. Even though various solutions have been found and tested, each of them focuses on a single problem. Little research has been done on the behavior of a blockchain network in which several solutions are implemented simultaneously. The challenges of blockchain technology still exist, but it has a large community of users, developers and scientists and is very attractive nowadays, not just for research but for use in real and complex information systems. It has improved significantly and become mature and stable, and authors generally agree that its advantages and possibilities far surpass its disadvantages.
References

1. Al-Omari, H., Al-Omari, A.: Building an e-government e-trust infrastructure. Am. J. Appl. Sci. 3(11), 2122–2130 (2006). https://doi.org/10.3844/ajassp.2006.2122.2130
2. Bernabe, J.B., Canovas, J.L., Hernandez-Ramos, J.L., Moreno, R.T., Skarmeta, A.: Privacy-preserving solutions for Blockchain: review and challenges. IEEE Access 7, 164908–164940 (2019)
3. Beck, R., Avital, M., Rossi, M., Thatcher, J.: Blockchain technology in business and information systems research. Bus. Inf. Syst. Eng. 59, 381–384 (2017)
4. Bitcoin – Open source P2P money (n.d.). https://bitcoin.org/en/
5. BitInfoCharts. Bitcoin (BTC) price stats and information. BitInfoCharts (2019). https://bitinfocharts.com/bitcoin/
6. Blaze, M., Feigenbaum, J., Ioannidis, J., Keromytis, A.D.: The role of trust management in distributed systems security. In: Vitek, J., Jensen, C.D. (eds.) Secure Internet Programming. LNCS, vol. 1603, pp. 185–210. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48749-2_8
7. Buterin, V.: On Public and Private Blockchains (2015). https://blog.ethereum.org/2015/08/07/on-public-and-private-blockchains/
8. Cao, W., et al.: PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database. Proc. VLDB Endow. 11, 1849–1862 (2018)
9. Crosby, M., Nachiappan, P., Verma, S., Kalyanaraman, V.: BlockChain technology: beyond bitcoin. Appl. Innov. Rev. 2, 7–19 (2016)
10. Dorri, A., Steger, M., Kanhere, S.S., Jurdak, R.: BlockChain: a distributed solution to automotive security and privacy. IEEE Commun. Mag. 55(12), 119–125 (2017). https://doi.org/10.1109/MCOM.2017.1700879
11. EMC: The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. EMC (2014). https://www.emc.com/leadership/digital-universe
12. Feng, Q., He, D., Zeadally, S., Khan, K., Kumar, N.: A survey on privacy protection in blockchain system. J. Netw. Comput. Appl. 126, 45–58 (2018)
13. Gordon, W., Catalini, C.: Blockchain technology for healthcare: facilitating the transition to patient-driven interoperability. Comput. Struct. Biotechnol. J. 16, 224–230 (2018)
14. Gramoli, V.: From blockchain consensus back to Byzantine consensus. Future Generation Computer Systems (2017)
15. Hajirahimova, M., Aliyeva, A.: About big data measurement methodologies and indicators. Int. J. Modern Educ. Comput. Sci. 9, 1–9 (2017)
16. Hölbl, M., Kompara, M., Kamisalic, A., Nemec Zlatolas, L.: A systematic review of the use of blockchain in healthcare. Symmetry 10(10), 470 (2018)
17. Hoehle, H., Huff, S., Goode, S.: The role of continuous trust in information systems continuance. J. Comput. Inf. Syst. 52(4), 1–9 (2012)
18. Insom, P.: Zcoin's Zerocoin Bug explained in detail. Zcoin (2017). https://zcoin.io/zcoins-zerocoin-bug-explained-in-detail/
19. Karame, G., Androulaki, E., Capkun, S.: Double-spending fast payments in bitcoin. In: Proceedings of the ACM Conference on Computer and Communications Security, pp. 906–917 (2012)
20. Lee Kuo Chuen, D.: Handbook of Digital Currency, 1st edn. Elsevier, Amsterdam (2015)
21. Leyden, J.: CoinDash crowdfunding hack further dents trust in cryptotrading world. The Register (2017). https://www.theregister.com/2017/07/18/coindash_hack/
22. Li, X., Peng, J., Chen, T., Luo, X., Wen, Q.: A survey on the security of blockchain systems. Future Generation Computer Systems (2017)
23. Ma, C., Jiang, M.: Practical lattice-based multisignature schemes for blockchains. IEEE Access 7, 179765–179778 (2019). https://doi.org/10.1109/ACCESS.2019.2958816
24. McMillan, R.: $1.2M Hack Shows Why You Should Never Store Bitcoins on the Internet. Wired (2013). https://www.wired.com/2013/11/inputs/
25. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System. Bitcoin.org (2009). https://bitcoin.org/bitcoin.pdf
26. Narayan, S., Chandy, J.: Parity redundancy in a clustered storage system. In: 4th International Workshop on Storage Network Architecture and Parallel I/Os SNAPI, pp. 17–24 (2007)
27. Noyes, C.: Bitav: fast anti-malware by distributed blockchain consensus and feedforward scanning (2016). arXiv preprint arXiv:1601.01405
28. Parity Technologies. A Postmortem on the Parity Multi-Sig Library Self-Destruct. Parity (2017). https://www.parity.io/a-postmortem-on-the-parity-multi-sig-library-self-destruct/
29. Park, J., Park, J.: Blockchain security in cloud computing: use cases, challenges, and solutions. Symmetry 9, 164 (2017)
30. Pavlou, P.A., Fygenson, M.: Understanding and predicting electronic commerce adoption: an extension of the theory of planned behavior. MIS Q. 30(1), 115–143 (2006)
31. Santos, M.: Not even the top 500 supercomputers combined are more powerful than the Bitcoin network. 99 Bitcoins (2019). https://99bitcoins.com/not-even-the-top-500-supercomputers-combined-are-more-powerful-than-the-bitcoin-network/
32. Sayeed, S., Marco-Gisbert, H.: Assessing blockchain consensus and security mechanisms against the 51% attack. Appl. Sci. 9(9), 1788 (2019)
33. Sengupta, J., Ruj, S., Dasbit, S.: A comprehensive survey on attacks, security issues and blockchain solutions for IoT and IIoT. J. Netw. Comput. Appl. (2019)
34. SINTEF. Big Data, for better or worse: 90% of world's data generated over last two years. ScienceDaily (2013). www.sciencedaily.com/releases/2013/05/130522085217.htm
35. Sun, H., Wang, X., Wang, X.: Application of blockchain technology in online education. Int. J. Emerg. Technol. Learn. (iJET) 13, 252 (2018)
36. Taş, R., Tanrıöver, Ö.Ö.: A systematic review of challenges and opportunities of blockchain for e-voting. Symmetry 12(8), 1328 (2020). https://doi.org/10.3390/sym12081328
37. Wang, H., Zheng, Z., Xie, S., Dai, H., Chen, X.: Blockchain challenges and opportunities: a survey. Int. J. Web Grid Serv. 14, 352–375 (2018)
38. Wu, J., Tran, N.: Application of blockchain technology in sustainable energy systems: an overview. Sustainability 10, 3067 (2018)
39. Yaga, D., Mell, P., Roby, N., Scarfone, K.: Blockchain Technology Overview. National Institute of Standards and Technology Internal Report 8202 (2019)
40. Zamani, E., He, Y., Phillips, M.: On the security risks of the blockchain. J. Comput. Inf. Syst. (2018)
41. Zhao, W.: $7 Million Lost in CoinDash ICO Hack. CoinDesk (2017). https://www.coindesk.com/7-million-ico-hack-results-coindash-refund-offer
42. Zikratov, I., Kuzmin, A., Akimenko, V., Niculichev, V., Yalansky, L.: Ensuring data integrity using blockchain technology. In: 2017 20th Conference of Open Innovations Association (FRUCT), pp. 534–539 (2017)
43. Zyskind, G., Nathan, O.: Decentralizing privacy: using blockchain to protect personal data. In: 2015 IEEE Security and Privacy Workshops, pp. 180–184. IEEE (2015)
iMedMS: An IoT Based Intelligent Medication Monitoring System for Elderly Healthcare

Khalid Ibn Zinnah Apu, Mohammed Moshiul Hoque(B), and Iqbal H. Sarker

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology (CUET), Chittagong 4349, Bangladesh
{khalidex,mmoshiul_240,iqbal}@cuet.ac.bd
Abstract. In recent years, the ageing population has been growing swiftly, and ensuring proper healthcare for elderly and physically challenged people has gained much attention from academic, medical and industrial experts. Many older people suffer from sickness or disability, making it challenging for them to look after themselves when it comes to taking medicine on time. Any trivial lapse, such as forgetting to take medications or taking them at the wrong time, may cause potentially disastrous problems. This paper presents an intelligent medication system (called ‘iMedMS’) using IoT technology to monitor whether the patient is taking medicine according to the physician's prescription and schedule. The system includes an alert system that notifies the patient about the exact time of medication and a feedback system that sends an SMS to the patient, physician or caregiver when any pillbox is empty. Moreover, the system embeds seven physical buttons that let the patient express various feelings after taking medicine. Several experimental results show the functionality of the proposed medication system.

Keywords: Health informatics · Internet of Things · Intelligent medication monitoring · Elderly healthcare · Medication reminder
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 302–313, 2022. https://doi.org/10.1007/978-3-030-93247-3_30

1 Introduction

The proportion of older adults living alone has increased dramatically in recent years. Many of them suffer from various illnesses or disabilities, making it challenging to manage their healthcare. For many diseases, taking medicine at a scheduled time is of utmost importance. Any trivial lapse, such as neglecting to take medications or taking them at unscheduled times, may cause disastrous difficulties. Elders are more affected by the timing of taking a particular drug than others; to prevent any dysfunction or illness, appropriate timing is essential [2,3]. According to WebMD research, about 46% of people forget to take their medication at the scheduled time, and older adults make up the largest share of them. Moreover, the elderly are usually prescribed various drugs that need to be taken at specific times. Keeping track of taking the correct drugs at a specific time each day can become an arduous task for the elderly, as it is not as straightforward as it might be for a younger individual. Older people usually suffer from declining sight, memory or reasoning capabilities, which decrease progressively with age. Thus, it is very challenging for older people to remember which pill to take at which time. A human caregiver or nurse may be employed to care for or monitor the older patient, but human error remains possible: in one reported case, a nurse gave a patient a paralytic drug in place of the antacid prescribed by the physician, causing the patient's death [4]. It is also challenging for physicians or caregivers to obtain up-to-the-minute information about a patient. Therefore, an IoT based medication monitoring system can be a helpful solution for older people.

IoT is considered a practical solution in health monitoring for tracking a patient's health status in real time. It allows the individual's physiological data to be stored securely in the cloud and reduces stays in clinics for routine checkups. Moreover, the patient's health can be observed and diseases diagnosed remotely by any physician. For highly infectious diseases such as COVID-19, it is always safer to monitor affected patients using remote health monitoring technology. Thus, an IoT based device or technique is the best option for the patient, caregiver and physician [5]. Several medicine containers or pillboxes are available on the market, but most of them have restricted usage and are not suited to the elderly because of their large size, high price and poor user-friendliness. It is also essential to receive specific feedback from the patient after a particular drug is consumed.
This work proposes an intelligent medication intake monitoring system that helps elderly and impaired users manage their scheduled medicine effectively. The proposed system is easy to use, portable and cost-effective. The specific contributions are highlighted in the following:

– Develop an intelligent medication monitoring system using IoT devices and cloud-based technology.
– Develop a software interface to interact with the medication system, incorporating a medicine intake reminder module, a notification generation module and a patient feedback module.
– Investigate the functionality of the proposed system using quantitative data collected from patients and doctors.
2 Related Work
Work on intelligent medication monitoring systems and smart pillboxes is still rare. Kiruthiga et al. [1] developed an IoT based medication system that can observe whether the patient receives the proper amount of medicine at the scheduled time. This system cannot report the remaining amount of medicine, is not portable and lacks a way to share the patient's feelings if any complication occurs after taking a particular pill. Some works presented a magic pillbox
[6,7] that supports both the monitoring and the physiological aspects of a patient's healthcare. Nevertheless, it merely tells the patient to take pills according to her/his vital signs without extensive schedules, and the system cannot be used when the patient is away from home. David et al. [8] present a medicine dispensing scheme comprising a pill cabinet and a system that dispenses pill dosages for single and multiple patients, but it provides limited health-monitoring services. Chen et al. [9] developed a micro-controller based pillbox that can deliver a pill at a pre-defined time. However, this system did not provide any means to record the time of the patient's pill intake. Huang et al. [10] proposed a smart pillbox based on the pill case technique, which supplies a pill from the pill case at a scheduled time. However, the proper functioning of this system depends solely on the availability of Internet connectivity. An intelligent drug box proposed in [11] can remind patients of their medication schedule. Minaam et al. [12] proposed a smart pillbox containing nine distinct sub-boxes as a medicine reservoir. However, this box works for nine distinct pills only and has no provision for sharing the patient's feedback. Dhukaram et al. [13] demonstrate a pill management system supporting different kinds of medication schedules, but this system does not include any notification or feeling-sharing components for use after taking a pill. Wang et al. [14] presented a smartphone-based medication intake reminder system that can alert patients if they forget to take a scheduled pill. However, this system cannot notify the doctor about the patient's status or compatibility issues with a particular medicine. Features for designing medication reminder apps were proposed in [15].
This system generates too many short messages, which may be hard to follow, and lacks a way to share feelings if any complication arises after taking a pill. An IoT-based health monitoring system used numerous wearable sensors to capture various physiological data and share them with doctors and patients' relatives [16]. However, this system uses remote monitoring that is costly, and wearing so many sensors is impractical and uncomfortable. Bansal et al. [17] used a ZigBee transceiver to transfer physiological information within a small range. However, this system did not provide any feedback module. An Android-based solution was proposed by Lakshmanachari et al. [18], which gathers data from patients and sends it to a data centre for further processing. Nevertheless, that system does not contain any pill management module or alert system. Haider et al. [19] propose a medicine planner that can fill pills automatically into the pillbox. However, the patient feedback system and monitoring functions are missing in this system. To address the shortcomings of these previous methods, the proposed medication intake monitoring system provides a pill container with automatic cloud-based pill scheduling and a facility for sharing the patient's feelings.
3 Proposed System Architecture
This work aims to develop a tool that allows the owner to keep track of each pill to be taken, naturally and easily. No unusual training or detailed knowledge is required
to run the tool. The device can log the pill name, the scheduled time of intake and the actual time of taking a pill. The tool is connected wirelessly through a cloud framework to record data and manage the monitoring of the patient's medication intake. Figure 1 shows an overview of the proposed medicine intake monitoring system. The proposed architecture comprises three main modules: (i) prescription generation, (ii) pill management and (iii) feelings sharing.
Fig. 1. Proposed architecture for IoT based medication system
Each of the modules interacts with the cloud-based system so that every interaction is updated in real time. Each user is tagged with a unique identification number for tracking.

3.1 Smart Prescription Generation Module
Pre-processing prescriptions into the desired format is essential, as the data is used to drive the IoT system. The medicine name, dose and duration are written and saved in the cloud database system so that it can interact with the Arduino Uno. At the bottom of the prescription, various remarks can be attached so that the doctor can make suggestions concerning the patient's health. The prescription can be loaded onto a digital pill dispenser system and is formatted in JavaScript (metadata) for further processing. Figure 2 shows the interface of the prescription generation module, in which a doctor can write the name and dose of a specific drug. For
example, a doctor's input of 1+1+1 denotes three times a day (morning: 9.00 am, noon: 2.00 pm and night: 10.00 pm), and 0 indicates that no medicine is required at that specific time.
Fig. 2. Cloud based IoT integrated prescription interface
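The dose notation entered in the prescription interface can be expanded into a daily schedule as follows. This is a minimal sketch, not the authors' implementation: the function name `parse_dose_pattern` is hypothetical, and fixing the three slots to the times quoted in the example (9.00 am, 2.00 pm, 10.00 pm) is an assumption.

```python
from datetime import time

# Slot times copied from the example in the text; treating them as fixed
# for every prescription is an assumption of this sketch.
SLOT_TIMES = (time(9, 0), time(14, 0), time(22, 0))  # morning, noon, night


def parse_dose_pattern(pattern: str) -> list:
    """Turn a pattern such as '1+0+1' into [(time, quantity), ...].

    Slots whose quantity is 0 are dropped, because 0 means that no
    medicine is required at that time.
    """
    parts = pattern.split("+")
    if len(parts) != len(SLOT_TIMES):
        raise ValueError("expected three '+'-separated quantities")
    doses = [int(part) for part in parts]
    if any(dose < 0 for dose in doses):
        raise ValueError("quantities must be non-negative")
    return [(slot, dose) for slot, dose in zip(SLOT_TIMES, doses) if dose > 0]
```

For instance, the pattern `1+0+1` yields two entries, one for the morning slot and one for the night slot, each with a quantity of one.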
The prescription and patient data are stored on the cloud server in JSON format. The medicine parameter ($M$) can be represented as the function in Eq. 1, and the prescription in JSON is denoted by the symbol $P$. To generate the prescription, the doctor defines a function parameter for each medicine, giving the functions $M_1, M_2, \ldots, M_n$, where the subscript $n$ denotes the medication number.

$$M_1 = \frac{F(M[C]) \,\|\, F(D) \,\|\, F(E)}{F(i_T)} \qquad (1)$$

Here, $F(M[C])$ denotes a function that defines the medicine category (e.g., capsule or tablet), $F(D)$ represents a function that specifies the medicine intake duration in number of days, $F(E)$ denotes a function for the instruction on taking the medicine (e.g., on an empty or full stomach), and $F(i_T)$ indicates a function that defines the interval between doses of the medicine (e.g., after 8 h or 3 times a day).

3.2 Medicine Management Module
The medicine management system integrates hardware and software. Various hardware components are used to build the medication management module. A GSM module (SIM900) establishes a connection between the device and the prescription server. The system sends data to the Arduino Mega 2560, which is based on the ATmega2560 8-bit CMOS microcontroller (MC). This board provides 54 digital input/output pins (of which 15 can be used as PWM outputs), 16 analog inputs, 4 UARTs, a 16 MHz crystal oscillator, a USB connection, a power jack, an ICSP header and a reset button [14]. Figure 3 depicts the significant components of the medicine management module. The IR sensor unit contains four components: IR emitter, IR receiver, pill compartment and empty pill detection unit. The input of the IR emitter (the pill-taking schedule provided by the cloud) is passed to the IR receiver. The empty detection unit checks whether the compartment is empty or not. If it is found empty, an SMS is sent to the relatives' phone via the GSM module. If the
iMedMS: An IoT Based Intelligent Medication Monitoring System
307
Fig. 3. Components of medicine management module
pill compartment is not empty, then a medicine intake alert is generated by the system. Figure 4 shows the developed module includes four main constituents: pillbox, SIM 900 module, Arduino, and feeling sharing.
Fig. 4. Hardware implementation of smart pill box system
3.3 Medicine Intake Alert Module
The alert system takes input from the prescription unit via the cloud server through the SIM900 GSM module. When a medication intake schedule activates, the device communicates with the server to activate the buzzer and light an LED. Thus, the patient is made aware of the pill-taking time. When the alarm system is activated, the door of the specific compartment is triggered automatically by a stepper motor governed by the Arduino [12]. The medication management system fetches the server’s data and activates the alert system with a flashing LED according to the scheduled medicine intake time. The alarm activation depends on the time, quantity, and medicine type, which can be expressed as in Eq. 2.

\[ O(A) = \frac{P[qty] \,\|\, L(S)}{D(T)} \tag{2} \]

Here, O(A) denotes the buzzer alert, P[qty] represents the pill quantity (number of doses), L(S) indicates the LED status, on (1) or off (0), and D(T) indicates the pill-taking time. For example, O(A) = 11 indicates that the alert, with the buzzer, is activated for the pill to be taken from the 1st row, 1st column compartment, and O(A) = 21 indicates that the alert, with the buzzer alarm, is activated for the 2nd row, 1st column compartment.

3.4 Notification Generation Module
If a specific drug is finished, the medicine box door closes automatically and an alert indicates that the medicine stock is empty. When an alarm is triggered, a notification is sent to the patient through SMS using the SIM900 as a reminder of the medicine intake time. After the scheduled intake, the system notifies the doctor about the dose and pill based on a predefined rule such as “If the patient X NOT TAKEN a medicine at Day Y Dose A, then send”. Figure 5 shows a sample snapshot of a notification sent through SMS.

3.5 Feeling Sharing Module
A patient may feel complications or discomfort after taking a medication. Thus, the device includes a feeling sharing module to share the patient’s various feelings with the doctor or caregiver after taking a prescribed medicine. The device has seven embedded push buttons that convey seven kinds of feelings from the patient. The following feelings are stored and sent to the doctor so that necessary measures can be taken.
– Neutral: this button can be used when the patient does not feel any change in his/her health condition after medication.
– Good: this button can be used for a positive change in the patient’s health condition.
– Better: this button is used while the patient is on the path to cure.
– Pain: this button can be used if the patient feels an ache in any body part.
– Headache: this button can be used when the patient feels severe headaches.
– Vomiting: this button can be pushed in the case of nausea.
– Major complication: this button is used if the patient is in a dire situation or severe health condition.
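The seven buttons listed above could be represented roughly as follows. This is a minimal sketch: the button numbering 1 to 7, the `Feeling` enum, and the record fields are assumptions made for illustration, and only the seven labels come from the text. The record is serialized as JSON because the system is described as storing its data in that format.

```python
import json
from enum import Enum


class Feeling(Enum):
    # Button numbers are an assumed ordering; labels follow the list in the text.
    NEUTRAL = 1
    GOOD = 2
    BETTER = 3
    PAIN = 4
    HEADACHE = 5
    VOMITING = 6
    MAJOR_COMPLICATION = 7


def feeling_record(patient_id: str, button: int) -> dict:
    """Build a record for the pressed button (hypothetical field names)."""
    feeling = Feeling(button)  # raises ValueError for an unknown button number
    return {"patient": patient_id, "button": button, "feeling": feeling.name.lower()}


if __name__ == "__main__":
    print(json.dumps(feeling_record("P001", 5)))
```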
Fig. 5. Sample SMS notification by iMedMS
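The predefined notification rule quoted in Sect. 3.4 can be sketched as a simple filter over dose events. The `DoseEvent` class, its field names, and the message wording are hypothetical and are not taken from the iMedMS implementation; actual delivery over the SIM900 module is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseEvent:
    """One scheduled dose; all field names are hypothetical."""
    patient: str
    day: int
    dose: str
    taken: bool


def missed_dose_messages(events):
    """Apply the rule 'if patient X NOT TAKEN a medicine at Day Y Dose A, then send'."""
    return [
        f"Patient {e.patient} NOT TAKEN medicine at Day {e.day} Dose {e.dose}"
        for e in events
        if not e.taken
    ]


if __name__ == "__main__":
    log = [DoseEvent("X", 1, "A", True), DoseEvent("X", 2, "A", False)]
    for message in missed_dose_messages(log):
        print(message)
```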
4 Experiments
The developed iMedMS was evaluated for the efficiency and accuracy of the cloud-based prescription module and the intelligent IoT pill-management system. To implement the cloud server, we used Linux CentOS 7 with an 8-core Xeon CPU and a 1 Gbps dedicated connection, including load balancing. The doctor prepares prescriptions through the prescription generation module using the cloud server. The medication box interacts with the cloud system through the SIM900 module and the Arduino. A buzzer connected to the Arduino starts to ring at the scheduled pill-taking time when it receives a command from the cloud server. The system notifies the doctor and caregiver if the patient has taken the medicine, and the corresponding data is stored. The system also stores data related to any feelings shared by the patient, and it can detect and notify if any of the pillboxes is empty. The doctor initiates the process by completing prescriptions in the cloud-based system. The user device (iMedMS) then communicates automatically with the cloud server and is prepared for alert generation. When the patient takes the pill, this data is sent to the cloud server via the GSM/GPRS (SIM900) module. After a pill is taken, feeling sharing data is collected from the patient and sent to the server.

4.1 Data Collection
The performance of iMedMS is analyzed after setting up the prescription in the cloud system with respect to the medication schedule, buzzer alarm, LED flashing, and the patient’s feeling sharing responses. iMedMS was tested and simulated with 76 patients and 11 doctors. A total of 8567 medication trials were carried out over 14 days in 228 prescriptions, with three medicine instances in 4 doses per day. Table 1 shows the summary of the data collected from iMedMS.

Table 1. Summary of collected data
Attributes             Values
Number of medicines    372
Number of doses        8567
Feedback received      1750
Total patients         76
Total prescriptions    228
Total doctors          11
\[ \mathrm{Accuracy} = \frac{\text{No. of correct events}}{\text{Total No. of events}} \times 100\% \tag{3} \]

4.2 Results
To evaluate the proposed system’s performance, the success and failure ratios are measured for three functionalities: (i) prescription analysis and dose detection, (ii) buzzer and light interaction, and (iii) feedback propagation to the doctor via the cloud system. Equations 4 and 5 are used to measure the success ratio (SR) and failure ratio (FR).

\[ SR = \frac{\text{No. of correct events}}{\text{Total No. of events}} \times 100\% \tag{4} \]

\[ FR = \frac{\text{No. of undetected events}}{\text{Total No. of events}} \times 100\% \tag{5} \]

The results indicate that the proposed system achieved a success rate of 93.75%, whereas it failed to perform its functionality in 6.25% of cases. The errors occurred due to communication failures between the IoT devices and the cloud server in a few cases. Table 2 presents the measures (SR and error rate (ErrR)) for the three functionalities based on experimental simulation. In (i), prescription analysis and dosage detection, the error rate is calculated from failures in prescription text processing and failures to separate the medication doses indicated by the doctor. In (ii), the error rate is calculated from whether the buzzer alarm and light are activated within the specified medication dose time. In (iii), the error rate is calculated from the feedback received by the doctor via the cloud system.
Table 2. Performance on different functions
Functionality                                          SR (%)   ErrR (%)
(i) Prescription analysis and dosage detection         95.50    2.50
(ii) Buzzer and light interaction                      93.80    1.75
(iii) Feedback propagate to doctor via cloud system    91.20    3.10
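As a worked illustration of Eqs. 4 and 5, the ratios can be computed as below. The paper does not report raw event counts, so the counts used here (15 correct and 1 undetected out of 16 events) are hypothetical and were chosen only because they reproduce the reported overall rates; the function names are likewise assumptions.

```python
def success_ratio(correct_events: int, total_events: int) -> float:
    """Eq. 4: share of correct events, in percent."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return 100.0 * correct_events / total_events


def failure_ratio(undetected_events: int, total_events: int) -> float:
    """Eq. 5: share of undetected events, in percent."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return 100.0 * undetected_events / total_events


if __name__ == "__main__":
    # Hypothetical counts, not taken from the experiment.
    print(success_ratio(15, 16))  # 93.75
    print(failure_ratio(1, 16))   # 6.25
```

When every event is either correct or undetected, the two ratios sum to 100%, as they do for the overall figures of 93.75% and 6.25%.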
Additionally, the execution time and power consumption of the system were measured. The execution time of the system refers to the time required (in ms) for triggering various events (such as run task, layout, function call and so on) in the cloud system. The event utilization time also shows the breakdown of IoT communication along the data processing queue path in the cloud environment, where CPU utilization represents the bandwidth of the IoT communication system. Parsing data (31.3%) and run task (31.7%) accounted for the largest portion (63% in total) of the total run-time events. Figure 6 shows the IoT communication with the server in the waterfall model. In communication with the server, JS, markup, CSS, fonts, and other assets are processed in a waterfall model, and CPU utilization and execution bandwidth are visualized in time and bar graphs.
Fig. 6. IoT communication with the cloud
Power consumption varies across states while the sensor nodes are activated. The device consumes the lowest amount of power while it is in the idle state. The system needs a certain amount of power to stay active, which is consumed in idle mode to keep the system running. The sensor needs additional power to sense, process and transmit data. The power consumption increases with the number of processing operations. Thus, the total power consumption (Pc) of the system can be calculated using Eq. 6.

\[ P_c = P_i + P_s + P_p + P_t \tag{6} \]
The total power consumed by the system is 6.451 W. The highest power consumption occurred for data transmission (Pt), amounting to 2.686 W, whereas the idle state (Pi) consumed 1.136 W, the sensing state (Ps) consumed 1.166 W, and processing (Pp) consumed 1.436 W.
5 Conclusion
This paper developed an IoT-based intelligent medication intake monitoring system called iMedMS. The developed system can generate a cloud-based prescription indicating the medicine name, schedule, and dosage amount. Alerts are generated through a buzzer and SMS to remind the patient of the medication schedule. iMedMS also embeds a feeling sharing module that lets the patient express seven kinds of feelings if any discomfort or complication occurs after taking a medicine. The cloud-server framework manages all communication between the patient and the system, and relevant data can be recorded for further investigation. Although the proposed framework is inexpensive and automatic, some functionality should be added to improve performance. The whole system and its interface to the cloud can be implemented as mobile apps to enhance portability. Integration of iMedMS with computerized medical and private health records can be realized to ensure real-time monitoring. The size of iMedMS may be reduced to make it more practical and manageable. Moreover, voice interaction can be incorporated for easier operation.
References
1. Kiruthiga, S., Arunthadhi, B., Deepthyvarma, R., Divya Shree, V.R.: IOT based medication monitoring system for independently living patient. Int. J. Eng. Adv. Technol. 9(3), 3533–3539 (2020)
2. Sankar, A.P., Nevedal, D.C., Neufeld, S., Luborsky, M.R.: What is a missed dose? Implications for construct validity and patient adherence. AIDS Care 19(6), 775–780 (2007)
3. Sawand, A., Djahel, S., Zhang, Z., Naït-Abdesselam, F.: Multidisciplinary approaches to achieving efficient and trustworthy eHealth monitoring systems. In: Proceedings of IEEE/CIC International Conference on Communications in China (ICCC), Shanghai, China, pp. 187–192 (2014)
4. Chelvam, Y.K., Zamin, N.: M3DITRACK3R: a design of an automated patient tracking and medicine dispensing mobile robot for senior citizens. In: Proceedings of 2014 International Conference on I4CT, Langkawi, Malaysia, pp. 36–41 (2014)
5. Almotiri, S.H., Khan, M.A., Alghamdi, M.A.: Mobile health (m-health) system in the context of IoT. In: Proceedings of 2016 IEEE 4th International Conference on FiCloudW, Vienna, Austria, pp. 39–42, August 2016
6. Wan, D.: Magic medicine cabinet: a situated portal for consumer healthcare. In: Gellersen, H.-W. (ed.) HUC 1999. LNCS, vol. 1707, pp. 352–355. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48157-5_44
7. Wan, D., Gershman, A.V.: Online Medicine Cabinet, US Patent. no. US6539281B2 (2003)
8. Haitin, D., Asseo, G.: Medication Dispensing System including Medicine Cabinet and Tray Therefor. US Patent. no. US20040054436A1 (2006)
9. Chen, B.-B., Ma, Y.-H., Xu, J.-Li.: Research and implementation of an intelligent medicine box. In: Proceedings of 2019 4th International Conference on IGBSG, Hubei, China, pp. 203–205 (2019)
10. Huang, S.-C., Chang, H.-Y., Jhu, Y.-C., Chen, G.-Y.: The intelligent pill box-design and implementation. In: Proceedings of 2014 IEEE International Conference on Consumer Electronics, Taiwan, Taipei, pp. 235–236 (2014)
11. Shashank, S., Tejas, K., Pushpak, P., Rohit, B.: A smart pill box with remind and consumption using IOT. Int. Res. J. Eng. Technol. 4(12), 152–154 (2017)
12. Minaam, D.S.A., Abd-ELfattah, M.: Smart drugs: improving healthcare using smart pill box for medicine reminder and monitoring system. Future Comput. Inf. J. 3(2), 443–456 (2018)
13. Dhukaram, A.V., Baber, C.: Elderly cardiac patients’ medication management: patient day-to-day needs and review of medication management system. In: Proceedings of IEEE ICHI, Philadelphia, PA, USA, pp. 107–114 (2013)
14. Wang, M.Y., Tsai, P.H., Liu, J.W.S., Zao, J.K.: Wedjat: a mobile phone based medicine in-take reminder and monitor. In: Proceedings of 2009 9th IEEE International Conference on BIBE, Taichung, Taiwan, pp. 423–430 (2009)
15. Stawarz, K., Cox, A.L., Blandford, A.: Don’t forget your pill! designing effective medication reminder apps that support users’ daily routines. In: Proceedings of SIGCHI Conference on Human Factors in Computing Systems, Toronto, Canada, pp. 2269–2278 (2014)
16. Valsalan, P., Baomar, T.A.B., Baabood, A.H.O.: IOT based health monitoring system. J. Crit. Rev. 7(4), 739–742 (2020)
17. Bansal, M., Gandhi, B.: IoT based smart health care system using CNT electrodes. In: Proceedings of 2017 ICCCA, Greater Noida, India, pp. 1324–1329 (2017)
18. Lakshmanachari, S., Srihari, C., Sudhakar, A., Nalajala, P.: Design and implementation of cloud based patient health care monitoring systems using IoT. In: Proceedings of 2017 ICECDS, Chennai, India, pp. 3713–3717 (2017)
19. Haider, A.J., Sharshani, S.M., Sheraim, H.S., et al.: Smart medicine planner for visually impaired people. In: Proceedings of ICIoT, Doha, Qatar, pp. 361–366 (2020)
Internet Banking and Bank Investment Decision: Mediating Role of Customer Satisfaction and Employee Satisfaction
Jean Baptiste Bernard Pea-Assounga and Mengyun Wu
School of Finance and Economics, Jiangsu University, 301 Xuefu Road, Zhenjiang 212013, China
{aspeajeanbaptiste,mewu}@ujs.edu.cn
Abstract. This study examined the effects of internet banking, employee satisfaction, and customer satisfaction on bank investment decisions in selected Congolese banks. A total of 1800 questionnaires were administered across the ten surveyed banks, of which 1500 (83.33%) were considered. Using SPSS, SmartPLS, and Stata statistical software, the data were analyzed employing percentages of respondents, correlation analysis, and the System of Regression Equations approach. The overall findings show that internet banking has a positive effect on bank investment decisions, and that customer satisfaction and employee satisfaction partially mediate the nexus between internet banking and bank investment decisions, reinforcing prior studies and supporting generalization. Based on the findings, the following recommendation is made to both individual and institutional investors: investors should be aware that several factors can influence their investment decision-making, including customer satisfaction and employee satisfaction.
Keywords: Bank investment decision · Customer satisfaction · Employee satisfaction · Internet banking · Republic of Congo
1 Introduction
The internet and advanced innovations have influenced new business models and bank investment decisions [1, 2]. Internet banking is defined as a banking channel that lets consumers perform an extensive range of financial and nonfinancial services via a bank’s website [2, 3]. Numerous banks have deployed internet banking systems in an attempt to decrease costs while enhancing customer services [4, 5]. Despite the potential benefits that consumers may gain from internet banking, its implementation in a firm is limited and does not always meet expectations [5, 6]. To be innovative, organizations must effectively use and obtain input from many resources, including human capital, customers, etc. [7]. Traditional investment theory holds that company investment is dictated by fluctuations in interest rates: lower market rates reduce the cost of capital and increase the number of beneficial investment projects. However, research from several countries shows that investment decisions are mainly based on rules of thumb rather than standard financial models [8].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 314–330, 2022. https://doi.org/10.1007/978-3-030-93247-3_31
In practice, this may show that investment choices are less sensitive to changes in borrowing costs than the traditional theory of investment suggests. Also, Hjelseth, Meyer, and Walle [9] stated that investment decisions are frequently based on reliable criteria that disregard financing costs. A common financial assumption is that firms make investment decisions based on a required rate of profit that is influenced by the cost of capital. A lower interest rate should raise the frequency of beneficial investments. In the papers we have read, many researchers focus only on the impact of internet banking on bank performance [4, 5], which does not include the employee satisfaction, customer satisfaction, and bank investment decision variables. Therefore, this study seeks to fill the gap by investigating the mediating role of Customer Satisfaction (CSAT) and Employee Satisfaction (ES) in the nexus between Internet Banking (IB) and Bank Investment Decision (BID) in the Congolese context. Firms and individuals must make three types of financial decisions, namely capital, dividend, and investment decisions. Making investment decisions is a core duty for many companies, notably those in the financial sector. Choosing the correct mix of short-term and long-term investments enhances an organization’s overall revenues. The increased use of and frequent changes in technology, together with stakeholders’ demands, have made these decisions more complex for financial and banking organizations. A study of this rising issue in the Republic of Congo will enable banks to better understand the complexities of financial investment decisions and how to boost their total profit through the right investment decisions. The effects of not carrying out this research can be considerable for banks, the government, the other stakeholders of the banks, and academicians. For instance, banks stand to lose a large amount of profit through a couple of wrong decisions.
This could deny banks the funds needed to propel growth and expansion. Also, making the wrong decisions could deny the stakeholders the opportunity to maximize their dividends and could therefore have an indirect impact on the banks’ dividend decisions. In addition, traditional financial theory posits that people make rational decisions, yet some people also make irrational decisions that affect their future [10]. Furthermore, the customers and the employees of the banks risk losing service quality and good working conditions, respectively, if banks continuously make wrong investment decisions because they do not understand how the growing technology in the industry and stakeholder demands affect investment decisions [11]. The objective of this research is therefore to address the question of how banks in the Republic of Congo should make investment decisions in an increasingly competitive global financial industry, which is mostly driven by changing technology and stakeholders’ satisfaction or demands. In this paper, we examine how internet banking can favorably affect banks’ investment decisions by using SmartPLS-3, Stata, and SPSS to analyze data from a survey of banking institutions in the Republic of Congo. This study contributes to the theory and practice of bank management and strategic innovation. The study also contributes to banking literature and practice in two ways: 1) it provides better knowledge of how banking innovation services affect management and marketing concepts such as BID, ES, and CSAT; 2) it shows the role of ES and CSAT in the relationship between IB and bank investment decisions. This research will help bank and organization managers develop their markets by boosting their technology and innovation services. Lastly, the findings of this research are vital in developing countries, as they will encourage politicians and bank management to pursue policies that foster technology and innovation in the banking sector, ultimately benefiting the overall financial system. The study is organized as follows. The first section provides background information and a brief introduction. The second section discusses the research literature. The third section describes the data source, sample, and analytical model used. The study’s practical implications, limitations, recommendations for further research, and conclusion are discussed in the last section of the paper.
2 Review of the Literature and Formulation of Hypotheses
This part discusses the review of the literature and the formulation of the hypotheses.
2.1 Literature Review and Conceptual Framework
According to scholars, human capital (employees) and customers are the most important sources of competitive advantage [7]. Using human capital is frequently more significant than having it for esteem creation [12], execution [13], and risk recovery [7]. An investment is the process of putting money into various financial resources or foundations with uncertain prospective returns. Nandini [14] defined investment as a financial commitment with favorable returns. Investment returns might be in the form of financial gains, regular income, or a combination of both. The daily investment decisions made by individuals or institutions such as banks determine tomorrow’s losses or profits [14]. However, not every investment pays off, since investors are not always rational [10]. Innovation technology, stakeholders’ satisfaction, knowledge, and judgment all affect bank investment decisions. Investing may have significant effects on the future of banks if they know when, where, and how to invest. Essentially, everyone makes investments at some point in their lives, whether by saving or depositing money in a bank, buying stocks, insurance, or equipment, or building infrastructure. Nevertheless, every investment entails risks as well [14]. Antony and Joseph [15] described investment decisions as a psychological process, since organizations and individuals make decisions based on available options. Financial experts commonly use fundamental, technical, and judgmental investment analyses, all of which rely on established financial hypotheses that are in line with rationality [16]. A study by Flor and Hansen [17] stated that technological advances impact a firm’s investment decision, as they affect the investment cost. The empirical literature on this subject is extensive, as business investment has long been a focus of financial research.
Examples include studies of corporate investment and CEO confidence, efficiency and investment management forecasts [18], and volatility and investment [19]. Unlike these studies, we use key marketing variables, namely employees’ and customers’ satisfaction, and analyze their effects on companies’ investment decisions (see Fig. 1).
Fig. 1. The study’s conceptual framework (constructs: Internet Banking (IB), Customer Satisfaction (CSAT), Employee Satisfaction (ES), Bank Investment Decision (BID); hypothesized paths: H1 IB → BID, H2 IB → CSAT, H3 IB → ES, H4 CSAT → BID, H5 ES → BID)
2.2 Study Hypotheses Development
Relationship Between Internet Banking and Bank Investment Decision
The marginal benefits of technology continue to outweigh the marginal costs. Hu and Xie [20] recognized the value of internet banking in growing the productivity, efficiency, and profitability of the banking industry. Some studies also suggest a link between innovation technology and investment decisions. For instance, Turedi and Zhu [21] observed a positive moderating effect of IT decision-making structure mechanisms on the relationship between IT investment and organizational performance. Božić and Botrić [22] argued that the decision to invest adequately in innovation is highly complex, on the one hand because of insufficient resources, and on the other because of the different innovation directions that businesses have to choose from. In the same vein, previous studies show that a firm’s decision to invest in innovation (R&D) increases with its size, market share, and diversification, and with demand-pull and technology-push forces [23]. Based on the foregoing explanation, we hypothesize that:
H1: Internet banking has a significant positive effect on bank investment decisions.

Internet Banking and Customer Satisfaction
Nazaritehrani and Mashali [24] investigated the “Development of E-banking channels and market share in underdeveloped countries”. Their findings show that internet banking lowers bank operating expenses and boosts consumer satisfaction and retention. Similarly, in the banking sector, the availability of internet banking services and user-friendliness appear to be correlated with high customer satisfaction and retention. In internet banking, there is a considerable relationship between e-customer satisfaction and loyalty. Moreover, the work of Rahi, Ghani, and Ngah [2] suggests that numerous banks have implemented IB to decrease costs while enhancing customer services.
H2: There is a positive significant relationship between internet banking and customer satisfaction.
Internet Banking and Employee Satisfaction
As technology advances, so do citizens’ expectations of banking services. While some people still prefer traditional banking, new technology has had a favorable impact on banking. People prefer using ATMs and POS devices for shopping to waiting in bank queues. With the advent of new technology such as mobile banking, the internet, and ATMs, bank branches are getting quieter and employees are less stressed. Overall, technology influences employee satisfaction positively [25]. Several e-banking services influence banking and job satisfaction. Also, Hammoud, Bizri, and El Baba [26] argued that technology and innovation have significantly strengthened the banking system.
H3: Internet banking has a significant positive effect on employee satisfaction.

Customer Satisfaction and Bank Investment Decision
How does customer satisfaction contribute to more capital investment? Firstly, as customer satisfaction includes both new consumer perceptions and experiences of the quality of a business’s services and products, high customer satisfaction can lead to highly predictable revenue flows and prospective chances for growth. Customers tend to buy more from companies with which they are more engaged. Fornell et al. [27] illustrate that customer satisfaction results in high client expenditure and potential demand. As a consequence, companies with high customer satisfaction can generate more income. Secondly, customer satisfaction forms a steady and loyal client base [28], which reduces cash flow volatility and future capital costs, enhances consumer loyalty, improves the company’s reputation, lowers transaction costs, and increases workforce efficiency and productivity. Consequently, in terms of neoclassical investment theory, great customer satisfaction encourages enterprises to invest more in resources. Vo et al. [29] found that enterprises with higher consumer satisfaction spend more on future capital expenditures. Overall, this evidence suggests that consumer satisfaction influences a firm’s investment policy.
H4: There is a positive significant relationship between CSAT and Bank investment decisions.

Employee Satisfaction and Bank Investment Decision
Human resources are regarded as an organization’s most valuable asset. Employee motivation and job satisfaction also play a big role in employees’ performance. In other words, employees’ contentment with their jobs is critical to their performance. Research by Bai et al. [30] demonstrates that restricting businesses’ power to fire employees has two opposing effects on investment. On the one hand, protection from wrongful termination and reduced fear of being fired might help employees focus on their tasks, take creative risks, and develop skills that benefit their current employment. These effects may lead to greater profitability and new or more desirable opportunities, resulting in higher investment. On the other hand, increasing the cost of labor adjustment reduces resource spending, as organizations prepare for increasingly irreversible investments.
H5: Employee satisfaction has a positive significant relationship with Bank investment decisions.
The Mediating Roles of CSAT and ES on the Nexus Between IB and BID
Economic theory states that the expected return on investment (ROI) and the cost of capital (COC) drive a company’s investment decisions. Tobin’s famous Q theory compares the company’s ratio Q (the marginal market value of a unit of assets) with the marginal cost of investment. As Tobin’s Q incorporates both expected cash flows and capital costs, organizations can spend more when there is great growth potential or when costs are low. Sorescu and Sorescu [31] show that investing in companies with great customer satisfaction generates financial returns with low risk. Similarly, Merrin et al. [32] asserted that consumer satisfaction can be used to gauge stock price sentiment. Banks are offering creative solutions to help their customers and employees with scarce resources to reduce operating costs, risks, and inefficiencies while enhancing customer satisfaction and employee productivity [33]. Given the theoretical underpinnings and empirical evidence, and the fact that internet banking may be necessary for staff creativity, customers, and investment decisions, we predict that ES and CSAT mediate the relationship between IB and BID. The following hypotheses will be investigated in light of this assumption:
H7: Customer satisfaction mediates the relationship between IB and Bank investment decisions.
H8: Employee satisfaction mediates the relationship between IB and Bank investment decisions.
3 Data and Methodology This research employs quantitative tools to collect and analyze the data. The survey questionnaires quantified the study variables. This study’s goal is to examine the effects of internet banking on bank investment decisions using employee and customer satisfaction as mediators. The study’s participants were randomly selected from among bank staff and customers. The survey instruments were translated into French and then back into English to ensure that the intended implications of each question were understood. The population of the investigation consisted of 2,011 employees and 496,009 customers, source: Bank of Central African States (BEAC) and Central African Banking Commission (COBAC), 2019 and according to the number of commercial banks operating in the country. The data were gathered from 11 commercial banks in the Republic of Congo, mainly in Brazzaville and Pointe-Noire. Based on the provided population, a sample size of 1800 participants with a 5% confidence level was determined. The research engages a sample size of 1200 customers and 600 banks’ staff. There were 1500 valid surveys gathered and evaluated, 1000 customers and 500 employees from the 1800-targeted respondents, reflecting an 83.33% response rate. The study also employed a Likert scale ranging from 1 and 5, with 1 denote strongly disagree and 5 denote strongly agree, and incorporated questions from other experts who had undertaken similar research. The data was gathered from October 2019 to June 2020. The paper’s measurements were derived from past research and altered for this present investigation. The internet banking items were measured using the questions adopted from Rahi, Ghani, and Ngah [2]; the items for employee satisfaction was
adopted from Yee, Yeung, and Cheng [34], the items for customer satisfaction were taken from Hammoud, Bizri, and El Baba [26], and the items for bank investment decision were adopted from Ogunlusi and Obademi [16].

Justification for Path Analysis Using System of Regression Equations

Path analysis using a System of Regression Equations (SRE) is one of the most widely used multivariate procedures because it can handle the non-standard data distributions encountered in the social sciences. The SRE also expresses the coefficients of the path analysis as regression coefficients [35]. A well-defined hypothesis test can convincingly reject theoretical predictions. Additionally, the approach enables the plotting of residuals and the examination of data concerns such as heteroskedasticity, outliers, autocorrelation, and non-normality [35, 36]. As a result, employing SRE modeling in path analysis is a rational choice for achieving the study's objectives [36]. We derived the following system of equations from the conceptual framework depicted in Fig. 1:

$$
\begin{cases}
\quad \vdots \\
BID = \alpha_5 + B_5\, IB + B_6\, CSAT + \varepsilon_6 \\
BID = \alpha_6 + B_6\, IB + B_7\, ES + \varepsilon_7 \\
BID = \alpha_7 + B_8\, IB + B_9\, CSAT + B_{10}\, ES + \varepsilon_8
\end{cases}
\tag{1}
$$

with $B_0, B_1, B_2, B_3, B_4, B_5, B_6, B_7, B_8, B_9$ and $B_{10} \neq 0$.

Confirmatory Factor Analysis (CFA)

To assess the trustworthiness of the data, we used the Statistical Package for the Social Sciences (SPSS V.26) to perform discriminant and convergent validity tests. Principal Component Analysis (PCA) with Varimax rotation (Varimax with Kaiser Normalization) was used to find factors with eigenvalues greater than one. This study also employs exploratory factor analysis (EFA). CFA is a multivariate statistical approach used to assess how accurately the measured items describe their constructs. This study uses CFA to examine the common method variance. Normally, variables with a factor loading of 0.500 or higher are considered for analysis [37]. The EFA identified four factors accounting for 74.23% of the overall variance in the research variables, with KMO = 0.763 and Scree plot direction differing by a factor of five. Bartlett's Test of Sphericity was used to assess construct validity, whilst the Kaiser-Meyer-Olkin (KMO) measure was used to assess sampling adequacy. Bartlett's Test of Sphericity yielded a test statistic of 30020.196 for the correlations among the research constructs, which is significant (p < 0.001). The scales of each factor are highly correlated with one another, confirming the scales' convergent validity [35]. The CFA and EFA results show that the loadings of all 18 items are greater than 0.70. This implies that each item is strongly tied to its underlying construct and demonstrates the reliability and sufficiency of the indicators (items) [38].
Measurements Validity and Reliability

The Cronbach's Alpha, composite reliability, and Average Variance Extracted coefficients were determined. Cronbach's Alpha (CA) is a measure of the internal consistency (reliability) of a scale and is defined as:

$$
CA = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right) \tag{2}
$$

where $k$ is the number of items, $\sigma^2_X$ denotes the variance of the observed total item scores, and $\sigma^2_{Y_i}$ represents the variance of item $i$. The average variance extracted (AVE) and composite reliability (CR) are defined as follows:

$$
AVE = \frac{\sum \lambda^2}{n} \tag{3}
$$

$$
CR = \frac{\left(\sum \lambda\right)^2}{\left(\sum \lambda\right)^2 + \sum \left(1 - \lambda^2\right)} \tag{4}
$$

Here $\lambda$ denotes the factor loading and $n$ denotes the total number of indicators. The Cronbach's Alpha (CA) values exceeded the acceptable threshold of 0.70. The composite reliability (CR) is also higher than 0.70, which indicates that the constructs are very reliable [35]. The average variance extracted (AVE) coefficients were assessed for convergent validity. The AVE values for this research range between 0.571 and 0.663, which is acceptable and above the threshold of 0.5. Table 1 summarizes the validity and reliability findings. The discriminant validity of the estimated models was also assessed using the Fornell-Larcker criterion. According to this criterion, the square root of a construct's AVE must be larger than any of its correlations with the other constructs [39]. Table 2 shows that the results of this examination satisfy the Fornell-Larcker criterion. We computed descriptive statistics to evaluate the variables' standard deviations, means, and correlations. The study found significant and positive correlations between the research variables, with coefficients ranging from 0.389 to 0.842. The correlation coefficients in Table 3 are below 0.9, indicating no common method bias concerning the research constructs [40].

Common Method Bias (CMB) Test

Since the data for both endogenous and exogenous constructs were collected via questionnaires, a common method bias test is required [40]. Harman's single factor test was used to evaluate the study's constructs. The results showed that a single factor explains around 44% of the model's variance, which is below the 50% threshold, suggesting no common method bias in the constructs [40]. The measurement model's fit indices (SRMR, RMSEA, NFI, and CFI) were also examined. As reported in Table 4, all fit indices were acceptable. Therefore, the scale indicators were considered suitable for subsequent investigations.
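Equation (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cronbach_alpha` and the toy Likert responses are hypothetical and are not the study's data, and sample variances are used for both the items and the total scores.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha as in Eq. (2).

    items: list of k lists; each inner list holds one item's scores
    for the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances (numerator of the ratio).
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical 5-point Likert responses: 3 items, 6 respondents.
toy_items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
]
print(round(cronbach_alpha(toy_items), 3))
```

Values above 0.70, the threshold used in the text, would be read as acceptable internal consistency.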
Table 1. Indicator loadings and construct reliability

| Indicators | FL | CA | CR | AVE |
|---|---|---|---|---|
| IB1 | 0.747 | 0.819 | 0.869 | 0.571 |
| IB2 | 0.771 | | | |
| IB3 | 0.746 | | | |
| IB4 | 0.772 | | | |
| IB5 | 0.741 | | | |
| ES1 | 0.732 | 0.814 | 0.878 | 0.644 |
| ES2 | 0.861 | | | |
| ES3 | 0.861 | | | |
| ES4 | 0.745 | | | |
| CSAT1 | 0.832 | 0.823 | 0.883 | 0.653 |
| CSAT2 | 0.792 | | | |
| CSAT3 | 0.804 | | | |
| CSAT4 | 0.804 | | | |
| BID1 | 0.773 | 0.870 | 0.907 | 0.663 |
| BID2 | 0.732 | | | |
| BID3 | 0.895 | | | |
| BID4 | 0.762 | | | |
| BID5 | 0.895 | | | |

Notes: ES-Employee Satisfaction, IB-Internet Banking, BID-Bank Investment Decision, CSAT-Customer Satisfaction, FL-Item Loadings, CR-Composite Reliability, AVE-Average Variance Extracted, and CA-Cronbach's Alpha.
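As a rough check of Eqs. (3) and (4), the AVE and CR columns of Table 1 can be recomputed from the reported item loadings. The sketch below assumes the loadings are the standardized values printed in the table; the function and variable names are illustrative only, and small differences in the third decimal are expected because the loadings themselves are rounded.

```python
def ave(loadings):
    """Average variance extracted, Eq. (3): mean of squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability, Eq. (4)."""
    squared_sum = sum(loadings) ** 2
    error_sum = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_sum)


# Item loadings (FL column) as printed in Table 1.
TABLE1_LOADINGS = {
    "IB": [0.747, 0.771, 0.746, 0.772, 0.741],
    "ES": [0.732, 0.861, 0.861, 0.745],
    "CSAT": [0.832, 0.792, 0.804, 0.804],
    "BID": [0.773, 0.732, 0.895, 0.762, 0.895],
}

for construct, lam in TABLE1_LOADINGS.items():
    print(construct, round(ave(lam), 3), round(composite_reliability(lam), 3))
```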
Table 2. Discriminant validity (Fornell-Larcker criterion)

| | BID | CSAT | ES | IB |
|---|---|---|---|---|
| BID | 0.854 | | | |
| CSAT | 0.542 | 0.808 | | |
| ES | 0.464 | 0.446 | 0.802 | |
| IB | 0.667 | 0.610 | 0.410 | 0.756 |

Note: Diagonal values are the square roots of AVE.
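The Fornell-Larcker criterion described in the text can be checked mechanically against the values printed in Table 2. The following is a minimal sketch, not the authors' procedure: it takes the table as given (diagonal entries as the square roots of AVE, off-diagonal entries as inter-construct correlations), and the helper names are hypothetical.

```python
NAMES = ["BID", "CSAT", "ES", "IB"]

# Lower triangle of Table 2, row by row, as printed.
TABLE2 = [
    [0.854],
    [0.542, 0.808],
    [0.464, 0.446, 0.802],
    [0.667, 0.610, 0.410, 0.756],
]


def value(i, j):
    """Read entry (i, j) of the symmetric matrix from its lower triangle."""
    return TABLE2[max(i, j)][min(i, j)]


def fornell_larcker_ok(i):
    """True if the diagonal entry exceeds every correlation of construct i."""
    return all(
        value(i, i) > abs(value(i, j))
        for j in range(len(NAMES))
        if j != i
    )


for idx, name in enumerate(NAMES):
    print(name, fornell_larcker_ok(idx))
```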
Table 3. Inter-item means (M), standard deviations, and correlations

| Variables | Mean | Std. Deviation | (1) | (2) | (3) | (4) |
|---|---|---|---|---|---|---|
| (1) BID | 17.04 | 4.239 | 1.000 | | | |
| (2) IB | 16.51 | 3.991 | 0.620** | 1.000 | | |
| (3) CSAT | 13.32 | 3.400 | 0.842** | 0.566** | 1.000 | |
| (4) ES | 14.75 | 3.339 | 0.456** | 0.389** | 0.434** | 1.000 |

Note: N = 1500; ** Correlation is significant at the 0.01 level (2-tailed).
Table 4. Model fit summary

| Measure | Saturated model | Threshold value |
|---|---|---|
| SRMR | 0.075 | 0.08 |
| RMSEA | 0.029 | 0.05 – 0.08 |
| NFI | 0.891 | >0.95 or >0.90 |
| CFI | 0.972 | >0.95 |

Note: CFI-Comparative Fit Index, SRMR-Standardized Root Mean Square Residual, NFI-Normed Fit Index, and RMSEA-Root Mean Square Error of Approximation.
4 Results

The questionnaire yielded the following demographic information: gender, age, educational level, and monthly income. According to the results, 752 respondents were male (50.1%) and 748 were female (49.9%). By age, 400 respondents (26.7%) were 18–25 years old, 503 (33.5%) were 26–35 years old, and 356 (23.7%) were 36–45 years old. By education, 123 respondents (8.2%) had an elementary education, 240 (16.0%) a high school education, 361 (24.1%) a diploma, 511 (34.1%) an undergraduate degree, and 265 (17.7%) a postgraduate degree.

4.1 Hypothesis Analysis
We used Stata (reg3 with two-stage least squares estimation, 2sls) to estimate the structural equation model (path analysis), following Westland [35], who argues that two-stage least squares regression is one of the strongest multivariate techniques for estimating path coefficients. The tested outcome variables showed positive and significant associations with the predictor factors.

Direct Effects of IB on BID, CSAT and ES; and the Effects of CSAT and ES on BID

To examine the direct effects of internet banking (IB), customer satisfaction (CSAT), and employee satisfaction (ES) on bank investment decisions, we tested the structural equation model presented in Fig. 2. The unstandardized coefficients (see Table 5) from internet banking to bank investment decision, customer satisfaction, and employee satisfaction were 0.659 (p < 0.001), 0.482 (p < 0.001), and 0.326 (p < 0.001), respectively. Thus, H1, H2, and H3 were accepted. Both customer satisfaction and employee satisfaction exerted statistically significant effects on bank investment decision (β = 0.175, p < 0.001 and β = 0.578, p < 0.001, respectively). Thus, H4 and H5 were also confirmed.
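For the single-predictor equations, the reported coefficients can be cross-checked against the descriptive statistics in Table 3, because in a simple linear regression the slope equals r · (sd_y / sd_x) and R² equals r². The sketch below applies this identity to the three paths out of IB; it is an illustrative consistency check in Python, not the Stata estimation used in the study, and the names are hypothetical. Agreement is only to within about 0.001 because the inputs are rounded.

```python
# Standard deviations and correlations with IB, as printed in Table 3.
TABLE3_SD = {"BID": 4.239, "IB": 3.991, "CSAT": 3.400, "ES": 3.339}
TABLE3_R_WITH_IB = {"BID": 0.620, "CSAT": 0.566, "ES": 0.389}


def simple_ols(r, sd_x, sd_y):
    """Slope and R-squared of y regressed on a single predictor x."""
    return r * sd_y / sd_x, r * r


for outcome, r in TABLE3_R_WITH_IB.items():
    slope, r2 = simple_ols(r, TABLE3_SD["IB"], TABLE3_SD[outcome])
    print(f"IB -> {outcome}: slope = {slope:.3f}, R2 = {r2:.3f}")
```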
Table 5. Hypothesis test, unstandardized coefficients

| | (1) BID | (2) CSAT | (3) ES | (4) BID | (5) BID | (6) BID | (7) BID | (8) BID |
|---|---|---|---|---|---|---|---|---|
| IB | 0.659*** (0.0215) | 0.482*** (0.0181) | 0.326*** (0.0199) | | | 0.136*** (0.0106) | 0.555*** (0.0223) | 0.128*** (0.0107) |
| CSAT | | | | 0.175*** (0.0108) | | 0.084*** (0.0124) | | 0.070*** (0.0129) |
| ES | | | | | 0.578*** (0.0292) | | 0.320*** (0.0267) | 0.458*** (0.0117) |
| Constant | 6.165*** (0.365) | 5.369*** (0.308) | 9.375*** (0.338) | 1.388*** (0.148) | 8.513*** (0.442) | 0.344* (0.162) | 3.163*** (0.430) | -0.010 (0.185) |
| N | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 |
| R2 | 0.385 | 0.320 | 0.151 | 0.888 | 0.207 | 0.899 | 0.439 | 0.900 |
| F-Stat | 937.42 | 705.07 | 267.38 | 11874.4 | 392.21 | 6676.47 | 585.52 | 4498.44 |

Note: Standard errors are in parentheses, *** p < 0.001, ** p < 0.01, * p < 0.05; BID-Bank Investment Decision, IB-Internet Banking, CSAT-Customer Satisfaction, and ES-Employee Satisfaction.

4.2 Mediation Analysis
Table 6 reports the indirect effects of internet banking (IB) on bank investment decision (BID); this effect is mediated via customer satisfaction (CSAT) and employee satisfaction (ES).

Table 6. The mediation tests (direct effects, indirect effects, and total effects)

| Effect | Paths | Coef. | St. Err. | Z | P>z | [CI at 95%] |
|---|---|---|---|---|---|---|
| Direct effects | CSAT <- IB | 0.610 | 0.018 | 33.691 | 0.000 | 0.572–0.641 |
| | BID <- CSAT | 0.870 | 0.007 | 122.176 | 0.000 | 0.855–0.884 |
| | BID <- ES | 0.024 | 0.010 | 2.522 | 0.012 | 0.006–0.044 |
| | BID <- IB | 0.127 | 0.010 | 12.229 | 0.000 | 0.107–0.148 |
| | ES <- IB | 0.410 | 0.023 | 18.170 | 0.000 | 0.364–0.457 |
| Indirect effects | BID <- IB | 0.540 | 0.016 | 32.860 | 0.000 | 0.501–0.567 |
| Total effects | CSAT <- IB | 0.610 | 0.018 | 33.691 | 0.000 | 0.572–0.641 |
| | BID <- CSAT | 0.870 | 0.007 | 122.176 | 0.000 | 0.855–0.884 |
| | BID <- ES | 0.024 | 0.010 | 2.522 | 0.012 | 0.006–0.044 |
| | BID <- IB | 0.667 | 0.015 | 45.097 | 0.000 | 0.638–0.701 |
| | ES <- IB | 0.410 | 0.023 | 18.170 | 0.000 | 0.364–0.457 |

Note: Bootstrapping outputs from SmartPLS.
Table 7 indicates that the indirect relationship between IB and bank investment decision via CSAT was statistically significant (β = 0.530, p < 0.001). Similarly, the mediating effect of ES on the relationship between IB and bank investment decisions was 0.010 (p < 0.012). For these reasons, statistically positive and significant mediation effects may be inferred, indicating that H6 and H7 were likewise accepted. The results in Tables 6 and 7 and Fig. 2 demonstrate that both the direct and the indirect effects of IB on bank investment decision (BID) are positive and statistically significant, indicating that customer satisfaction (CSAT) and employee satisfaction (ES) partially mediate the effect of IB.
Table 7. Specific indirect effects

| Paths | Std. Coef. | St. Err. | Z | P > z | [95% CI] |
|---|---|---|---|---|---|
| IB -> CSAT -> BID | 0.530 | 0.016 | 32.698 | 0.000 | 0.495–0.556 |
| IB -> ES -> BID | 0.010 | 0.004 | 2.527 | 0.000 | 0.002–0.018 |
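The figures in Tables 6 and 7 fit together under the usual product-of-coefficients view of mediation, in which a specific indirect effect is the product of the two path coefficients along the route and the total effect is the direct effect plus all indirect effects. The sketch below reproduces that arithmetic from the direct effects in Table 6. It assumes this decomposition is what the SmartPLS output reflects, and the variable names are illustrative.

```python
# Direct path coefficients as printed in Table 6.
a_csat = 0.610   # IB -> CSAT
b_csat = 0.870   # CSAT -> BID
a_es = 0.410     # IB -> ES
b_es = 0.024     # ES -> BID
direct = 0.127   # IB -> BID

# Specific indirect effects: product of the coefficients along each path.
indirect_csat = a_csat * b_csat
indirect_es = a_es * b_es

# Total indirect and total effect of IB on BID.
total_indirect = indirect_csat + indirect_es
total_effect = direct + total_indirect

print(f"IB -> CSAT -> BID: {indirect_csat:.3f}")
print(f"IB -> ES -> BID:   {indirect_es:.3f}")
print(f"Total indirect:    {total_indirect:.3f}")
print(f"Total effect:      {total_effect:.3f}")
print(f"Share mediated:    {total_indirect / total_effect:.2f}")
```

Because the direct effect remains non-zero alongside the indirect effects, the pattern corresponds to partial rather than full mediation, as stated in the text.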
Fig. 2. Structural equation modelling
4.3 Discussion
We notice in this study that innovation components and stakeholders influence bank investment decisions. Our outcomes support prior studies showing that BID is affected by IB and stakeholders [17, 21, 23], and contradict other researchers' findings that BID is positively linked to only a few of the above components [29, 31]. We also found that various components of innovation and technology have varied effects on BID. In particular, IB has both direct and indirect impacts on bank investment decisions. Similarly, employee and customer satisfaction partially mediated the link between internet banking and bank investment decisions. These results demonstrate that a focus on employees and customers should be taken into account to guarantee the benefits anticipated from banks' decisions. In addition, stakeholders tend to play the most important role in promoting bank investment decisions. This is possibly due to the relationship-oriented culture in the corporate atmosphere of the banks, which reflects social and interpersonal relationships. Companies with significant innovation technologies can build successful interactions with various stakeholders, leading to additional market opportunities and enhanced outputs. In short, these results add to the literature on innovation by revealing key mechanisms by which components of innovation help banks to make better decisions. They also contribute to the HRM literature by advising which IB aspects a firm should focus on more heavily. In addition, our findings support the idea that numerous variables can mediate the impact of IB on bank investment decisions. Nonetheless, our research is amongst the first to explore the mediating role of customer satisfaction and employee satisfaction. In particular, the impact of the components of IB on bank investment decisions is partly mediated by both customer satisfaction and employee satisfaction. As seen in Tables 6 and 7, customer satisfaction tends to have a stronger effect on bank investment decisions than employee satisfaction. These results add to the current innovation and HRM literature and may serve as guidance for companies to develop a comprehensive HR process to boost IB, the loyalty of stakeholders, and banks' decisions to invest. These results shed new light on the structural mechanisms that ease bank investment decision-making.
5 Conclusion, Limitations, and Opportunities for Further Studies

Our results have two major consequences for practitioners. First, as IB is related to customer satisfaction, employee satisfaction, and bank investment decisions, managers should aim to continually improve and sustain their IB through investments in the selection and recruitment of employees, the development and training of employees, the optimization and design of procedures, and other human resource management (HRM) activities. Second, managers must remember that different elements of innovation have different cumulative effects on investment decisions. Thus, based on the investment functioning emphasized by their market strategies, they should devote more resources to specific components. In particular, if their companies are to boost the benefits of investment, they should make greater efforts to improve the satisfaction of stakeholders. However, if the priority of the plan is to increase revenues from investments, particular attention has to be paid to customer satisfaction. Their workers may already have advanced experience, abilities, and skills relevant to their job. In this situation, developing human resources further is not a primary concern. In this paper, we have built a theoretical framework that explains the mediating roles of customer satisfaction and employee satisfaction in the relationship between IB and bank investment decisions, and verified the hypotheses by examining data gathered from Congolese banks. The outcomes demonstrate that the component of innovation, namely internet banking, is positively associated with customer satisfaction and employee satisfaction, which ultimately contribute positively to the bank investment
decision. Both customer satisfaction and employee satisfaction partially mediate the effects of IB on bank investment decisions. Most innovation research focuses only on exploring the individual or contextual components that improve it, since few companies can survive, prosper, and succeed in today's competitive world without innovation. This study offers a framework on which further research can build to help companies understand when and how the positive effects of their workers' creative activities can be stimulated while minimizing the negative effects. Our research has some drawbacks, which in turn suggest possible directions for further investigation. First, we used a cross-sectional method to explore the inherent mechanism of the effect of internet banking on bank investment decisions. A cross-sectional design, however, does not establish causality among constructs. To establish a clear causal association and analyze the possible time-lag effects of IB accumulation, future work can conduct longitudinal investigations. Second, this research was performed in the context of Congolese banks. As financial industries are typically innovation-oriented and knowledge-intensive, the associations among IB, customer satisfaction, employee satisfaction, and bank investment decision-making may be stronger in this environment than in other organizations. To corroborate the validity of our results, future studies should gather data from different types of companies. Third, by concentrating on the mediating role of stakeholder (employee and customer) satisfaction, this study explores the underlying process between IB and bank investment decisions. Stakeholders assessed the scale items of the above-mentioned constructs subjectively. Future studies will need to obtain more objective metrics for bank investment decisions, such as the rate of recurrent investment or capital investment.
Lastly, our research emphasizes two major mediators between IB and bank investment decisions, namely customer satisfaction and employee satisfaction. Yet some significant contextual components can moderate their impact, particularly from an innovation security perspective. Future research will therefore need to investigate the moderating effects of contextual variables such as internet security and perceived risk to gain more insights. Moreover, future studies can also extend this work by adding other mediating variables such as bank performance, bank competitive advantage, and bank sustainability.
Appendix

Questionnaire of the study

Internet Banking (IB)
IB1. You feel confident while using the e-banking method to access money
IB2. Internet banking enables me to complete a transaction quickly
IB3. Online banking enhances your effectiveness in doing banking transactions
IB4. You find online banking useful
IB5. Online banking saves your time

Customer Satisfaction (CSAT)
CSAT1. I am satisfied with the transaction processing via E-Banking services
CSAT2. I think I made the correct decision to use the E-Banking services
CSAT3. My satisfaction with the E-Banking services is high
CSAT4. Overall, E-Banking services are better than my expectations

Employee Satisfaction (ES)
ES1. We are satisfied with the salary of this bank
ES2. We are satisfied with the promotion opportunity of this bank
ES3. We are satisfied with the job nature of this bank
ES4. We are satisfied with the relationship of my fellow workers of this company

Bank Investment Decision (BID)
BID1. My investment reports better results than expected
BID2. My investment in IT and Innovation has demonstrated increased cash flow growth in the past 5 years
BID3. My investment in technology innovation has a lower risk compared to the market financial products in general
BID4. My investment in sustainability activities has a high degree of safety
BID5. My investment proceeds will be used in a way that benefits society
References

1. Aboobucker, I., Bao, Y.: What obstruct customer acceptance of internet banking? Security and privacy, risk, trust, and website usability and the role of moderators. J. High Technol. Managem. Res. 29, 109–123 (2018). https://doi.org/10.1016/j.hitech.2018.04.010
2. Rahi, S., Abd Ghani, M., Hafaz Ngah, A.: Integration of unified theory of acceptance and use of technology in internet banking adoption setting: evidence from Pakistan. Technol. Soc. 58, 101120 (2019). https://doi.org/10.1016/j.techsoc.2019.03.003
3. Hoehle, H., Scornavacca, E., Huff, S.: Three decades of research on consumer adoption and utilization of electronic banking channels: a literature analysis. Decis. Support Syst. 54, 122–132 (2012)
4. Alalwan, A.A., Baabdullah, A.M., Rana, N.P., Tamilmani, K., Dwivedi, Y.K.: Examining adoption of mobile internet in Saudi Arabia: extending TAM with perceived enjoyment, innovativeness and trust. Technol. Soc. 55, 100–110 (2018). https://doi.org/10.1016/j.techsoc.2018.06.007
5. Martins, C., Oliveira, T., Popovič, A.: Understanding the Internet banking adoption: a unified theory of acceptance and use of technology and perceived risk application. Int. J. Inf. Manage. 34, 1–13 (2014). https://doi.org/10.1016/j.ijinfomgt.2013.06.002
6. Rahi, S., Abd Ghani, M.: Customer's perception of public relation in e-commerce and its impact on e-loyalty with brand image and switching cost. J. Internet Banking Commerce. 2016, 21 (2016)
7. Ma, L., Zhai, X., Zhong, W., Zhang, Z.-X.: Deploying human capital for innovation: a study of multi-country manufacturing firms. Int. J. Prod. Econ. 208, 241–253 (2019)
8. Lane, K., Rosewall, T.: Firms' investment decisions and interest rates. RBA Bull. 2015, 1–7 (2015)
9. Hjelseth, I.N., Meyer, S.S., Walle, M.A.: What factors influence firms' investment decisions? Econ. Comment. 2017(10) (2017). http://hdl.handle.net/11250/2558941
10. Velmurugan, G., Selvam, V., Abdul, N.N.: An empirical analysis on perception of investors' towards various investment avenues. MJSS 6, 427 (2015). https://doi.org/10.5901/mjss.2015.v6n4p427
11. Carbó‐Valverde, S., Cuadros‐Solas, P.J., Rodríguez‐Fernández, F.E.Y.: The effect of banks' IT investments on the digitalization of their customers. Glob. Policy. 11, 9–17 (2020). https://doi.org/10.1111/1758-5899.12749
12. Holcomb, T.R., Holmes, R.M., Jr., Connelly, B.L.: Making the most of what you have: managerial ability as a source of resource value creation. Strat Manage. J. 30, 457–485 (2009). https://doi.org/10.1002/smj.747
13. Ndofor, H.A., Sirmon, D.G., He, X.: Firm resources, competitive actions, and performance: investigating a mediated model with evidence from the in-vitro diagnostics industry. Strat. Manage. J. 32, 640–657 (2011)
14. Nandini, P.: Gender differences in investment behavior with reference to equity investments. Doctoral dissertation, Pondicherry University (2018)
15. Antony, A., Joseph, A.I.: Influence of behavioural factors affecting investment decision—an AHP analysis. Metamorphosis 16, 107–114 (2017). https://doi.org/10.1177/0972622517738833
16. Ogunlusi, O.E., Obademi, O.: The impact of behavioural finance on investment decision-making: a study of selected investment banks in Nigeria. Global Bus. Rev. 22, 1–17 (2019)
17. Flor, C.R., Hansen, S.L.: Technological advances and the decision to invest. Ann Finan. 9, 383–420 (2013)
18. Goodman, T.H., Neamtiu, M., Shroff, N., White, H.D.: Management forecast quality and capital investment decisions. Account. Rev. 89, 331–365 (2014). https://doi.org/10.2308/accr-50575
19. Panousi, V., Papanikolaou, D.: Investment, idiosyncratic risk, and ownership. J. Finan. 67, 1113–1148 (2012). https://doi.org/10.1111/j.1540-6261.2012.01743.x
20. Hu, T., Xie, C.: Competition, innovation, risk-taking, and profitability in the Chinese banking sector: an empirical analysis based on structural equation modeling. Discret. Dyn. Nat. Soc. 2016, 1–10 (2016). https://doi.org/10.1155/2016/3695379
21. Turedi, S., Zhu, H.: How to generate more value from IT: the interplay of IT investment, decision making structure, and senior management involvement in IT governance. CAIS 4, 26 (2019)
22. Božić, L., Botrić, V.: Innovation investment decisions: are post (transition) economies different from the rest of the EU? Eastern J. Eur. Stud. 8, 25–43 (2017). https://nbn-resolving.org/urn:nbn:de:0168-ssoar-61825-3
23. Crespi, G., Zuniga, P.: Innovation and productivity: evidence from six Latin American countries. World Dev. 40, 273–290 (2012). https://doi.org/10.1016/j.worlddev.2011.07.010
24. Nazaritehrani, A., Mashali, B.: Development of e-banking channels and market share in developing countries. Finan. Innov. 6(1), 1–19 (2020). https://doi.org/10.1186/s40854-0200171-z
25. Turkyilmaz, A., Akman, G., Ozkan, C., Pastuszak, Z.: Empirical study of public sector employee loyalty and satisfaction. Industr. Manage. Data Syst. 111, 675–696 (2011). https://doi.org/10.1108/02635571111137250
26. Hammoud, J., Bizri, R.M., El Baba, I.: The impact of e-banking service quality on customer satisfaction: evidence from the Lebanese banking sector. SAGE Open 8, 215824401879063 (2018)
27. Fornell, C., Rust, R.T., Dekimpe, M.G.: The effect of customer satisfaction on consumer spending growth. J. Mark. Res. 47, 28–35 (2010). https://doi.org/10.1509/jmkr.47.1.28
28. Sarkar Sengupta, A., Balaji, M.S., Krishnan, B.C.: How do customers cope with service failure? A study of brand reputation and customer satisfaction. J. Bus. Res. 68, 665–674 (2015)
29. Vo, L.V., Le, H.T.T., Le, D.V., Phung, M.T., Wang, Y.-H., Yang, F.-J.: Customer satisfaction and corporate investment policies. J. Bus. Econ. Manag. 18, 202–223 (2017)
30. Bai, J., Fairhurst, D., Serfling, M.: Employment protection, investment, and firm growth. Rev. Finan. Stud. 33, 644–688 (2020). https://doi.org/10.1093/rfs/hhz066
31. Sorescu, A., Sorescu, S.M.: Customer satisfaction and long-term stock returns. J. Mark. 80, 110–115 (2016). https://doi.org/10.1509/jm.16.0214
32. Merrin, R.P., Hoffmann, A.O.I., Pennings, J.M.E.: Customer satisfaction as a buffer against sentimental stock-price corrections. Mark Lett. 24, 13–27 (2013). https://doi.org/10.1007/s11002-012-9219-9
33. Obeng, A.Y., Mkhize, P.L.: An exploratory analysis of employees and customers' responses in determining the technological innovativeness of banks. Electron. J. Inf. Syst. Develop. Countries. 80, 1–23 (2017). https://doi.org/10.1002/j.1681-4835.2017.tb00586.x
34. Yee, R.W.Y., Yeung, A.C.L., Cheng, T.C.E.: The impact of employee satisfaction on quality and profitability in high-contact service industries. J. Oper. Manag. 26, 651–668 (2008). https://doi.org/10.1016/j.jom.2008.01.001
35. Westland, J.C.: Structural Equation Models: From Paths to Networks. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-16507-3
36. Pea-Assounga, J.B.B., Yao, H.: The mediating role of employee innovativeness on the nexus between internet banking and employee performance: evidence from the republic of Congo. Math. Probl. Eng. 2021, 1–20 (2021). https://doi.org/10.1155/2021/6610237
37. Ringle, C.M., Wende, S., Becker, J.-M.: SmartPLS 3. Bönningstedt: SmartPLS (2015)
38. Henseler, J., Ringle, C.M., Sarstedt, M.: A new criterion for assessing discriminant validity in variance-based structural equation modeling. J. Acad. Mark. Sci. 43(1), 115–135 (2014). https://doi.org/10.1007/s11747-014-0403-8
39. Hair, J.F., Sarstedt, M., Ringle, C.M., Gudergan, S.P.: Advanced Issues in Partial Least Squares Structural Equation Modeling. Sage Publications, Thousand Oaks (2017)
40. Lowry, P.B., Gaskin, J.: Partial least squares (PLS) structural equation modeling (SEM) for building and testing behavioral causal theory: when to choose it and how to use it. IEEE Trans. Profess. Commun. 57, 123–146 (2014). https://doi.org/10.1109/TPC.2014.2312452
Inductions of Usernames' Strengths in Reducing Invasions on Social Networking Sites (SNSs)

Md. Mahmudur Rahman1, Shahadat Hossain2(B), Mimun Barid3, and Md. Manzurul Hasan4

1 Bangabandhu Sheikh Mujibur Rahman Aviation and Aerospace University (BSMRAAU), Dhaka, Bangladesh [email protected]
2 City University, Dhaka, Bangladesh [email protected]
3 University of South Asia, Dhaka, Bangladesh
4 American International University-Bangladesh (AIUB), Dhaka, Bangladesh [email protected]
Abstract. The internet allows people to make social contacts and to communicate across different Social Networking Sites (SNSs). If users are unaware of their exposure, their identities may be revealed and cyber-attacks may become more likely. Hence, password secrecy is usually prioritized to protect personal information. However, using the same username across many SNSs also exposes users' identities to other users and intruders. Hackers can use usernames to track usage patterns and manipulate social media accounts or systems. As a result, in terms of security, usernames must be treated the same as passwords. This empirical study analyzes the strength of usernames by predicting weak usernames with machine learning models in order to limit poor username selections. We analyzed a Reddit dataset of 83,958 usernames to see how frequently people choose weak usernames for their accounts. Our predictive models correctly categorize strong and weak usernames with an average accuracy of 87%.

Keywords: Username selection · SNS · Support vector classification · Linear support vector classification · Random Forest · KNN

1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 331–340, 2022. https://doi.org/10.1007/978-3-030-93247-3_32

The advancement of the internet has created a plethora of opportunities for communication through Social Networking Sites (SNSs). Increasingly complicated algorithms are being used to make it more challenging for hackers to steal personal information from popular websites. Every system is insecure unless it stands alone, yet in the twenty-first century we have to be connected with others. In this regard, we try to make our credentials ever more complex, i.e., to move from systems breakable in polynomial time to systems breakable only in exponential time. Even so, these measures cannot guarantee that all users have strong usernames. Using the same username on various social networking sites allows hackers to recognize the pattern and to hack a user account. We usually focus on strong passwords, whereas both usernames and passwords are essential for digital authentication as well as digital authorization. Knowing the correct username is required to locate a particular resource within the system, while knowing the correct password is required to access that resource. Since ancient times, passwords have been employed as a unique code to identify malicious individuals. On the other hand, people use basic and easy-to-remember passwords [24]. As computing power and internet access have increased, cyberattacks have become more common, and new techniques for threatening victims keep evolving. For example, a network of internet-connected devices (botnet) may be used to dramatically lower the Time to Crack (TTC) [6], which calls for more complex passwords. Nonetheless, a lack of cybersecurity awareness prevents consumers from putting appropriate safeguards in place. This study demonstrates that users are willing to forego protections in exchange for convenience. As a result, the risk of an attack increases, especially if multiple accounts reuse identical data. Hence, poor password management magnifies the effect of breaches: the most recent attacks are measured in millions of compromised login and password combinations [16]. Many attackers target vulnerable users by obtaining credentials from accounts that can be used to access other websites. According to the Identity Theft Resource Center [3], there were approximately 1,108 data breaches affecting US consumers in 2020. However, to the best of our knowledge, research on usernames is very limited.
Therefore, our study employs machine learning (ML) classification techniques to assess the strengths of usernames in order to reduce the risk of selecting a common (which may not always apply) and weaker username for different SNSs. Our dataset of 83,958 usernames from Reddit is examined using several parameters, and the strengths of the usernames are predicted by our models. The models can then categorize any weak username appropriately, which reduces the likelihood of being attacked by hackers. Our research demonstrates that weaker usernames are widely used, and our predictive models determine whether a username is vulnerable or not. The objectives of this paper are given below:
– To observe username strengths in SNSs.
– To predict username strengths using RF, KNN, SVC, and LinearSVC.
– To analyze and compare the evaluation metrics of the applied models.
– To give a parametric observation on the Reddit dataset.
The rest of the sections are as follows. First, we describe some related works in Sect. 2. Then, in Sect. 3, we describe the procedure we have followed to perform the analysis. Next, in Sect. 4, we give our findings and classification performance before concluding in Sect. 5.
Inductions of Usernames’ Strengths Over SNSs
2 Related Work
Individuals commonly use usernames and passwords to log into their accounts. Many devices now use iris scanners and pattern matching to authenticate users; however, alphanumeric text is still the most prevalent method. As a result, many cybersecurity professionals have concentrated on passwords: research shows that nearly all users choose short, memorable passwords that they reuse across numerous accounts [9,21]. Furthermore, more stringent password rules and more complicated access control measures are being created (e.g., two-factor authentication). Users can apply these measures in various situations, but they may choose to ignore them for the sake of ease. While password complexity is always prioritized, usernames have received less attention, with only a few studies assessing their impact on account security [12,16]. However, research has revealed that increasingly personalized login credentials, such as biometric recognition (e.g., fingerprint, iris), are being used. In any case, account names serve as the first line of security for websites that hold sensitive data (e.g., bank accounts, trading data). Fandakly et al. [10] emphasized account names as the first line of the credential that affects account security. Shi [19] proposed a mechanism of user discrimination based on username characteristics. Furthermore, to secure users’ passwords online, the authors of [23] presented a virtual password idea in which users can choose from a variety of virtual password schemes ranging in security from weak to robust. Basta et al. [7] noticed that relatively little emphasis had been placed on the username format and observed that most firms designate an account using specific versions of the person’s first name and last name. As a result, acquiring usernames becomes fairly straightforward. According to Wang et al. [25], hackers are attracted to cloud-based password managers. Using the master password and the user’s phone, they proposed a bilateral generative password manager.

In addition, Perito et al. [17] utilized the N-gram model to assess the uniqueness of a user’s username and to locate several profiles belonging to the same person. Tam et al. [22] investigated five distinct password management behaviors to see how well consumers understood password quality. According to their research, users can tell good passwords from bad ones, but the concept of password security varies from person to person. Password reuse is another source of security vulnerability, and to detect it, Jenkins et al. [14] devised a two-pronged strategy involving detection and remediation. Coban et al. [8] also concentrated on username reuse, extracting similar usernames from other internet domains using various machine learning algorithms.
3 Methodology
This section defines the process and methods we follow in this study. The architecture of the proposed username selection approach is shown in Fig. 1.
Fig. 1. Architecture of username selection approach
3.1 Username Selection Process
A username contains a string of letters, numbers, and a few special characters. The same username should not be allowed twice within the same SNS. Some users worry about their accounts being compromised even though they use a strong, unique password from a password manager and reuse only their usernames. If we use a similar username on several SNSs, our information can be traced through that single username, which makes it highly susceptible to breaches [17]. According to Kumar et al. [15], more than 50% of users use the same logins in different SNSs. Therefore, in order to protect ourselves from online profiling, we must adopt different usernames. In this research, we examine the strengths of usernames and the human behaviors that lead to vulnerable usernames. Human tendencies that lead to breaches while using usernames include:
1. Username shown in text: Normally, the password is masked while typing, but the username is always shown in clear text, which makes it easy for miscreants to steal it and use it in brute-force password generation.
2. Social sharing of username: Sharing usernames on social platforms also aids credential hacking, as the usernames are exposed publicly.
3. Email as username: Most social sites now use the email address as the username, which increases vulnerability because email addresses appear in business diaries and on business cards.
4. Similar usernames on different sites: Users tend to reuse usernames, as they do passwords, and a reused username can serve as a stepping stone for attacking low-security websites.
5. Personal information based username: Many of us use usernames based on our personal names, which link to other accounts of the same person.

3.2 Dataset Description
We collect username data from the Kaggle repository, where we find a dataset titled Reddit usernames [2]. The dataset contains around 26 million usernames of users who have commented at least once in a comment section. For our study, we take 83,958 usernames from the 26 million records in the MS Excel file and exclude the frequencies of their comments. Following that, we employ several parameters to determine the strengths of the usernames. Security experts have stated that several features are required to form a secure username [1], and we measure strength following those specifications. We use the length of the username, the number of digits in the username, the existence of special characters, and camel case as our attributes, and we identify these characteristics for each username. For example, if a username consists only of characters and has none of these features, its strength is ‘very weak’. Similarly, if the username contains only one feature, it is considered ‘weak’. If it has two features, it is of ‘normal’ strength. If it has three features, it is ‘strong’. If it contains all four features, its strength is ‘very strong’.
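The labelling rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the length threshold of eight characters, and the regular expression used to detect camel case are assumptions, since the text does not give exact definitions of the four features.

```python
import re

# Strength labels indexed by the number of features present (0 to 4).
LABELS = ["very weak", "weak", "normal", "strong", "very strong"]


def username_features(name, min_length=8):
    """Return the four binary features for one username.

    Assumptions (not taken from the paper): "long" means at least
    min_length characters, a special character is anything that is not a
    letter or digit, and camel case means a lowercase letter directly
    followed by an uppercase letter.
    """
    return {
        "long_enough": len(name) >= min_length,
        "has_digit": any(ch.isdigit() for ch in name),
        "has_special": any(not ch.isalnum() for ch in name),
        "camel_case": re.search(r"[a-z][A-Z]", name) is not None,
    }


def username_strength(name):
    """Map the number of features present to one of the five labels."""
    return LABELS[sum(username_features(name).values())]


if __name__ == "__main__":
    for example in ["john", "johnsmith", "john1990", "JohnSmith_1990"]:
        print(example, "->", username_strength(example))
```

Counting features rather than weighting them mirrors the rule in the text, where each additional feature raises the strength by one level.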
Fig. 2. Bar chart of usernames’ trends
Visualizing our dataset leads to several observations. First, different users prefer different types of usernames: some choose short usernames, and some use long ones. Figure 2 shows that most users on Reddit select usernames longer than eight characters. Besides, only a few people use camel case, so most usernames are easier to guess. Numbers and special characters are also used less often, which likewise makes usernames predictable. Based on these four criteria, we have measured the strengths of the usernames (Fig. 3). A username that satisfies all four criteria has the highest strength. We have found that few people have very strong usernames, and most people use normal, weak, and very weak usernames.
Fig. 3. Different strengths of usernames
3.3 Data Pre-processing
To classify our data properly with the ML classification algorithms, we have performed some data processing. We have converted our strength feature from nominal to numeric data so that we can accurately identify weak usernames. Usernames of strength very weak, weak, or normal are given the label ‘0’, and usernames of strength strong or very strong are given the label ‘1’. The dataset is shuffled and split into training (70%) and testing (30%) data for applying the different models. In our dataset, most usernames carry the weak label rather than the strong label. To mitigate this imbalance, we have applied the Synthetic Minority Oversampling Technique (SMOTE) from the Python imbalanced-learn library before the Random Forest (RF) analysis.

3.4 Random Forest (RF)
Decision trees are a type of classifier expressed as a recursive partition of the instance space. The nodes of a decision tree form a directed tree with a root node that has no incoming edges [13,18]. We choose RF because it has low bias and low variance. Another decision tree algorithm, J48, is superior to RF in terms of correctness [4] if we compromise on some other issues. A larger number of trees in RF improves efficiency and accuracy estimation, but it slows down the computation. The number of features considered when splitting a node and the minimum number of leaf nodes should also be taken into account. Both regression and classification can be done with the RF algorithm, which is more stable than other decision tree algorithms.

3.5 K-nearest Neighbor (KNN)
KNN is a non-parametric method. A KNN model is built from the training samples, and the output for a new sample is then determined by the KNN process (classification or regression) [5,11]. To use KNN, we need to specify the number of neighbors k and the distance measure. For a positive integer k (k = 1, 2, 3, ..., n), a new member is assigned to the class chosen by the majority vote of its neighbors, where k is the number of nearest neighbors considered. KNN regression assigns each new member the average value of its k nearest neighbors. The nearest neighbors are found among the training samples by applying the Euclidean distance.
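The majority-vote rule with Euclidean distance can be illustrated with a small, self-contained sketch. The feature vectors and labels below are hypothetical toy values chosen only for the example, and the function name is an assumption; the experiments in this paper rely on the scikit-learn implementation instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query.
    ranked = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (length, digit count, has special, camel case).
TRAIN_POINTS = [(4, 0, 0, 0), (5, 0, 0, 0), (6, 1, 0, 0),
                (12, 2, 1, 1), (14, 3, 1, 1), (11, 2, 1, 0)]
TRAIN_LABELS = [0, 0, 0, 1, 1, 1]  # 0 = weak, 1 = strong

if __name__ == "__main__":
    print(knn_predict(TRAIN_POINTS, TRAIN_LABELS, (13, 2, 1, 1)))
    print(knn_predict(TRAIN_POINTS, TRAIN_LABELS, (5, 1, 0, 0)))
```

With two classes, choosing an odd k avoids tied votes. Because the distance is sensitive to the scale of each feature, scaling the features first, as done in the experiment section, is a sensible companion step.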
3.6 Support Vector Classification (SVC)
Support vector machines for classification (SVC) are supervised learning models with associated learning algorithms that are used in machine learning to analyze data and to predict results [20]. The basic SVM is a non-probabilistic binary linear classifier that takes input data and predicts, for each given input, which of two possible output classes it belongs to. Given a set of training examples, each labeled as belonging to one of two categories, the SVC training technique generates a model that assigns new examples to one of the two categories. The SVC model represents the examples as points in space, mapped so that the examples of the different categories are separated by a gap that is as wide as possible. New examples are then mapped into the same space and classified according to which side of the gap they fall on.

3.7 Linear Support Vector Classification (LinearSVC)
This SVC method performs classification using a linear kernel function and works well with large numbers of samples. Compared with the SVC model, the LinearSVC model has additional parameters, such as the penalty norm (‘L1’ or ‘L2’) and the loss function. The kernel cannot be changed, because LinearSVC is based on the linear kernel. Different regularizations can also be used in the model. Non-linear classifiers are slower than linear classifiers.
4 Experiment and Result Analysis
Our experiment builds a machine learning system in Python (version 3) that learns from various usernames. We have used scikit-learn, a Python library. After completing the data pre-processing, we have started our experiment by classifying the dataset with various machine learning algorithms. We have selected five features for the classification. Next, we have scaled our dataset and split the 83,958 usernames into a training (70%) dataset and a testing (30%) dataset. Once the training has been completed, we have used the RF, KNN, SVC, and LinearSVC classification models to predict our testing data. We have recorded the results for further analysis.

Table 1. Confusion matrices of the RF, KNN, SVC, and LinearSVC models

                                  RF                KNN
Actual value:      Weak  Strong   Weak    Strong    Weak    Strong
Predicted Weak     TP    FN       20276   983       19456   1840
Predicted Strong   FP    TN       2356    1573      1956    1936

                                  SVC               LinearSVC
Actual value:      Weak  Strong   Weak    Strong    Weak    Strong
Predicted Weak     TP    FN       20227   1069      20173   1123
Predicted Strong   FP    TN       2268    1624      2409    1483
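The label conversion of Sect. 3.3 and the shuffled 70/30 split used in this experiment can be sketched as follows. This is a minimal standard-library sketch, not the scikit-learn code used by the authors; the function names, the fixed seed, and the choice to round the test size up are assumptions.

```python
import random

WEAK_STRENGTHS = {"very weak", "weak", "normal"}


def to_binary_label(strength):
    """Label 0 for very weak/weak/normal, 1 for strong/very strong."""
    return 0 if strength in WEAK_STRENGTHS else 1


def shuffle_split(rows, test_percent=30, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    The test size is rounded up; this rounding choice is an assumption.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = (len(rows) * test_percent + 99) // 100  # integer ceiling
    return rows[n_test:], rows[:n_test]


if __name__ == "__main__":
    train, test = shuffle_split(range(83958))
    print(len(train), len(test))
```

With 83,958 rows, rounding the test size up gives 25,188 test usernames, which equals the sum of the four cells of each confusion matrix in Table 1.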
The test data are predicted by the models fitted on the training data, and the predictions are summarized in confusion matrices. A confusion matrix tabulates the True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) counts for the test data (Table 1).

Table 2. Result table

Model        Precision  Sensitivity  Specificity  Accuracy
RF           0.90       0.90         0.40         0.87
KNN          0.91       0.91         0.50         0.85
SVC          0.90       0.90         0.42         0.87
Linear SVC   0.89       0.89         0.38         0.86
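For reference, the four measures in Table 2 can be computed from confusion-matrix cells with the usual definitions. The sketch below assumes these standard formulas and treats the weak class as the positive class, reading the RF cells as they are labelled in Table 1; the function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard metrics from the four confusion-matrix cells."""
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


if __name__ == "__main__":
    # RF cells as labelled in Table 1 (weak is the positive class).
    rf = binary_metrics(tp=20276, fn=983, fp=2356, tn=1573)
    for name, value in rf.items():
        print(f"{name}: {value:.2f}")
```

With these RF counts, the precision, specificity, and accuracy round to 0.90, 0.40, and 0.87, in line with Table 2. The sensitivity computed this way comes out near 0.95 rather than 0.90, so the reported sensitivity may have been obtained with a different definition or averaging.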
After applying the test data to our models, we obtain their evaluation metrics, which are shown in Table 2. As we aim to predict the usernames that are weak in strength, we concentrate our analysis on how well our models predict weak usernames from the dataset. In terms of accuracy, the Random Forest and SVC models have achieved a score of 0.87, which is the highest among the models. In terms of sensitivity, both the RF and SVC models achieve a score of 0.90, which means these two models identify 90% of the true positives correctly. The KNN model achieves a sensitivity of 0.91. We have observed that all models have low specificity, as we have fewer strong usernames in the Reddit dataset. Our models can predict weak usernames, which assists users in choosing strong usernames for any social media account.
Fig. 4. Importance of features.
Furthermore, we obtain the importance of the features (Fig. 4) from our RF model by using feature_importances_, an attribute of the Python Random Forest classifier. We have observed that whether a username contains numbers, whether it has a length of at least 8 characters, and whether it uses camel case affect the RF model more. Besides, special characters have less importance in our RF model.
5 Future Research Direction and Conclusion
Despite gaining considerable insights into different username strengths, we have found few datasets and insufficient literature on the analysis of username strength across various SNSs. In the future, in addition to checking username strength, we hope to apply other machine learning techniques to verify whether the same individual has previously used a username on any account, which would reduce the possibility of online tracing and profile hacking. Along with a password, a user’s username may contain crucial information regarding online privacy. Unfortunately, compromised accounts often have weak usernames (no numeric characters, special characters, or camel case, and a short length). This article emphasizes the need for strong usernames for account security and presents machine learning classifiers to detect weak usernames. Our analyses have shown that Random Forest (RF) and the Support Vector Classifier (SVC) classify usernames as weak or strong with 87% accuracy. This study will help SNS owners to secure their clients in a better manner and urge users to choose more robust usernames to avoid being hacked.
References
1. The importance of creating a strong online username. https://www.bellco.org/advice-planning/fraud-prevention/online-security/username-tips.aspx
2. Reddit usernames. https://kaggle.com/colinmorris/reddit-usernames
3. Identity Theft Resource Center’s 2020 annual data breach report reveals 19 percent decrease in breaches, January 2021. https://www.idtheftcenter.org/identity-theft-resource-centers-2020-annual-data-breach-report-reveals-19-percent-decrease-in-breaches/
4. Ali, J., Khan, R., Ahmad, N., Maqsood, I.: Random forests and decision trees. Int. J. Comput. Sci. Issues (IJCSI) 9(5), 272 (2012)
5. Altman, N.S.: An introduction to Kernel and nearest-neighbor nonparametric regression. J. Am. Stat. 46(3), 175–185 (1992)
6. Anish Dev, J.: Usage of botnets for high speed MD5 hash cracking. In: Third International Conference on Innovative Computing Technology (INTECH 2013), pp. 314–320 (2013). https://doi.org/10.1109/INTECH.2013.6653658
7. Basta, A., Basta, N., Brown, M.: Computer security and penetration testing. Cengage Learning (2013)
8. Çoban, Ö., Inan, A., Ozel, S.A.: Your username can give you away: matching Turkish OSN users with usernames, vol. 10, pp. 1–15 (2021). http://www.ijiss.org/ijiss/index.php/ijiss/article/view/896
9. Das, A., Bonneau, J., Caesar, M., Borisov, N., Wang, X.: The tangled web of password reuse. In: NDSS, vol. 14, pp. 23–26 (2014)
10. Fandakly, T., Caporusso, N.: Beyond passwords: enforcing username security as the first line of defense. In: Ahram, T., Karwowski, W. (eds.) AHFE 2019. AISC, vol. 960, pp. 48–58. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-20488-4_5
11. Fix, E., Hodges, J.L., Jr.: Discriminatory analysis-nonparametric discrimination: small sample performance. California Univ. Berkeley, Technical report (1952)
12. Grassi, P., et al.: Digital identity guidelines: authentication and lifecycle management, 22 June 2017. https://doi.org/10.6028/NIST.SP.800-63b
13. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
14. Jenkins, J., Grimes, M., Proudfoot, J., Lowry, P.: Improving password cybersecurity through inexpensive and minimally invasive means: detecting and deterring password reuse through keystroke-dynamics monitoring and just-in-time fear appeals. Inform. Technol. Dev. 20, 196–213 (2013). https://doi.org/10.1080/02681102.2013.814040
15. Kumar, S., Zafarani, R., Liu, H.: Understanding user migration patterns in social media. In: Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
16. Onaolapo, J., Mariconti, E., Stringhini, G.: What happens after you are PWND: Understanding the use of leaked webmail credentials in the wild. In: Proceedings of the 2016 Internet Measurement Conference, pp. 65–79 (2016). https://doi.org/10.1145/2987443.2987475
17. Perito, D., Castelluccia, C., Kâafar, M.A., Manils, P.: How unique and traceable are usernames? CoRR abs/1101.5578 (2011). http://arxiv.org/abs/1101.5578
18. Rokach, L., Maimon, O.: Top-down induction of decision trees classifiers-a survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 35(4), 476–487 (2005). https://doi.org/10.1109/TSMCC.2004.843247, http://ieeexplore.ieee.org/document/1522531/
19. Shi, Y.: A method of discriminating user’s identity similarity based on username feature greedy matching. In: Proceedings of the 2nd International Conference on Cryptography, Security and Privacy, pp. 5–9. ACM, March 2018. https://doi.org/10.1145/3199478.3199512
20. de Souza, D.L., Granzotto, M.H., de Almeida, G.M., Oliveira-Lopes, L.C.: Fault detection and diagnosis using support vector machines-a SVC and SVR comparison. J. Safety Eng. 3(1), 18–29 (2014)
21. Stainbrook, M., Caporusso, N.: Convenience or strength? Aiding optimal strategies in password generation. In: Ahram, T.Z., Nicholson, D. (eds.) AHFE 2018. AISC, vol. 782, pp. 23–32. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-94782-2_3
22. Tam, L., Glassman, M., Vandenwauver, M.: The psychology of password management: a tradeoff between security and convenience. Behav. IT 29, 233–244 (2010). https://doi.org/10.1080/01449290903121386
23. Umadevi, P., Saranya, V.: Stronger authentication for password using virtual password and secret little functions. In: International Conference on Information Communication and Embedded Systems (ICICES 2014), pp. 1–6. IEEE, February 2014. https://doi.org/10.1109/ICICES.2014.7033936
24. Ur, B., Bees, J., Segreti, S.M., Bauer, L., Christin, N., Cranor, L.F.: Do users’ perceptions of password security match reality? In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 3748–3760 (2016)
25. Wang, L., Li, Y., Sun, K.: Amnesia: A bilateral generative password manager. In: 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, Nara, Japan, June 27–30, 2016, pp. 313–322. IEEE Computer Society (2016). https://doi.org/10.1109/ICDCS.2016.90
Tomato Leaf Disease Recognition Using Depthwise Separable Convolution

Syed Md. Minhaz Hossain1,2, Khaleque Md. Aashiq Kamal1, Anik Sen1,2, and Kaushik Deb2(B)

1 Department of Computer Science and Engineering, Premier University, Chattogram 4000, Bangladesh
2 Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram 4349, Bangladesh
[email protected]

Abstract. Various plant diseases are the main reason for reduced production, resulting in significant losses in agriculture. The evolution of deep learning and its diverse use in different fields extends the opportunity to recognize plant diseases accurately. The challenges in plant disease recognition are the restriction to homogeneous backgrounds and the high memory demand of a large number of parameters. In this work, a dataset of 2880 tomato plant images is used to train a depthwise separable convolution-based model that reduces the trainable parameters for memory-constrained devices such as mobile phones. An independent set of test images, including 612 tomato plant images of nine diseases, is used to assess the model under different illumination and orientations. The depthwise separable convolution-based tomato leaf disease recognition model, entitled reduced MobileNet, performs favorably in terms of the trade-off among accuracy, computational latency, and scale of parameters, and achieves 98.31% accuracy and a 92.03% F1-score.

Keywords: Tomato leaf diseases · Memory size · Computational latency · Depthwise separable convolution · Sustainable accuracy
1 Introduction
The tomato is one of the most cultivated crops in the world today. According to Statista, about 180.77 million metric tons of tomatoes were cultivated worldwide in 2019 [14]. The quantity of production is greatly affected by diseases of tomato plants. Therefore, early identification of diseases plays an important role in monitoring plants in the agricultural industry. For this reason, different methods are applied in the field of agriculture. The well-known chemical methods harm the health of fresh plants and humans and affect the environment negatively. Moreover, these methods increase the cost of tomato production. In general, diseases infect the leaves and leaflets, the roots, the stems, and the fruits of tomato plants. In this study, diseases that affect leaves and leaflets are considered.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 341–351, 2022. https://doi.org/10.1007/978-3-030-93247-3_33
Machine learning opens up opportunities for automated post-harvest monitoring [1], prediction of crop production with respect to weather parameters [4], plant leaf disease recognition [8], and guidance of robots in the field of agriculture. Typical machine learning models are suitable and successful under certain conditions and specific setups, but their accuracy decreases considerably in uncontrolled conditions. Considering the diversity of deep learning models, researchers have been encouraged to apply them to achieve better performance in agriculture. However, the use of deep learning still faces some challenges: limited device memory (number of parameters), sustainable accuracy (no drop when testing on a new dataset), and computational latency (floating-point operations and multiply-accumulate operations). Sustainable accuracy is a critical issue in CNN-based plant leaf disease (PLD) recognition models, since adding new PLD images reduces the accuracy [5]. In addition, various works are restricted to symmetric backgrounds [5,15] and are sensitive to the image-capturing conditions [11]. Among all the PLD recognition works, two benchmark works on tomato leaf disease recognition [2,3] achieve better accuracy. However, they do not investigate the restriction to symmetric backgrounds. Moreover, most of the cutting-edge CNN models, such as VGG [2,5,15], InceptionV4 [15], AlexNet [5], DenseNet [15], and InceptionV3, DenseNet201, and a custom CNN model [13], achieve promising accuracy thanks to their deep and dense constructions. However, these models are limited by the memory available on mobile and IoT devices for plant leaf disease recognition and by the computational cost of faster convergence. To overcome these restrictions of present PLD recognition models, we propose a depthwise separable convolution (DSC)-based tomato leaf disease (DSCPLD) recognition model called reduced MobileNet. Our emphasis is on establishing a concrete trade-off among accuracy, number of parameters, and computational latency for mobile/IoT-based tomato leaf disease recognition by modifying MobileNet based on [9].
2 Related Work
The manual monitoring of plant diseases is chaotic, laborious, and challenging. Moreover, it depends on the situation. Therefore, researchers investigate automatic detection systems to overcome this problem and make the activities of farmers more effective and accurate. Several improvements have been applied to CNN models for detecting PLDs in recent years. Ferentinos et al. [5] developed a CNN model for recognizing 58 diseases of 25 plants. They achieved an accuracy of 99.53% with VGG. On the other hand, the accuracy decreased by 25–35% for data unknown to the trained model. In [15], VGG, ResNet, Inception, and DenseNet were used to recognize 38 PLDs of 14 classes, and DenseNet achieved an accuracy of 99.75%. However, the cost of computation is an issue. Liang et al. [11] proposed a modified CNN model for detecting rice blast disease and reached better accuracy than the feature extraction technique. In this work, the modified CNN model achieved an accuracy of 95.83%. However, this model is sensitive to the image-capturing conditions and requires a larger dataset. On the other hand, the authors of [6] proposed a novel CNN model based on 4199 images of rice leaf diseases. It recognizes five types of rice leaf diseases with a reduced number of network parameters. Their model achieved a training accuracy of 99.78% and a validation accuracy of 97.35%. The authors of [2] applied a CNN-based model to detect nine diseases of tomatoes. They used 10,000 tomato images from the PlantVillage dataset for training and 500 images for testing. The accuracy of the model is 91.2%. The authors of [10] studied the computational complexity and memory requirements of PLD recognition. The work in [12] compared various pooling strategies, namely mean-pooling, max-pooling, and stochastic pooling, for detecting rice leaf diseases using CNN models. They achieved 95.48% accuracy with stochastic pooling. The researchers found that the sample size needs to be increased to optimize the number of parameters.

Table 1. Dataset descriptions of tomato leaf disease recognition model

                                                        Distribution techniques
Disease class                            #Org. images   Train   Validation   Test
Bacterial spot                           490            320     102          68
Early blight                             490            320     102          68
Healthy                                  490            320     102          68
Late blight                              490            320     102          68
Leaf mold                                490            320     102          68
Mosaic virus                             490            320     102          68
Septoria leaf spot                       490            320     102          68
Spider mites (two-spotted spider mite)   490            320     102          68
Yellow leaf curl virus                   490            320     102          68
Total                                    4410           2880    918          612
3 Materials and Method
In this section, our proposed model is discussed in detail.

3.1 Dataset
In our work, 2880 original RGB images of tomato plants of nine different diseases are used to train the model, 918 images are used to validate it, and 612 images are used to test it. The total dataset consists of 4410 images. These images are collected from the PlantVillage dataset (https://www.kaggle.com/emmarex/plantdisease). The images of the nine different classes of tomato leaf diseases are shown in Fig. 1. Complete information regarding the tomato leaf disease dataset is given in Table 1.
Fig. 1. Samples of tomato leaf disease images: (a) Bacterial Spot, (b) Early Blight, (c) Healthy, (d) Late Blight, (e) Leaf Mold, (f) Mosaic Virus, (g) Septoria Leaf Spot, (h) Two Spotted Spider Mite, and (i) Yellow Leaf Curl Virus.
3.2 Applying Directional Augmentation to Images
Images captured in different orientations are one of the main issues in a tomato leaf disease recognition system. The features of an image can be spatially transformed by the relative position of the capturing device. Moreover, it is difficult to collect images from every angle to overcome this issue [7]. As a result, we apply directional augmentation techniques to enlarge our dataset, which increases our model’s capacity. Mirror symmetry reflects all pixels of an image about a line taken as the axis. In horizontal mirror symmetry, a vertical line through the image is selected as the axis, and all pixels are reflected about it. In vertical mirror symmetry, a horizontal line through the image is selected as the axis, and all pixels are reflected about it.

3.3 Applying Lighting Disturbance to Images
Weather conditions play a vital role when an image is captured, and image quality is affected by sunlight, shadow, and cloudy weather. To improve the generalization ability, we create images by changing the sharpness, brightness, and contrast values. Increasing the sharpness of an image makes the edges and borders of the objects in that image stand out. Let $Q(x, y) = [r(x, y), g(x, y), b(x, y)]^T$ be a pixel in RGB space. We apply the Laplacian to each pixel to add sharpness to the image, using Eq. 1.

$$\nabla^2[Q(x, y)] = \begin{bmatrix} \nabla^2[r(x, y)] \\ \nabla^2[g(x, y)] \\ \nabla^2[b(x, y)] \end{bmatrix} \tag{1}$$

Changing an image’s brightness means increasing or decreasing the values of a pixel in RGB mode. Suppose the original RGB value is $B_0$ and the brightness transformation factor is $d$. The changed RGB value $B$ after applying the brightness transformation factor is given by Eq. 2.

$$B = B_0 \times (1 + d) \tag{2}$$
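The mirror reflections of Sect. 3.2 and the brightness scaling of Eq. 2 can be sketched on a tiny image stored as nested Python lists. The function names and the toy pixel values are hypothetical, and clamping the result to the range 0 to 255 is an assumption for 8-bit images that is not stated in the text.

```python
def flip_horizontal(image):
    """Mirror about a vertical axis: reverse the pixels in every row."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Mirror about a horizontal axis: reverse the order of the rows."""
    return [list(row) for row in reversed(image)]


def adjust_brightness(image, d):
    """Apply B = B0 * (1 + d) to every channel of every pixel (Eq. 2).

    Clamping to the 0..255 range is an assumption for 8-bit images.
    """
    def scale(value):
        return max(0, min(255, round(value * (1 + d))))

    return [[tuple(scale(c) for c in pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    # A hypothetical 2 x 2 RGB image stored as rows of (r, g, b) tuples.
    image = [[(10, 20, 30), (40, 50, 60)],
             [(70, 80, 90), (200, 110, 120)]]
    print(flip_horizontal(image))
    print(flip_vertical(image))
    print(adjust_brightness(image, 0.5))
```

A positive factor d brightens the image and a negative one darkens it; in practice an image library would perform the same operations on whole arrays.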
Changing an image’s contrast means increasing the larger RGB values and decreasing the smaller RGB values relative to the median brightness. Suppose the original RGB value is $B_0$, the transformation factor is $d$, and the median brightness is $i$. The changed RGB value $B$ after applying the contrast transformation is given by Eq. 3.

$$B = i + (B_0 - i) \times (1 + d) \tag{3}$$

3.4 Disease Recognition Using Reduced MobileNet
In this section, we describe the basic depthwise separable convolution, the basic module of reduced MobileNet, and the model design and tuning.

Depthwise Separable Convolution. Depthwise separable convolution (DSC) consists of two convolutions: a depthwise convolution and a pointwise convolution. It splits a 3 × 3 convolution into a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, so the DSC operation has two steps. Depthwise convolution is a channel-wise convolution: it convolves each input channel individually. Pointwise convolution, which is similar to a traditional convolution with kernel size 1 × 1, then combines the outputs of the individual channels. The computation cost of a traditional convolution ($Cost_C$) is shown in Eq. 4.

$$Cost_C = M \cdot M \cdot K \cdot K \cdot N \cdot P \tag{4}$$

The cost of depthwise separable convolution ($Cost_D$), on the other hand, is shown in Eq. 5.

$$Cost_D = M \cdot M \cdot K \cdot K \cdot N + M \cdot M \cdot N \cdot P \tag{5}$$

The number of weights of a traditional convolution ($W_C$) is shown in Eq. 6.

$$W_C = K \cdot K \cdot N \cdot P \tag{6}$$

The number of weights of depthwise separable convolution ($W_D$) is shown in Eq. 7.

$$W_D = K \cdot K \cdot N + N \cdot P \tag{7}$$

Here N is the number of input channels, P is the number of output channels, K × K is the width and height of the kernel, and M × M is the width and height of the input feature map. Finally, Eqs. 8 and 9 show the reduction in weights ($F_W$) and operations ($F_{Cost}$).

$$F_W = \frac{W_D}{W_C} = \frac{1}{P} + \frac{1}{K^2} \tag{8}$$
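The two steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes square feature maps, stride 1, no padding, no bias terms, and nested lists in place of tensors, and all function names are hypothetical.

```python
def depthwise_conv(x, dw_kernels):
    """Channel-wise convolution: channel n of x is filtered only with dw_kernels[n].

    x: N channels, each an M x M list of rows; dw_kernels: N kernels, each K x K.
    With stride 1 and no padding (assumed), each output map is (M-K+1) x (M-K+1).
    """
    out = []
    for channel, kernel in zip(x, dw_kernels):
        k = len(kernel)
        size = len(channel) - k + 1
        out.append([[sum(channel[i + a][j + b] * kernel[a][b]
                         for a in range(k) for b in range(k))
                     for j in range(size)]
                    for i in range(size)])
    return out


def pointwise_conv(x, pw_weights):
    """1 x 1 convolution: each output channel is a weighted sum of the input channels.

    pw_weights: P rows of N weights, one row per output channel.
    """
    size = len(x[0])
    return [[[sum(w * x[n][i][j] for n, w in enumerate(row))
              for j in range(size)]
             for i in range(size)]
            for row in pw_weights]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


if __name__ == "__main__":
    x = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]  # N = 2, M = 3
    ones = [[1] * 3 for _ in range(3)]                              # K = 3
    print(depthwise_separable_conv(x, [ones, ones], [[1, 1], [1, -1], [0, 2]]))
```

The depthwise step never mixes channels; all cross-channel mixing happens in the pointwise step, which is where the N · P weight term of Eq. 7 comes from.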
Table 2. Reduced MobileNet architecture for tomato leaf disease recognition

Function                  Filter/Pool   #Filters   Output            #Parameters
Input                     -             -          224 × 224         0
Convolution               3 × 3         32         32 × 222 × 222    896
Depthwise convolution     3 × 3         32         32 × 64 × 64      32,800
Pointwise convolution     1 × 1         64         64 × 64 × 64      2,112
Depthwise convolution     3 × 3         64         64 × 1 × 1        262,208
Pointwise convolution     1 × 1         128        128 × 1 × 1       8,320
Global average pooling    -             -          1 × 1 × 128       0
Dense                     -             -          1 × 1 × 12        1,161
Softmax                   -             -          1 × 1 × 12        0
$$F_{Cost} = \frac{Cost_D}{Cost_C} = \frac{1}{P} + \frac{1}{K^2} \tag{9}$$

With a K × K filter, the computation cost of depthwise separable convolution can be reduced by a factor of up to $K^2$ compared with a traditional convolutional layer [9].
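The cost and weight formulas in Eqs. 4 to 9 can be checked numerically with a short sketch. The layer sizes used below (M = 112, K = 3, N = 32, P = 64) are hypothetical values chosen for illustration and are not taken from Table 2; the function names are likewise assumptions.

```python
from fractions import Fraction


def conv_cost(M, K, N, P):
    """Eq. 4: operations of a traditional convolution."""
    return M * M * K * K * N * P


def dsc_cost(M, K, N, P):
    """Eq. 5: operations of a depthwise separable convolution."""
    return M * M * K * K * N + M * M * N * P


def conv_weights(K, N, P):
    """Eq. 6: weights of a traditional convolution."""
    return K * K * N * P


def dsc_weights(K, N, P):
    """Eq. 7: weights of a depthwise separable convolution."""
    return K * K * N + N * P


def reduction(K, P):
    """Eqs. 8 and 9: both ratios simplify to 1/P + 1/K^2."""
    return Fraction(1, P) + Fraction(1, K * K)


if __name__ == "__main__":
    M, K, N, P = 112, 3, 32, 64  # hypothetical layer sizes
    print(Fraction(dsc_cost(M, K, N, P), conv_cost(M, K, N, P)))
    print(Fraction(dsc_weights(K, N, P), conv_weights(K, N, P)))
    print(reduction(K, P))
```

For these example sizes all three expressions equal 73/576, so the separable layer needs roughly 7.9 times fewer operations and weights. The ratio approaches 1/K² (a factor of 9 for a 3 × 3 kernel) only as P grows large.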
Basic Depthwise Separable Convolution Modules. Two variations of depthwise separable convolution are in use: in the first, the pointwise convolution directly follows the depthwise convolution; in the second, batch normalization and ReLU are used between the depthwise convolution and the pointwise convolution. Based on these concepts, we propose reduced MobileNet, built on the module shown in Fig. 2, for recognizing tomato leaf disease.
Fig. 2. Primary module for tomato leaf disease recognition based on depthwise separable convolution.
Model Design and Tuning. The architecture of the MobileNet-based tomato leaf disease recognition model, which we call reduced MobileNet, is given in Table 2 for an input size of 224 × 224. We split our dataset into three parts, train, validation, and test, in the ratio 70-20-10. The RMSprop optimizer is used with a learning rate of 0.001 and a batch size of 32, and the model is trained for 200 epochs.
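A 70-20-10 split of the kind described above can be sketched as follows. The random shuffle, the fixed seed, and the function name are assumptions for illustration; the text does not say whether the authors shuffled or stratified by class.

```python
import random


def split_dataset(items, percentages=(70, 20, 10), seed=0):
    """Shuffle `items` and split them into train, validation and test lists.

    Integer percentages avoid floating-point rounding surprises; any
    remainder after the first two parts goes to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility (assumed)
    n_train = len(items) * percentages[0] // 100
    n_val = len(items) * percentages[1] // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_dataset(range(1000))
    print(len(train), len(val), len(test))
```

The three parts are disjoint and together cover every item exactly once.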
4 Result and Observation
The tomato leaf disease recognition experiments are performed on an Intel(R) Core i7 8700U 3.2 GHz machine with 8 GB of RAM. The proposed system is implemented with the sklearn packages of Python.
4.1 Performance Evaluation

To evaluate the outcome of our proposed reduced MobileNet recognition model, we compare it with VGG16, VGG19, and AlexNet based on mean test accuracy and mean F1-score. Because the number of samples in the dataset is imbalanced, we use class-averaged performance measures, namely mean class accuracy and mean class F1-score. Table 3 compares the tomato leaf disease recognition models with respect to training accuracy, validation accuracy (val accuracy), mean test accuracy, and mean F1-score.

Table 3. Performance of various tomato leaf disease recognition models

Models              Train accuracy   Val accuracy   Test accuracy   F1-score
VGG16               99.45%           99.05%         99.10%          92.54%
VGG19               99.48%           99.21%         98.21%          90.19%
AlexNet             97.32%           95.12%         94.65%          86.78%
Reduced MobileNet   99.23%           98.72%         98.31%          92.03%
Table 4. A concrete representation of computational latency and model size of various tomato leaf disease recognition models

Models              Image size   FLOPs      MACC       # Parameters
VGG16               180 × 180    213.5 M    106.75 M   15.2 M
VGG19               180 × 180    287.84 M   143.92 M   20.6 M
AlexNet             224 × 224    127.68 M   63.84 M    6.4 M
Reduced MobileNet   224 × 224    3.70 M     2.15 M     0.31 M

4.2 Selection of the Best Model Based on All Criteria
Table 3 shows that VGG16 achieves the best mean test accuracy (99.10%) and F1-score (92.54%) on our tomato dataset. This is 0.79% better in accuracy and 0.51% better in F1-score than our proposed model. However, VGG16 requires almost 49 times more parameters than our proposed recognition model, as shown in Table 4. Considering all factors included in Tables 3 and 4, reduced MobileNet is the best of the tomato leaf disease recognition models for mobile and IoT-based recognition. The per-class performance for tomato leaf disease recognition is shown in Table 5. The confusion matrix, accuracy vs. epoch curve, and loss vs. epoch curve of reduced MobileNet are shown in Figs. 3 and 4(a–b).
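The trade-off quoted above follows directly from the figures in Tables 3 and 4. The short sketch below simply redoes that arithmetic; the dictionary keys and variable names are illustrative choices, while the numbers are copied from the two tables.

```python
# Figures copied from Table 3 (accuracy, F1-score) and Table 4 (# parameters).
vgg16 = {"test_acc": 99.10, "f1": 92.54, "params_millions": 15.2}
reduced_mobilenet = {"test_acc": 98.31, "f1": 92.03, "params_millions": 0.31}


def gap(a, b, key):
    """Difference a[key] - b[key], rounded to two decimals."""
    return round(a[key] - b[key], 2)


accuracy_gap = gap(vgg16, reduced_mobilenet, "test_acc")
f1_gap = gap(vgg16, reduced_mobilenet, "f1")
parameter_ratio = vgg16["params_millions"] / reduced_mobilenet["params_millions"]

if __name__ == "__main__":
    print(accuracy_gap, f1_gap, round(parameter_ratio, 2))
```

The accuracy and F1 gaps stay below one percentage point, while the parameter count differs by a factor of about 49.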
Table 5. Accuracy, precision, recall, and F1-score of each class of tomato leaf disease

Class                                          Accuracy   Precision   Recall   F1-score   Support
Tomato bacterial spot                          97.71%     86%         96%      90%        68
Tomato early blight                            98.04%     93%         78%      85%        68
Tomato healthy                                 97.71%     82%         94%      88%        68
Tomato late blight                             98.69%     97%         91%      94%        68
Tomato leaf mold                               96.89%     89%         82%      85%        68
Tomato mosaic virus                            98.53%     88%         100%     94%        68
Tomato septoria leaf spot                      99.35%     100%        94%      97%        68
Tomato Spider mites Two spotted spider mite    99.35%     100%        94%      97%        68
Tomato Yellow Leaf Curl Virus                  98.53%     93%         94%      93%        68
Total                                          98.31%     93.33%      90.78%   92.03%     612
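Per-class measures of the kind listed in Table 5 are usually derived from a confusion matrix such as the one in Fig. 3. The sketch below shows one common way to do this. It assumes that the per-class accuracy is the one-vs-rest accuracy (TP + TN) / total, a reading that is at least consistent with values such as 97.71% ≈ 598/612, and it uses a small hypothetical 2 × 2 matrix rather than the authors' data.

```python
def per_class_metrics(cm):
    """Per-class accuracy, precision, recall, F1 and support from a confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    Accuracy is computed one-vs-rest: (TP + TN) / total (an assumed reading).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        })
    return results


def macro_average(results, key):
    """Unweighted mean of one metric over all classes."""
    return sum(r[key] for r in results) / len(results)


if __name__ == "__main__":
    cm = [[8, 2], [1, 9]]  # hypothetical two-class example
    metrics = per_class_metrics(cm)
    print(metrics[0])
    print(macro_average(metrics, "f1"))
```

Averaging each column without weights, as `macro_average` does, treats every class equally regardless of its support.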
Fig. 3. The confusion matrix of reduced MobileNet model.
Fig. 4. (a) Accuracy vs epoch curve for tomato leaf disease recognition model and (b) Loss vs epoch curve for tomato leaf disease recognition model.
4.3 Processing Steps Using Our Reduced MobileNet Model
A processing example of tomato leaf image using reduced MobileNet is depicted in Fig. 5(a–e) with some activations on each of the layers.
Fig. 5. Activations on: (a) convolution layer; (b) first depthwise convolution layer; (c) first pointwise convolution layer; (d) second depthwise convolution layer; (e) second pointwise convolution layer.
4.4 Evaluation of Generalization for Our Proposed Model
To evaluate the generalization of our reduced MobileNet, we test the model on a tomato leaf disease dataset, taiwan.7z (https://data.mendeley.com/datasets/ngdgg79rzb/1/files/255e82b6-2b3a-41d2-bd07-7bfaf284a533, accessed on 17 February 2021). We consider only the tomato bacterial spot, tomato healthy, and tomato late blight images for testing our DSCPLD model. There are 493 tomato leaf images: 176 bacterial spot images, 160 healthy images, and 157 late blight images. Reduced MobileNet achieves its best mean test accuracy of 92.45% on these three classes, which is 5.86% lower than the accuracy obtained on our dataset, as shown in Table 6.

Table 6. Evaluation for generalization using various optimizers

Datasets         SGD      Adam     RMSprop
Tomato dataset   78.75%   84.34%   92.45%
Our dataset      80.06%   92.21%   98.31%

4.5 Comparison
In our work, we observe a fall in accuracy when testing on a new set of tomato images. However, generalization is better than in the work of [5]. Our proposed recognition model shows that the computational latency and memory space required for mobile and IoT-based tomato leaf disease recognition can be reduced relative to other benchmark CNN models, as shown in Table 4. A comparison with other state-of-the-art work is shown in Table 7, where NR = not resolved, R = resolved, and PR = partially resolved.
Table 7. Comparison of the performances between our model and a benchmark model

Reference   Classes   Models              Generalization   Complexity   Memory   Accuracy
[2]         10        Custom              NR               NR           R        92.23%
Our work    9         Reduced MobileNet   R                R            R        98.31%

5 Conclusion and Future Work
Precision agriculture is a crucial point in the agro-industry. Improvements in technology make it easier to detect and classify diseases accurately. However, in precision agriculture, sustainable accuracy and the complexity of a model in terms of detection time and memory size are also becoming important factors. In our work, we include images under uneven illumination and in different orientations, which makes the model more effective at identifying tomato leaf diseases. However, the accuracy of reduced MobileNet falls by 5.86% when it is tested on new data from another dataset. The model still performs better than [5] in terms of the rate of fall in accuracy. In addition, reduced MobileNet is very effective for mobile and IoT-based tomato leaf disease recognition because of its small number of network parameters and low computational cost. In future work, we will focus on the stages of tomato leaf diseases to visualize how the symptoms change over time.
References

1. Vasilyev, A.A., Samarin, G.N., Vasilyev, A.N.: Processing plants for post-harvest disinfection of grain. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019. AISC, vol. 1072, pp. 501–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_49
2. Agarwal, M., Singh, A., Arjaria, S., Sinha, A., Gupta, S.: ToLed: tomato leaf disease detection using convolution neural network. Procedia Comput. Sci. 167, 293–301 (2020)
3. Ashok, S., Kishore, G., Rajesh, V., Suchitra, S., Sophia, S.G.G., Pavithra, B.: Tomato leaf disease detection using deep learning techniques. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 979–983 (2020). https://doi.org/10.1109/ICCES48766.2020.9137986
4. Borse, K., Agnihotri, P.G.: Prediction of crop yields based on fuzzy rule-based system (FRBS) using the Takagi Sugeno-Kang approach. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2018. AISC, vol. 866, pp. 438–447. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-00979-3_46
5. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018)
6. Hossain, S.M.M., et al.: Rice leaf diseases recognition using convolutional neural networks. In: International Conference on Advanced Data Mining and Applications, pp. 299–314 (2021)
7. Hossain, S.M.M., Deb, K., Dhar, P.K., Koshiba, T.: Plant leaf disease recognition using depth-wise separable convolution-based models. Symmetry 13(3), 511 (2021)
8. Hossain, S.M.M., Deb, K.: Plant leaf disease recognition using histogram based gradient boosting classifier. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 530–545. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_47
9. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications (2017)
10. Kaur, S., Pandey, S., Goel, S.: Plants disease identification and classification through leaf images: a survey. Arch. Comput. Methods Eng. 26, 507–530 (2019)
11. Liang, W.J., Zhang, H., Zhang, G.F., Cao, H.X.: Rice blast disease recognition using a deep convolutional neural network. Sci. Rep. 9(1), 1–10 (2019)
12. Lu, Y., Yi, S., Zeng, N., Liu, Y., Zhang, Y.: Identification of rice diseases using deep convolutional neural networks. Neurocomputing 267, 378–384 (2017)
13. Patidar, S., Pandey, A., Shirish, B.A., Sriram, A.: Rice plant disease detection and classification using deep residual learning. In: Bhattacharjee, A., Borgohain, S.K., Soni, B., Verma, G., Gao, X.-Z. (eds.) MIND 2020. CCIS, vol. 1240, pp. 278–293. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-6315-7_23
14. Shahbandeh, M.: Vegetables production worldwide by type 2019. https://www.statista.com/statistics/264065/global-production-of-vegetables-by-type/
15. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 161, 272–279 (2019)
End-to-End Scene Text Recognition System for Devanagari and Bengali Text

Prithwish Sen(B), Anindita Das, and Nilkanta Sahu

Indian Institute of Information Technology, Guwahati, India
{prithwish.sen,anindita.das,nilkanta}@iiitg.ac.in
http://iiitg.ac.in/faculty/nilkanta/

Abstract. Scene text detection and recognition have been explored extensively in the recent past, but very few of those works address Indian languages. In this paper, an end-to-end system for the detection and recognition of Devanagari and Bengali text from scene images is proposed. The work is done in three stages, namely detection, matra removal, and character recognition. Firstly, the PP-YOLO network is used for scene text detection. As both languages under consideration have a matra-line, a U-Net based matra-line removal strategy is used next. U-Net achieves almost 100% accuracy for segmentation, which contributes to the overall performance of the proposed scheme. Finally, character recognition is done with the help of a CNN. To train the CNN, a dataset is created with Devanagari and Bengali text images. Experimental results show the efficiency of the individual stages as well as the efficiency of the scheme as a whole.

Keywords: OCR · Bengali · Devanagari · Matra · U-Net · CNN · PP-YOLO

1 Introduction
Scene text holds enormous semantic information that helps in understanding the scene. Because of this, the detection and recognition of text in natural scenes has drawn significant attention from the computer vision community over the last decade. The difficulty with natural scene images is that they exhibit varied illumination, distortions, fonts, backgrounds, etc. Most existing work on natural scene text detection and recognition focuses on English, and little has been done for Indian languages. Indian languages are vastly diverse in their patterns and ways of representation, and this wide variance makes it more challenging to find a single OCR solution. As far as OCR is concerned, Bengali (Bangla) and Devanagari have received a good share of attention from researchers. Bengali and Devanagari are spoken by 210 million and 182 million people worldwide, respectively. These two languages are also the most spoken languages in India, and Bengali is the national language of Bangladesh. Usually, Indian languages contain vowels, consonants, and compound letters. Compound letters are a combination of two or more letters and have different shapes. These languages do not have upper or lower case. Most of the characters in a word are connected by a horizontal line called the ‘matra’ line. OCR for Bengali and Devanagari (and other Indian languages) is more challenging than for English because of the large set of compound letters and because the matra makes it difficult to segment the characters.

The process of scene text detection and recognition involves many functional steps, such as text detection, word segmentation, character segmentation, and an OCR engine or character classification. OCR is the automatic recognition of text from images of text. In the proposed scheme, we treat text as an object and apply an efficient object detection algorithm, PP-YOLO, for text detection [14]. The recognition task is divided into two sub-tasks: first matra removal and then character recognition. For matra removal we use the U-Net architecture, probably for the first time.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 352–359, 2022. https://doi.org/10.1007/978-3-030-93247-3_34
2 Literature Review
The problem of scene text recognition mainly consists of two sub-problems: scene text detection and text recognition. Some researchers have tried to solve these two tasks separately, whereas others have approached them as a single problem. Early scene text detection relied on hand-crafted statistical features. Approaches based on statistical features [25] and human intelligence were used for text detection and text recognition. These hand-crafted schemes use properties such as the Stroke Width Transform [17], in which an image operator evaluates the stroke width at each pixel. In 2011, HOG features [22] were used in a method for spotting words in an unconstrained environment. MSERs [16] have been used together with a selector that exploits higher-order properties of text. Mishra et al. [13] used an English dictionary to compute higher-order priors for recognition. With the rise of deep learning algorithms, researchers started exploring deep-learning-based schemes [18] and achieved significant success. A Sanskrit-targeted OCR system [6] uses an attention-based LSTM model to extract Sanskrit characters. Azeem et al. [4] detect the counter, segment the digits, and recognize them using a Mask-RCNN. In 2021, a spatial graph-based CNN for identifying paragraphs was proposed [23]; the process involves two steps, line splitting and clustering. Ghosh et al. [7] represented the semantic information of movie posters, using transfer learning to recognize text in graphics-rich posters. Recent OCR systems tend to train detection and recognition together [21]. In 2017, a CNN- and RNN-based end-to-end model that recognizes French and extracts textual content was proposed [24]. An RCNN-based deep network that addresses text detection and recognition in scene images with a single evaluation of the features was also introduced [11]. Zhang et al. [26] reinforce text reading and information extraction jointly through a fusion of textual and multimodal visual features. In 2021, Huang et al. [9] proposed an end-to-end Multiplexed Multilingual Mask TextSpotter, which identifies scripts at the word level and also recognizes the text.
3 Proposed Scheme
In the proposed scheme (Fig. 1), an image, primarily a natural scene image with or without text content, is fed into the system. The system first localizes the words in the scene and then applies several further steps to recognize every character of the scene text. The detailed methodology is described below.
Fig. 1. System diagram
3.1 Text Detection
We treat text detection as a regression problem that yields separate bounding boxes with associated confidence scores. A neural network is trained to predict text bounding boxes and confidence scores directly from natural scene text images in one evaluation. The PP-YOLO model [12] is trained with 2000 Bengali natural scene text images, IIIT-ILST [1]. In general, the lower the FPS, the higher the accuracy of the detection scheme. PP-YOLO produces localization errors, but it is unlikely to predict false positives in the background.

3.2 Matra Line Removal Using U-Net
After the text is detected, a top-down segmentation approach is applied to extract the characters. In most Bengali and Devanagari words, the presence of the matra line makes it difficult to separate the characters. Removing the matra line [19] therefore makes it easy to isolate the characters in a word.
To remove the matra line we use the U-Net model, which has been used successfully for image segmentation in other applications [20]. U-Net takes two sets of inputs: the original text images and images containing only the matra lines. The model is trained on these data and outputs the difference between the two images. It contains a contracting path (the left side) and an expansive path (the right side). The left side follows the typical architecture of a CNN; at each downsampling step, the number of feature channels is doubled. Each step on the right side upsamples the feature map and halves the number of feature channels. Cropping is necessary because border pixels are lost in every convolution. Finally, the feature vectors are mapped to the corresponding classes. In total, our model contains 23 convolutional layers. When tested on sample images, the model removes the matra line with 100% accuracy.

3.3 Character Segmentation and Classification
After the matra line is removed, the word undergoes character segmentation by connected component analysis, which finds the contour of each character and separates the characters. Connected component analysis finds connected regions of the same pixel intensity in a given image. It therefore yields a distinguishable boundary between pixels of the same intensity and pixels of different intensities. Finally, each character in the image is separated along this boundary, based on the region of interest. A multilingual OCR system generally uses one of two approaches: (I) a single classifier for the character sets of all languages under consideration, or (II) a separate classifier for each language, preceded by a language classifier. In the proposed scheme we use the first approach.
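The connected component step described above can be sketched with a breadth-first search over a binarized word image. This is an illustrative sketch only: the 4-connectivity, the 0/1 pixel encoding, the bounding-box output, and the function name are assumptions, not details given in the text.

```python
from collections import deque


def connected_components(binary):
    """Return bounding boxes of 4-connected foreground regions, left to right.

    binary: list of rows with 1 for ink and 0 for background (assumed encoding).
    Each box is (left, top, right, bottom) in pixel coordinates, inclusive.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one component and track its extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, bottom, left, right = y, y, x, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


if __name__ == "__main__":
    word = [
        [1, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [1, 1, 0, 0, 1],
    ]
    print(connected_components(word))
```

Once the matra line no longer joins the characters, each character forms its own component, and sorting the boxes by their left edge restores the reading order.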
Fig. 2. CNN block diagram
The segmented characters are fed into the classifier shown in Fig. 2, which outputs the predicted class of each character in the input image. First, the input characters are pre-processed by grayscale conversion and histogram equalization. A convolutional neural network then takes the input and generates the corresponding feature maps. ReLU is used as the activation function of each convolutional layer, which mitigates the vanishing gradient problem. After successful training, we predict the character class of each input character with the softmax function.
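The pre-processing and output steps mentioned above can be sketched in plain Python as follows. The luma weights for grayscale conversion, the CDF-based equalization formula, and all function names are assumptions chosen for illustration; the text does not state which variants the authors used.

```python
import math


def to_gray(pixel):
    """Grayscale value of an (r, g, b) pixel using ITU-R BT.601 luma weights (assumed)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def equalize(gray, levels=256):
    """Histogram equalization of a grayscale image given as a list of rows."""
    flat = [v for row in gray for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((cdf[v] - cdf_min) * (levels - 1) / (n - cdf_min)) for v in row]
            for row in gray]


def softmax(logits):
    """Numerically stable softmax: converts class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(equalize([[0, 1], [2, 3]]))
    probs = softmax([2.0, 1.0, 0.1])
    print(probs.index(max(probs)))  # index of the predicted class
```

Equalization spreads the intensities of a low-contrast character crop over the full range, and the predicted class is the index of the largest softmax probability.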
4 Experimental Result
To train our text detection module we used the IIIT-ILST [1] dataset along with 2000 collected scene images. For the recognition module, a combination of the BanglaLekhaIsolated [2], ISI Bengali [3], and Devanagari [15] datasets and our own collection of printed data is used. The whole process of scene text recognition is illustrated in Fig. 3.
Fig. 3. Scene text recognition pipeline
For text detection performance, the precision and recall metrics are evaluated. To find the average precision, the area under the curve plotted in Fig. 4 must be evaluated. This is done by 11-point interpolation: the precision is interpolated at 11 equally spaced recall levels. Table 1 shows the recall levels and the corresponding interpolated precision values. The resulting average precision is 90.32%.

Table 1. Segmented recall and interpolated precision

Segmented recall         0   0.1   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9     1
Interpolated precision   1   1     0.83   0.84   0.86   0.86   0.86   0.83   0.88   0.976   1
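The 11-point average precision can be reproduced from Table 1 with a few lines of Python. The interpolation helper below follows the common convention of taking, at each recall level, the maximum precision observed at any recall greater than or equal to that level; this convention and the function names are assumptions, since the text does not spell out the procedure.

```python
RECALL_LEVELS = [i / 10 for i in range(11)]

# Interpolated precision values copied from Table 1.
TABLE_1_PRECISION = [1, 1, 0.83, 0.84, 0.86, 0.86, 0.86, 0.83, 0.88, 0.976, 1]


def interpolate(recalls, precisions, levels=RECALL_LEVELS):
    """Interpolated precision at each level: max precision at any recall >= level."""
    result = []
    for level in levels:
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        result.append(max(candidates) if candidates else 0.0)
    return result


def eleven_point_ap(interpolated):
    """Average precision as the mean of the 11 interpolated precision values."""
    if len(interpolated) != 11:
        raise ValueError("expected 11 interpolated precision values")
    return sum(interpolated) / 11


if __name__ == "__main__":
    print(eleven_point_ap(TABLE_1_PRECISION))
```

Applied to the Table 1 values, the mean comes to roughly 0.903, in line with the average precision reported above.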
Fig. 4. Performance evaluation plot
To check the versatility of the proposed architecture, the whole recognition dataset is divided into a training set (85%) and a testing set (15%). First, we train the detection model on the text detection dataset mentioned above. The detection pipeline is trained with darknet on an NVIDIA Tesla K80 GPU. Second, the recognition model is trained on the mixed dataset with a batch size of 100, 60 epochs, and a learning rate of 0.001. The proposed model achieves an accuracy of 98.23%, which shows that it performs well across the different languages. The proposed model achieves almost 100% accuracy in matra line removal using the U-Net model. The results of the proposed model on the datasets are presented, along with a comparison, in Table 2.

Table 2. Model accuracy comparison

Language                 Other works                                        Our method
Bengali and Devanagari   92.00% [8], 95.39% [10], 80.02% [25], 86.21% [5]   98.89%

5 Conclusion
In this paper, we considered two different Indian languages, Devanagari and Bengali, in natural scene text images. The proposed scheme consists of a three-stage pipeline for detection and segmentation followed by an end-to-end Optical Character Recognition (OCR) system. Firstly, a neural network trained with darknet is used to detect the text in a natural scene text image. As both languages under consideration have a matra-line, a U-Net based matra-line removal strategy is used for the first time. Finally, the character recognition pipeline is trained with the mixed dataset that we created. Furthermore, our proposed method outperforms other existing approaches in terms of accuracy. In the future, we are interested in further testing the performance of the proposed scheme on other Indian languages.
References

1. http://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-ilst
2. https://data.mendeley.com/datasets/hf6sf8zrkc/2
3. https://www.isical.ac.in/~ujjwal/download/SegmentedSceneCharacter.html
4. Azeem, A., Riaz, W., Siddique, A., Saifullah, U.A.K.: A robust automatic meter reading system based on mask-RCNN. In: 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), pp. 209–213. IEEE (2020)
5. Bhunia, A.K., Kumar, G., Roy, P.P., Balasubramanian, R., Pal, U.: Text recognition in scene image and video frame using color channel selection. Multimedia Tools Appl. 77(7), 8551–8578 (2018)
6. Dwivedi, A., Saluja, R., Sarvadevabhatla, R.K.: An OCR for classical Indic documents containing arbitrarily long words. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 560–561 (2020)
7. Ghosh, M., Roy, S.S., Mukherjee, H., Obaidullah, S.M., Santosh, K., Roy, K.: Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition. Vis. Comput. 37, 1–20 (2021)
8. Ghoshal, R., Roy, A., Parui, S.K., et al.: Recognition of Bangla text from outdoor images using decision tree model. Int. J. Knowl. Based Intell. Eng. Syst. 21(1), 29–38 (2017)
9. Huang, J., et al.: A multiplexed network for end-to-end, multilingual OCR. arXiv preprint arXiv:2103.15992 (2021)
10. Islam, R., Islam, M.R., Talukder, K.H.: Extraction and recognition of Bangla texts from natural scene images using CNN. In: El Moataz, A., Mammass, D., Mansouri, A., Nouboud, F. (eds.) Image and Signal Processing, pp. 243–253. Springer International Publishing, Cham (2020)
11. Li, H., Wang, P., Shen, C.: Towards end-to-end text spotting with convolutional recurrent neural networks. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5248–5256 (2017). https://doi.org/10.1109/ICCV.2017.560
12. Long, X., et al.: PP-YOLO: an effective and efficient implementation of object detector. arXiv preprint arXiv:2007.12099 (2020)
13. Mishra, A., Alahari, K., Jawahar, C.: Scene text recognition using higher order language priors. In: BMVC - British Machine Vision Conference. BMVA, Surrey, UK, September 2012. https://doi.org/10.5244/C.26.127, https://hal.inria.fr/hal-00818183
14. Naosekpam, V., Kumar, N., Sahu, N.: Multi-lingual Indian text detector for mobile devices. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds.) Computer Vision and Image Processing, pp. 243–254. Springer, Singapore (2021)
15. Narang, V., Roy, S., Murthy, O.R., Hanmandlu, M.: Devanagari character recognition in scene images. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 902–906. IEEE (2013)
16. Neumann, L., Matas, J.: Text localization in real-world images using efficiently pruned exhaustive search. In: 2011 International Conference on Document Analysis and Recognition, pp. 687–691 (2011). https://doi.org/10.1109/ICDAR.2011.144
17. Ofek, E., Epshtein, B., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2963–2970. IEEE Computer Society, Los Alamitos, CA, USA, June 2010. https://doi.org/10.1109/CVPR.2010.5540041
18. Peng, X., Wang, C.: Building super-resolution image generator for OCR accuracy improvement. In: Bai, X., Karatzas, D., Lopresti, D. (eds.) DAS 2020. LNCS, vol. 12116, pp. 145–160. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-57058-3_11
19. Rahman, A., Cyrus, H.M., Yasir, F., Adnan, W.B., Islam, M.M.: Segmentation of handwritten Bangla script. In: 2013 International Conference on Informatics, Electronics and Vision (ICIEV), pp. 1–5 (2013). https://doi.org/10.1109/ICIEV.2013.6572635
20. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
21. Subedi, B., Yunusov, J., Gaybulayev, A., Kim, T.H.: Development of a low-cost industrial OCR system with an end-to-end deep learning technology. IEMEK J. Embed. Syst. Appl. 15(2), 51–60 (2020)
22. Wang, K., Belongie, S.: Word spotting in the wild. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 591–604. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_43
23. Wang, R., Fujii, Y., Popat, A.C.: General-purpose OCR paragraph identification by graph convolution networks. arXiv preprint arXiv:2101.12741 (2021)
24. Wojna, Z., et al.: Attention-based extraction of structured information from street view imagery. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 844–850 (2017). https://doi.org/10.1109/ICDAR.2017.143
25. Yao, C., Bai, X., Shi, B., Liu, W.: Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 25, pp. 4042–4049, June 2014. https://doi.org/10.1109/CVPR.2014.515
26. Zhang, P., et al.: TRIE: end-to-end text reading and information extraction for document understanding. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1413–1422 (2020)
A Deep Convolutional Neural Network Based Classification Approach for Sleep Scoring of NFLE Patients

Sarker Safat Mahmud1(B), Md. Rakibul Islam Prince1, Md. Shamim2, and Sarker Shahriar Mahmud3

1 Mechatronics and Industrial Engineering, Chittagong University of Engineering and Technology, Chattogram 4349, Bangladesh
2 Computer Science and Engineering, Khulna University of Engineering and Technology, Khulna 9203, Bangladesh
3 Mechanical Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh

Abstract. Sleep stage classification is very important for diagnosing any kind of sleep-related disease. It is a sensitive task that has so far been done manually by experts. With the advancement of Machine Learning (ML) in every aspect, classification problems are now being solved automatically by ML models, and sleep scoring is no exception. However, most approaches to sleep scoring are designed for normal patients. This paper therefore uses a state-of-the-art Deep Convolutional Neural Network model to solve the problem. We do not use the Polysomnography (PSG) files that are traditionally used; instead, our approach uses only raw data from the EEG electrodes. It can classify the sleep stages of patients suffering from Nocturnal Frontal Lobe Epilepsy (NFLE). The latest data analysis tools are used in this approach.

Keywords: NFLE · CNN · Sleep stage classification · Hypnogram · MNE module

1 Introduction
Sleep is one of the most important actions in our daily routine. Sleeping takes up a significant portion of one’s day. Sleep is a restorative period when almost all body repairing activities are done. It enables the body to remain fit for the next day. It lifts one’s mood, improves thinking and memory, reduces stress and blood pressure, and eventually boosts the immune system. Any kind of sleep disturbances and disorders can hamper a man’s lifelong activity. Sleep apnea is the most dangerous symptom of a sleep disturbance, as it can lead to serious illnesses like high blood pressure, stroke, heart failure, irregular heartbeats, and c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 360–369, 2022. https://doi.org/10.1007/978-3-030-93247-3_35
A Deep Convolutional Neural Network Based Classification Approach
361
heart attacks, as well as Alzheimer’s disease [1–7]. So every aspect of sleep has been thoroughly explored. Classification of sleep stages is done manually by human experts [8]. CNN performed better than the other methods and obtained high performance for sleep stage classification in EEG recordings with one-dimensional CNN models yielding higher accuracy [9]. Sleep stage diagnosis is both expensive and inconvenient for patients. Patients must stay in the sleep lab for numerous nights to complete the diagnosis, which is inconvenient. Sleep labs require a highly regulated atmosphere, human experts, and high-tech devices, and patients must stay for multiple nights to complete the diagnosis, which is inconvenient. As a result, portable gadgets are substantially more cost-effective and convenient for those patients [9]. They also analyzed the limitations and capabilities of the Deep Learning (DL) approach in sleep stage categorization during the last 10 years, concluding that DL is the best tool for sleep stage score among various artificial intelligence technologies. A recurrent Neural Network (RNN) is one of the numerous forms of Deep Learning Techniques that can predict future states based on the prior sequential stream of data. As a result, RNN is used in time series applications such as capturing the stage transition of the sleep stages in [10]. In [11], automatic sleep stage scoring is done based on raw single-channel EEG and they used a new hybrid architecture with CNN and bidirectional LSTM in the MASS and Sleep-EDF datasets. In [12] paper, Artificial Neural Network has been used to detect epilepsy patients from EEG signals. On the Sleep Heart Health Study (SHHS) database, a CNN-based algorithm was utilized to automatically classify sleep stages using cardiac rhythm from an ECG signal, and it achieved 0.77 accuracies [13]. The microstructure of sleep, which includes CAP and arousal, is a major topic of development and research [9]. 
Because of its short duration, conventional sleep scoring often misses this microstructure [9]. The cyclic alternating pattern (CAP) is an EEG activity that causes sleep instability and sleep disturbance [14]. CAP occurs frequently throughout the non-REM stages of sleep. CAP levels are frequently elevated in epileptic illnesses such as Nocturnal Frontal Lobe Epilepsy (NFLE) [14]. Consequently, the traditional hypnograms of NFLE sufferers and healthy people are not the same. So far, little research has been done on sleep scoring even for healthy people, hardly any of it specifically targets NFLE patients, and none of that work uses deep learning techniques. Even though the occurrence of CAP has a significant impact on sleep phases, only a little work has been done on automatic sleep scoring for these patients. This paper focuses not only on developing an automatic sleep stage scoring technique for NFLE patients but also on bringing a Deep Neural Network (DNN) into action.
S. S. Mahmud et al.
2 Background
2.1 Sleep Scoring
Experts divide sleep into two categories: non-REM and REM (Rapid Eye Movement). Each type is associated with distinct brain activity. The brain becomes more active in the REM stage, and eye movement is significantly faster than in the non-REM period, when the eyes move slowly. Non-REM sleep comes first, followed by a shorter period of REM sleep, and the REM duration increases each time the cycle repeats. To enter the REM stage, the sleep cycle must pass through 3 or 4 non-REM phases, depending on the scoring system. These non-REM phases are divided into light and deep sleep. Eye movement and muscular activity slow down in the light non-REM stage. During the deep non-REM period, the body is repaired and muscles and bones are built; the immune system improves and the body temperature drops significantly. Heart rate and respiration increase during the REM stage, the brain becomes far more active, and the period is marked by intense dreams. Adults spend 20% of their entire sleep time in the REM state, while babies spend 50% of their time there [15]. The electroencephalogram (EEG) measures electrical activity in the brain and is the main signal used to score sleep. Other biological signals, such as the electrooculogram (EOG) for eye movement and the electromyogram (EMG) for muscle tone, are useful for scoring as well. A sleep study that records these signals together is called polysomnography (PSG), a multi-parametric type of sleep study. Sleep scoring based on PSG data mainly follows two standards. In 2007, the American Academy of Sleep Medicine (AASM) divided sleep into five stages [16]; before that, scoring was dominated by the rules of Rechtschaffen and Kales (R&K, 1968) [17]. According to the R&K rules, sleep is classified into seven stages: wake, stage 1, stage 2, stage 3, stage 4, stage REM, and movement time.
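As a small illustration of how the two label sets relate, the sketch below maps R&K labels to AASM labels. It assumes the usual AASM convention in which R&K stages 3 and 4 are merged into a single deep-sleep stage N3 and movement time is no longer scored; the label strings themselves (`W`, `S1`, `MT`, and so on) are naming assumptions made for this example, not identifiers taken from the CAP sleep database.

```python
# R&K label -> AASM label. The label strings are illustrative assumptions.
RK_TO_AASM = {
    "W": "W",      # wake
    "S1": "N1",
    "S2": "N2",
    "S3": "N3",    # stages 3 and 4 are merged into N3
    "S4": "N3",
    "REM": "R",
    "MT": None,    # movement time has no AASM counterpart in this sketch
}


def rk_to_aasm(hypnogram):
    """Convert a list of R&K labels to AASM labels, skipping movement time."""
    converted = []
    for stage in hypnogram:
        mapped = RK_TO_AASM[stage]
        if mapped is not None:
            converted.append(mapped)
    return converted


example = ["W", "S1", "S2", "S3", "S4", "MT", "REM"]
print(rk_to_aasm(example))
```

Seven R&K labels collapse into five AASM labels, which matches the stage counts given for the two standards.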
The AASM standard revised these guidelines and kept five stages from the R&K rules [18]: R&K stages S1 and S2 correspond to N1 and N2, and S3 and S4 are merged into N3. The AASM standard is thus a simplified version of the R&K rules. In the vast majority of cases, one of these two standards is used to score sleep. The CAP sleep database used here follows the R&K scoring, excluding movement time.
2.2 Convolutional Neural Network
Convolutional Neural Networks (CNNs) are widely used for pattern recognition tasks, including image analysis and other computer vision applications. A CNN is a type of artificial neural network that belongs to the machine learning and artificial intelligence (AI) domains. Its ability to recognize patterns is what makes a CNN well suited to such tasks. The hidden layers of a CNN include convolution layers, which help it learn the patterns in the input data. Those patterns could be image fragments, allowing the CNN to compare them and check whether they are present in other images or elsewhere in the same one. In the context of machine learning, these small pieces are referred to as “features.” The same approach can be applied to one-dimensional data sequences, in which case the network is known as a 1D CNN. Like other deep learning approaches, a CNN learns these features through convolution with filters and then performs further operations such as pooling and, finally, classification using fully connected layers [19].
2.3 Electroencephalogram (EEG)
The EEG test looks for abnormalities in brain waves, that is, in the electrical activity of the brain. During the procedure, electrodes consisting of small metal discs with thin wires are attached to the scalp. The electrodes detect the minute electrical charges produced by the activity of brain cells. The EEG is the most common technology used in sleep studies. Although the focus here is on sleep research, the approaches can be extended to other disciplines of neuroscience research. One of the physiological changes that occur during sleep is an alteration in the EEG signal, which can be used to identify and monitor sleep apnea events. According to research published in The Neurodiagnostic Journal [20], EEG technology that analyzes brain function could enable earlier identification of common mental and neurological illnesses such as autism, ADHD, and dementia. The EEG test is also the most popular method of diagnosing epilepsy.
3 Deep CNN Based Sleep Stage Classification
The classifier model is only the core of the whole architecture; several stages precede the final classification. The whole process is illustrated in Fig. 1.
[Fig. 1 flowchart boxes: Known EEG signal of 6 specific channels from each patient's 8 h sleep EDF file; Noise reduction and resampling of every signal to 256 Hz; Convert EDF file to CSV DataFrame; Build model with 1D CNN; Train model with all patients' dataset; Make the whole dataset for deep learning classification; Evaluate model; Compare all the models and keep the best model; Make prediction]
Fig. 1. System diagram of deep CNN based sleep stage classification.
3.1 System Architecture
The main objective of this research is to employ a deep convolutional neural network to develop an automatic sleep stage scoring system for NFLE patients. The CAP sleep database (CAPSLPDB) is used here; it is the only database that provides PSG recordings with CAP annotations [14]. In this database, sleep stages are classified according to the R&K rules. It is a collection of 108 polysomnographic (PSG) recordings registered at the Sleep Disorders Center of the Ospedale Maggiore of Parma, Italy [21]. As the model is still at a preliminary stage, there is considerable scope for further research. The CAP sleep database contains 40 NFLE patients aged between 14 and 67. This research focuses on people aged 14 to 27; fifteen patients fall within this category and have similar hypnograms. From the annotations, each person’s hypnogram for around 8 h was generated. The data annotation was carried out by expert personnel using hand-engineered features. Each PSG file consisted of 22 channels, of which only 6 were selected according to their significance. Since this is a classification problem, a deep CNN was applied to automate the task. The hypnogram generated by the model is used to predict future sleep stages.
3.2 Pre-processing of Data
This research faced several problems, including timestamp and sampling-frequency discrepancies, and various preprocessing methods were used to address them. To begin with, the hand-engineered data annotations and the EDF file do not cover the same time span: in many EDF files, there were more data than in the respective annotation file. To fix that issue, unscored data were manually removed from the corresponding patient’s EDF file. As previously mentioned, the 6 most significant channels are ROC–LOC, FP2–F4, F4–C4, F7–T3, F3–C3, and T4–T6. In addition, some EDF files have a sample rate of 512 Hz, whereas others have a sample rate of 256 Hz, so all the data were brought to a constant sample rate of 256 Hz by downsampling the higher-rate recordings. “MNE” and “pyedflib” are the two key Python modules that were used for extracting, analyzing, and visualizing EDF files. The MNE package was used to read all of the EDF files, analyze the EEG sensor records, and transfer them into a DataFrame. All the signals had a frequency range of 0.3 to 30 Hz because a bandpass filter was applied at the beginning for noise reduction. The values of the 6 channels in each patient’s EDF file were then passed to the 1D CNN model.
3.3 Model Architecture
Deep CNN Layers: The 1D CNN differs slightly from the standard 2D CNN, but the end goal remains the same: instead of images, the raw signal data are used to find patterns. The raw EDF file of the EEG signals contains the varying amplitude values of the waveform. According to [22], the alpha frequency band (8–12 Hz) is more prominent in the wake stage. As sleep deepens, alpha becomes less prominent and the lower-frequency bands known as theta (4–8 Hz) and delta (0.5–4 Hz) take over. With the model’s 1D convolution, the cyclic occurrences of these waveforms are detected as patterns. The convolution layers recognize features by learning those patterns from the changes in signal amplitude. The first layers search for patterns over short time ranges; as the layers get deeper, the network identifies patterns over longer time spans. After tuning several hyperparameters, four Conv1D layers were eventually used, with 64, 128, 256, and 512 filters, respectively. A kernel size of 3 was used for the convolution. A pooling layer follows each Conv1D layer and downsamples its output. Because max pooling keeps the maximum values, it preserves the major information while shrinking the data, so less computational power is needed. The model includes a dropout layer after each convolution and pooling block. During training, the convolutional part learns the features (in our case, the patterns of the signals), and the result is passed to the fully connected layers, whose main task is to classify the patterns based on the annotations (Table 1).

Table 1. Model summary

Layer (type)                      Output shape        Param
conv1d (Conv1D)                   (None, 7678, 64)    1216
max_pooling1d (MaxPooling1D)      (None, 2559, 64)    0
dropout (Dropout)                 (None, 2559, 64)    0
conv1d_1 (Conv1D)                 (None, 2557, 128)   24704
max_pooling1d_1 (MaxPooling1D)    (None, 852, 128)    0
dropout_1 (Dropout)               (None, 852, 128)    0
conv1d_2 (Conv1D)                 (None, 850, 256)    98560
max_pooling1d_2 (MaxPooling1D)    (None, 283, 256)    0
dropout_2 (Dropout)               (None, 283, 256)    0
conv1d_3 (Conv1D)                 (None, 281, 512)    393728
max_pooling1d_3 (MaxPooling1D)    (None, 93, 512)     0
dropout_3 (Dropout)               (None, 93, 512)     0
flatten (Flatten)                 (None, 47616)       0
dense (Dense)                     (None, 128)         6094976
dense_1 (Dense)                   (None, 64)          8256
dropout_4 (Dropout)               (None, 64)          0
dense_2 (Dense)                   (None, 6)           390

Total params: 6,621,830
Trainable params: 6,621,830
Non-trainable params: 0
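The output shapes and parameter counts in Table 1 can be reproduced with a few lines of arithmetic, which is a useful sanity check on the architecture. The sketch below assumes unpadded, stride-1 convolutions with kernel size 3 and a pooling size of 3. The pooling size is not stated in the text; it is inferred here from the reported shapes (for example 7678 to 2559), so treat it as an assumption.

```python
def conv1d(length, in_channels, filters, kernel=3):
    """Output length, output channels, and parameter count of an unpadded, stride-1 Conv1D."""
    params = kernel * in_channels * filters + filters   # weights plus one bias per filter
    return length - kernel + 1, filters, params


def max_pool1d(length, pool=3):
    """Output length of non-overlapping max pooling (assumed pool size 3)."""
    return length // pool


length, channels = 7680, 6      # input shape from Table 2
total = 0
rows = []
for filters in (64, 128, 256, 512):
    length, channels, params = conv1d(length, channels, filters)
    conv_length = length
    length = max_pool1d(length)
    rows.append((filters, conv_length, length, params))
    total += params

flat = length * channels        # size after the Flatten layer
dense_params = []
inputs = flat
for units in (128, 64, 6):
    params = inputs * units + units
    dense_params.append(params)
    total += params
    inputs = units

for filters, conv_length, pooled_length, params in rows:
    print(f"Conv1D({filters}): length {conv_length}, pooled to {pooled_length}, params {params}")
print("flatten:", flat)
print("dense params:", dense_params)
print("total params:", total)
```

Dropout and pooling layers add no parameters, which is why only the convolution and dense layers contribute to the total.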
Hyperparameters: Table 2 lists the hyperparameters used in the model.

Table 2. Hyperparameters of the DNN based model

Parameters                                Status
Optimizer                                 Adam
Loss function                             Categorical cross-entropy
Batch size                                32
Epochs                                    50
Learning rate                             1e−4
Activation function                       ReLU, Softmax
The number of nodes at the output layer   6
The number of nodes at the input layer    (7680, 6)
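The input shape (7680, 6) in Table 2 is consistent with 30 s of signal at 256 Hz across six channels (30 × 256 = 7680), although the text does not state the window length explicitly. Under that assumption, the following sketch shows how a multi-channel recording could be brought to 256 Hz and cut into windows of that shape. The function names and the naive pair-averaging decimation are illustrative assumptions and are not the resampling routine used in the paper.

```python
WINDOW_SECONDS = 30      # assumed epoch length (7680 / 256)
TARGET_RATE_HZ = 256     # common sample rate described in the text


def downsample_by_two(samples):
    """Halve the sample rate by averaging neighbouring pairs (a crude low-pass plus decimation)."""
    return [(samples[i] + samples[i + 1]) / 2.0 for i in range(0, len(samples) - 1, 2)]


def to_windows(channels, rate_hz):
    """Cut equally long per-channel sample lists into windows of 7680 rows x n_channels values."""
    if rate_hz == 2 * TARGET_RATE_HZ:
        channels = [downsample_by_two(ch) for ch in channels]
    elif rate_hz != TARGET_RATE_HZ:
        raise ValueError("only 256 Hz and 512 Hz inputs are handled in this sketch")
    window_len = WINDOW_SECONDS * TARGET_RATE_HZ
    n_windows = len(channels[0]) // window_len      # a trailing partial window is dropped
    windows = []
    for w in range(n_windows):
        start = w * window_len
        rows = [[ch[start + t] for ch in channels] for t in range(window_len)]
        windows.append(rows)
    return windows


# 65 s of a synthetic 6-channel recording at 512 Hz
fake_recording = [[float(c)] * (65 * 512) for c in range(6)]
windows = to_windows(fake_recording, 512)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each window would be paired with one hypnogram label, so recordings and annotations need to cover the same time span before windowing.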
Activation Function: The activation function plays an important role in a neural network. The weights and biases of the network are updated based on the error of the output. The activation function makes this back-propagation possible, decides whether a neuron in a layer is activated, and introduces non-linearity into the network. The proposed deep CNN mainly uses two activation functions, ReLU and Softmax. Other common activation functions include Sigmoid, Softplus, Tanh, and the exponential function.

– Rectified Linear Unit (ReLU): The math of the ReLU function is simple. Any negative input becomes zero, and any positive or zero input stays the same. This keeps the CNN mathematically healthy by preventing learned values from rushing towards infinity or getting stuck near 0. It is used in most CNNs.

$$\mathrm{ReLU}(x) = \max(0, x) \qquad (1)$$

– Softmax: Softmax is a generalization of the sigmoid function that is mainly used for multiclass classification in the output layer. It converts the input vector into a probability distribution, so that every output lies between 0 and 1 and the outputs sum to 1.

$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} \qquad (2)$$
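Equations (1) and (2) translate directly into code. The sketch below uses only the Python standard library. Subtracting the maximum score before exponentiating is a common numerical-stability step that is added here and is not mentioned in the paper; the six example scores are hypothetical.

```python
import math


def relu(x):
    """Eq. (1): negative inputs become zero, others pass through unchanged."""
    return max(0.0, x)


def softmax(scores):
    """Eq. (2): turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)                       # stability shift; does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1, -1.0, 0.0, 0.5]      # hypothetical outputs, one per sleep stage
probs = softmax(scores)
print([relu(v) for v in (-2.0, 0.0, 3.5)])
print(round(sum(probs), 6), probs.index(max(probs)))
```

The index of the largest probability is the predicted class, which is how a six-node softmax output layer yields one sleep stage per window.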
4 Result and Discussion
The accuracy of this model on the CAP sleep database, which is the only database for NFLE patients [9], is 60.46% for 15 persons. The CNN model did not reach a satisfactory accuracy, but it still has potential for use with NFLE patients, since very little deep learning work has been done on this dataset. The reason for the shortfall is the variation in the patients’ sleep cycles. Since these patients are not healthy and their CAP fluctuations are considerably more variable than those of healthy people, their sleep does not follow the same trend in the long run, as is clear from Fig. 3. NFLE patients tend to remain in the S1, S2, and S3 stages more frequently than healthy people. As a result, the model did not find patterns that fit all of the patients. The model achieved better accuracy when it was trained with fewer participants, so there is considerable scope to diversify the model. Figure 2 shows the accuracy obtained on the training set; the best weights of the model are captured here.
Fig. 2. Accuracy on training set.
Fig. 3. Difference between normal and NFLE patients in sleep stage frequency
5 Future Work and Conclusion
Since little work has been done on automatic sleep scoring for NFLE patients, the main target is to deploy this model in the medical field as an experimental feature. The first task, however, is to increase its accuracy. After reviewing different models and approaches, we have decided to build a hybrid model with a CNN and a bidirectional LSTM, since the bidirectional LSTM layer is able to capture stage transitions. Another approach for increasing accuracy is transfer learning. If our next model still does not show good potential, we will move to the transfer learning approach.
References
1. Bianchi, M.T., Cash, S.S., Mietus, J., Peng, C.-K., Thomas, R.: Obstructive sleep apnea alters sleep stage transition dynamics. PLoS ONE 5(6), e11356 (2010)
2. Stefani, A., Högl, B.: Sleep in Parkinson’s disease. Neuropsychopharmacology 45(1), 121–128 (2020)
3. Pallayova, M., Donic, V., Gresova, S., Peregrim, I., Tomori, Z.: Do differences in sleep architecture exist between persons with type 2 diabetes and nondiabetic controls? J. Diabetes Sci. Technol. 4(2), 344–352 (2010)
4. Tsuno, N., Besset, A., Ritchie, K., et al.: Sleep and depression. J. Clin. Psychiatry 66(10), 1254–1269 (2005)
5. Siengsukon, C., Al-Dughmi, M., Al-Sharman, A., Stevens, S.: Sleep parameters, functional status, and time post-stroke are associated with offline motor skill learning in people with chronic stroke. Front. Neurol. 6, 225 (2015)
6. Mantua, J., et al.: A systematic review and meta-analysis of sleep architecture and chronic traumatic brain injury. Sleep Med. Rev. 41, 61–77 (2018)
7. Zhang, F., et al.: Alteration in sleep architecture and electroencephalogram as an early sign of Alzheimer’s disease preceding the disease pathology and cognitive decline. Alzheimer’s Dement. 15(4), 590–597 (2019)
8. Schulz, H.: Rethinking sleep analysis: comment on the AASM manual for the scoring of sleep and associated events. J. Clin. Sleep Med. 4(2), 99–103 (2008)
9. Loh, H.W., et al.: Automated detection of sleep stages using deep learning techniques: a systematic review of the last decade (2010–2020). Appl. Sci. 10(24), 8963 (2020)
10. Hsu, Y.-L., Yang, Y.-T., Wang, J.-S., Hsu, C.-Y.: Automatic sleep stage recurrent neural classifier using energy features of EEG signals. Neurocomputing 104, 105–114 (2013)
11. Supratak, A., Dong, H., Wu, C., Guo, Y.: DeepSleepNet: a model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 25(11), 1998–2008 (2017)
12. Sallam, A.A., Kabir, M.N., Ahmed, A.A., Farhan, K., Tarek, E.: Epilepsy detection from EEG signals using artificial neural network. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2018. AISC, vol. 866, pp. 320–327. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-00979-3_33
13. Sridhar, N., et al.: Deep learning for automated sleep staging using instantaneous heart rate. NPJ Digital Med. 3(1), 1–10 (2020)
14. Terzano, M.G., et al.: Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep. Sleep Med. 3(2), 187–199 (2002)
15. Felson, S.: Stages of sleep: REM and non-REM sleep cycles, October 2020
16. Ebrahimi, F., Mikaeili, M., Estrada, E., Nazeran, H.: Automatic sleep stage classification based on EEG signals by using neural networks and wavelet packet coefficients. In: 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1151–1154. IEEE (2008)
17. Moser, D., et al.: Sleep classification according to AASM and Rechtschaffen and Kales: effects on sleep scoring parameters. Sleep 32(2), 139–149 (2009)
18. Danker-Hopfe, H., et al.: Interrater reliability for sleep scoring according to the Rechtschaffen and Kales and the new AASM standard. J. Sleep Res. 18(1), 74–84 (2009)
19. Mandy: Convolutional neural networks (CNNs) explained. Available: https://deeplizard.com/learn/video/YRhxdVk_sIs
20. Moeller, J., Haider, H.A., Hirsch, L.J.: Electroencephalography (EEG) in the diagnosis of seizures and epilepsy (2019). UpToDate https://www.uptodate.com/contents/electroencephalography-eeg-in-the-diagnosis-of-seizures-and-epilepsy. Accessed 29 Sept 2020
21. Chui, K.T., Zhao, M., Gupta, B.B.: Long short-term memory networks for driver drowsiness and stress prediction. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 670–680. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_58
22. Vilamala, A., Madsen, K.H., Hansen, L.K.: Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring. In: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE (2017)
Remote Fraud and Leakage Detection System Based on LPWAN System for Flow Notification and Advanced Visualization in the Cloud
Dario Protulipac, Goran Djambic, and Leo Mršić
Algebra University College, Ilica 242, 10000 Zagreb, Croatia
{dario.protulipac,goran.djambic,leo.mrsic}@algebra.hr
Abstract. This research presents a possible solution for a water flow monitoring and alarm system, specially designed for use at remote locations without electricity and Internet access. To make such a system affordable, low-cost, widely available components were tested along with the LoRa open network. In addition to the design of the device, it is demonstrated how electricity consumption can be reduced in the sensor platform, and the range of the whole system is measured. The research includes range measurements at 92 different points, covering an area of 21.5 km² in the City of Zagreb, Croatia.
Keywords: LoRa · IoT · Microcontroller · Water flow meter
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 370–380, 2022. https://doi.org/10.1007/978-3-030-93247-3_36
1 Introduction
One of the common disasters that can befall a building that is rarely inhabited is the rupture of a water pipe. This is mainly caused by the freezing of residual water in pipe elbows and valves during the winter. As the temperature rises, water will leak if the main valve was not closed or was damaged. The aim of this research is to propose a system that informs the owner, or the person caring for such a facility, that an adverse event has occurred. It must be considered that the electricity is mostly turned off at such facilities, and therefore the Internet is not available either. The system that performs such a task must therefore have its own power supply with the longest possible autonomous operation and must not rely on an Internet connection at the facility itself. This paper proposes such a system for reporting unwanted water flow. It consists of three parts: (i) a water flow sensor; (ii) an LPWAN central transceiver; (iii) a background system (backend). The water flow sensor is located at the building itself. Its role is to measure the flow of water through a pipe. If a flow occurs, the sensor must report its magnitude to the central transceiver. It must also report the moment when the water stops flowing through the pipe. The sensor itself consists of a water flow meter, a microcontroller, and a LoRa LPWAN module. Depending on the type of water flow meter, it can be placed immediately behind the water meter or, as in this work, in which a flow meter with an R 1/2” connection was used, at a point such as a garden tap. The microcontroller and the LoRa LPWAN module can be a few meters away from the measuring point, depending on the voltage drop on the connection cable between the microcontroller and the water flow meter. This circuit is completely autonomous: it can run on battery power for a long time and does not need a commercial network connection. The central transceiver, like the sensor platform, uses a LoRa LPWAN module with the same communication parameters as on the sensor platform. The receiver also consists of a microcontroller connected to an LPWAN transceiver. In this case, a type of microcontroller with a built-in Wi-Fi 802.11n module is used (Fig. 1).
Fig. 1. System architecture
This dual connection allows it to forward the messages it receives from the sensor via the LoRa LPWAN receiver to the background system over a local or public IP network. Like the sensor platform, the central transceiver can be powered by a battery, but it is recommended to connect it to a mains socket. The position of the central transceiver is not tied to the position of the background system, but the LoRa modules on the central transceiver and on the sensor platform must be within radio range of each other, and the central transceiver must be able to connect to a Wi-Fi router. Using the MQTT protocol, the central transceiver notifies the background system of the occurrence or cessation of water flow at the sensor location. In addition to these two messages, the sensor also sends periodic messages, which confirm that the sensor is working properly. In the system proposed in this paper, periodic normal-state messages are sent every 12 h, while in the case of an alarm, messages are sent every 15 min. A Raspberry Pi single-board computer was used as the background system. Depending on the needs, any other Intel or Arm based computer running Linux, Windows, or macOS can be used. The background system receives messages from the central transceiver via the MQTT broker. A middleware program subscribes to the MQTT messages related to the described alarm system, converts the received messages into a suitable format, and saves them in a time series database (TSDB). As part of the background system, a web-based user interface has been added that shows whether the water flow sensor is in an alarm state and when it last reported to the central transceiver. The same application can notify the end user via email or via an instant messaging system; in this paper, the Telegram instant messaging system was used.
The whole system is designed as a demonstration, and only one water flow sensor and one central transceiver are used. By introducing the LoRaWAN communication protocol, which is a software upgrade of the existing system, and using a multi-channel central transceiver, the system can be expanded to a number of sensors that send data to multiple central transceivers. Also, the background system does not have to run on a single computer and can be deployed to multiple servers as needed. In this way, it is possible to build a very robust and flexible alert network that covers wider areas [12, 13, 14].
2 Literature Review
The LoRa protocol is a modulation technique for wireless data transmission based on the existing Chirp Spread Spectrum (CSS) technology. With its characteristics, it belongs to the group of low-power wide-area network (LPWAN) protocols. In terms of the OSI model, it belongs to the first, physical layer [1]. The history of the LoRa protocol begins with the French company Cycleo, whose founders created a new physical layer for radio transmission based on the existing CSS modulation [2]. Their goal was to provide wireless data exchange for water, electricity, and gas meters. In 2012, Semtech acquired Cycleo and developed chips for client and access devices. Although CSS modulation had until then been applied in military radars and satellite communications, LoRa simplified its application by eliminating the need for precise synchronization and by introducing a very simple way of encoding and decoding signals [2]. In this way, the price of the chips became acceptable for widespread use. LoRa operates in unlicensed frequency spectrum, which means that its use does not require approval or the lease of a concession from the regulator. These two factors, low cost and free use, have made this protocol extremely popular in a short period of time.
The EBYTE E32 (868T20D) module was used in this work [6]. The module is based on the Semtech SX1276 chip. The maximum output power of the module is 100 mW, and the manufacturer declares a range of up to 3 km using a 5 dBi antenna without obstacles, at a transfer rate of 2.4 kbps. This module does not have an integrated LoRaWAN protocol but is designed for direct (P2P) communication. If it is to be used for LoRaWAN, the protocol needs to be implemented on a microcontroller. Communication between the module and the microcontroller takes place over the UART interface (serial port) and two control pins that are used to set the operating mode of the module. The module returns feedback via the AUX pin.
LoRaWAN is a software protocol based on the LoRa protocol [1]. Unlike the patent-bound LoRa transmission protocol, LoRaWAN is an open industry standard managed by the nonprofit LoRa Alliance. The protocol operates in the unlicensed ISM (Industrial, Scientific and Medical) band. In Europe, LoRaWAN uses the ISM part of the spectrum that covers the range 863–870 MHz [4]. This range is divided into 15 channels of different widths. For a device to be LoRaWAN compatible, it must be able to use at least the first five channels of 125 kHz and support transmission speeds of 0.3 to 5 kbps. To protect the band against congestion, the duty cycle of a LoRaWAN device is very low: the transmission time must not exceed 1% of the total operating time of the device [4].
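The 1% limit has a direct consequence for how often a device may transmit. As a rough illustration, the sketch below computes the silent time required after one transmission and the duty cycle implied by sending one message every 12 h or every 15 min, the two reporting intervals used in this paper. The 1.5 s time on air is a hypothetical figure chosen for the example, not a measured value.

```python
DUTY_CYCLE_LIMIT = 0.01   # 1% limit described in the text


def min_wait_after_tx(time_on_air_s, limit=DUTY_CYCLE_LIMIT):
    """Silent time needed after one transmission so that airtime stays within the limit."""
    return time_on_air_s * (1.0 / limit - 1.0)


def duty_cycle(time_on_air_s, interval_s):
    """Fraction of time spent transmitting when one message is sent per interval."""
    return time_on_air_s / interval_s


TIME_ON_AIR_S = 1.5       # hypothetical airtime of a single message

print(min_wait_after_tx(TIME_ON_AIR_S))          # seconds of silence after each message
print(duty_cycle(TIME_ON_AIR_S, 12 * 3600))      # one message every 12 h
print(duty_cycle(TIME_ON_AIR_S, 15 * 60))        # one message every 15 min
```

With these assumed numbers, both reporting intervals stay well below the limit, which is why a slow data rate with long airtime can still be acceptable for infrequent messages.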
In addition to defining the types of devices and the way they communicate via messages, the LoRaWAN protocol also defines the topology of the network itself [5]. The network consists of end devices, usually various types of sensors combined with LoRaWAN radios. The sensors report to central transceivers, also called concentrators or hubs. One sensor can report to multiple hubs, which improves the resilience and range of the network. Hubs are connected to servers that process incoming messages. One of the tasks of the server is to recognize messages that were received more than once and remove the duplicates. Central transceivers must be able to receive a large number of messages, using multi-channel radio transceivers and an adaptive mode that adjusts to the capabilities of the end device. The security of the LoRaWAN network is ensured by authorizing the sensor with the central transceiver, and messages can be encrypted between the sensor and the application server using AES encryption [5].
MQTT is a simple messaging protocol [11]. It is located in the application layer of the TCP/IP model (layers 5–7 of the OSI model). It was originally designed for messaging in M2M systems (direct messaging between machines). Its main advantage is its low demand on network and computing resources. For these reasons, it has become one of the primary protocols in the IoT world. The protocol is based on the principle of publishing messages and subscribing to them through an intermediary. The intermediary, commonly called a broker, is a server that receives messages and distributes them to clients; a client may be a publisher of messages, a subscriber that receives them, or both. Two clients never communicate with each other directly [7, 8].
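The publish/subscribe principle can be illustrated without any network at all. The following toy broker is a teaching sketch, not an MQTT implementation: it keeps subscribers per topic in memory and shows that publisher and subscriber only ever talk to the broker. The class name, topic names, and message fields are hypothetical.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a message broker (exact topic match only, no wildcards)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # The publisher never sees the subscribers; the broker does the delivery.
        for callback in self._subscribers[topic]:
            callback(topic, payload)


broker = ToyBroker()
stored = []   # stands in for the time series database of the backend


def middleware(topic, payload):
    """Subscriber that reformats a message and stores it."""
    stored.append({"topic": topic, **payload})


broker.subscribe("waterflow/sensor1", middleware)                      # hypothetical topic
broker.publish("waterflow/sensor1", {"state": "alarm", "flow_l_min": 4.2})
broker.publish("other/topic", {"state": "ignored"})                    # nobody is subscribed
print(stored)
```

In the system described in this paper the central transceiver plays the publisher role and the backend middleware the subscriber role, with a real MQTT broker between them.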
3 Methodology
The most important property of the sensor platform is its reliability. To make sure that an accident is reported in time, the reliability of the platform must be ensured first. For this reason, the solution proposed in this paper uses periodic reporting from the sensor platform to the system. The device reports every 12 h, which is handled by the alarm system on the microcontroller. The STM32F411 is equipped with a real-time clock (RTC) and offers the ability to set two independent alarms. In this case, one of them wakes up the process that sends periodic messages with the current state of the measured water flow through the meter [3, 15, 16]. Before the software implementation of the measurement, it should be noted that the pulse at the sensor output has an amplitude of 5 V. Although the microcontroller used tolerates this voltage at its input, it is better to lower it towards the nominal input value of 3.3 V. Such a voltage is obtained with two resistors, one of 10 kΩ and the other of 22 kΩ, connected as a simple voltage divider [9]. The connection method is shown in the diagram. The flow measurement itself is done by timing the pulses sent by the water flow sensor with a standard timer. Each pulse is registered by the microcontroller as an interrupt. When pulses appear, it is possible to measure the flow and report it via LoRa radio transmission. The frequency of the timer is set to 1 MHz via a prescaler. By comparing the timer counts of two consecutive interrupts, one can easily obtain the pulse frequency given by the water flow sensor. Knowing the pulse frequency and the pulse characteristic of the sensor, the water flow can be calculated using a pre-defined procedure. The first measured flow value greater than zero puts the sensor platform into the alarm state. As long as there is a flow, periodic reporting takes place every 15 min instead of every 12 h. Five minutes after the flow stops, the device announces the end of the alarm, and the next report is made regularly after 12 h, or earlier in the event of a new alarm. Internally, the alarm system reads the last measured value of the water flow every 5 s. This value, together with the current timer count, is continuously stored by the measurement process in a time-and-flow structure. The value read is stored in an array of three elements. If after three readings all three elements in the array are equal, it can be concluded that there was no flow in the last 15 s, and the device exits the alarm state. The system waits another five minutes before announcing the end of the alarm over the LoRa connection. If the flow occurs again within these five minutes, the system acts as if the alarm had not stopped, that is, it sends a flow message after 15 min (Fig. 2).
Fig. 2. Water flow sensor connection diagram
LoRa notifications are intentionally delayed so that a flow that repeatedly starts and stops does not trigger frequent radio messages.
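The alarm logic described above can be summarised as a small state machine. The sketch below is a simulation in Python rather than the FreeRTOS task used on the device. The class name, the message names and the rule that an alarm message is sent immediately on the first non-zero flow are assumptions made for illustration; the periods (5 s, 15 s window, 5 min, 15 min, 12 h) come from the text.

```python
from collections import deque

SAMPLE_S = 5                # alarm task period (from the text)
END_DELAY_S = 5 * 60        # delay before the end-of-alarm message
ALARM_REPORT_S = 15 * 60    # reporting period while water is flowing
IDLE_REPORT_S = 12 * 3600   # regular reporting period


class AlarmMonitor:
    def __init__(self):
        self.window = deque(maxlen=3)   # last three (pulse_time, flow) samples
        self.alarm = False
        self.stopped_at = None          # when the flow was first seen to have stopped
        self.next_report = IDLE_REPORT_S

    def sample(self, now, measurement):
        """Process one 5 s sample; return messages to transmit (names are hypothetical)."""
        self.window.append(measurement)
        # Three identical samples mean no new pulse arrived in the window.
        stale = len(self.window) == 3 and len(set(self.window)) == 1
        flowing = measurement[1] > 0 and not stale
        messages = []
        if flowing:
            self.stopped_at = None      # flow resumed: cancel a pending end of alarm
            if not self.alarm:
                self.alarm = True
                messages.append("ALARM_START")
                self.next_report = now + ALARM_REPORT_S
        elif self.alarm and self.stopped_at is None:
            self.stopped_at = now
        if self.alarm and self.stopped_at is not None and now - self.stopped_at >= END_DELAY_S:
            self.alarm, self.stopped_at = False, None
            messages.append("ALARM_END")
            self.next_report = now + IDLE_REPORT_S
        elif now >= self.next_report:
            messages.append("FLOW_REPORT" if self.alarm else "STATUS_REPORT")
            self.next_report = now + (ALARM_REPORT_S if self.alarm else IDLE_REPORT_S)
        return messages


# Hypothetical scenario: pulses arrive for the first 60 s, then the tap is closed.
monitor = AlarmMonitor()
events = []
for t in range(0, 400, SAMPLE_S):
    measurement = (t, 2.0) if t <= 60 else (60, 2.0)   # last measurement stays frozen
    for message in monitor.sample(t, measurement):
        events.append((t, message))
print(events)
```

Because a pending end of alarm is cancelled as soon as a new pulse is seen, a tap that is opened and closed repeatedly produces a single alarm episode instead of a burst of radio messages, which is the behaviour the text motivates.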
4 Results and Discussion
During the measurement, the circuit is supplied with 5 V DC. This is the recommended operating voltage for the LoRa module and the water flow sensor used, while the microcontroller can be powered by 5 V or 3.3 V. The first goal of this measurement is to show that the peak current does not exceed 300 mA, which is the maximum that the microcontroller circuit can withstand. This allows us to power the entire circuit through the microcontroller using the built-in USB port and thus simplify the design of the entire sensor. The second goal is to reduce power consumption in order to prolong the autonomous operation of the sensor as much as
possible. An R-SPS3010 laboratory power supply from Nice-power was used as the external power supply; it provides a stable operating voltage from 0 to 30 V with a current of up to 10 A. A UT139B universal measuring instrument from UNI-T is connected in series. During the measurement it is set to measure milliamperes and to hold the maximum measured value on the screen.
4.1 Range Measurement
The range was measured from the Zagreb settlement of Vrbani 3, which is located next to Lake Jarun. This location gives an insight into the range that can be expected in both urban and rural conditions. To the north of the central transceiver there is a densely urban area with many residential buildings and dense traffic infrastructure, while to the south are Lake Jarun and the Sava River, which are mostly green areas, smaller forests, and only a few lower buildings. The limiting factor is the position of the antenna of the central transceiver, which was located on the first floor of a residential building, approximately 4 m above ground level and surrounded by buildings. On the side of the central transceiver, an omnidirectional antenna with a gain of 3.5 dBi was used, mounted in a fixed position on the outside of a window of the residential building. On the sensor side, a smaller antenna with 2 dBi gain was used for mobility. The signal was sent outdoors from a hand-held device. The position of each measurement was recorded via GPS on a mobile device and later transferred to Google Earth. In Google Earth, it is possible to import the recorded measuring points and measure the distance between them and the antenna of the central transceiver. According to the manufacturer's specification, the maximum range that can be expected from these modules is 3 km in almost ideal conditions with a 5 dBi antenna. To approach this distance despite the unfavorable measurement position, the data transfer rate was reduced from the standard module setting of 2.4 kbps to 300 bps. Because only a small amount of data needs to be transmitted, this is not a limiting factor in practice, and the lower transmission speed resulted in fewer errors when recognizing the received signal and in more successful reception of messages over long distances. Figure 3 shows the measured range of the fabricated LoRa system. The position of the central transceiver is shown with an asterisk, while the points from which the signal from the sensor managed to reach it are shown in green. Red dots indicate places where communication between the sensor and the central transceiver was not possible. As expected, the largest range of 3393 m was achieved to the southeast, where, apart from a couple of residential buildings near the antenna, there were no additional obstacles. Towards the southwest, the result obtained was 2773 m. Towards the urban part of the city, however, the maximum range achieved was 982 m to the east, and only 860 m to the north (Fig. 3).
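The distances in the paper were measured in Google Earth. As a hedged alternative, the same kind of distance between a recorded GPS point and the central transceiver can be approximated with the haversine formula, which assumes a spherical Earth. The coordinates below are made-up placeholders in the Zagreb area, not the actual measurement points.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical coordinates, for illustration only.
transceiver = (45.7800, 15.9300)
measurement_point = (45.7600, 15.9600)
distance = haversine_m(*transceiver, *measurement_point)
print(f"{distance:.0f} m")
```

For ranges of a few kilometres the spherical approximation differs from an ellipsoidal calculation by far less than the uncertainty of a phone GPS fix, so it should be adequate for this kind of survey.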
Fig. 3. Central transceiver antenna position and measuring range
4.2 Results
According to the specification, the maximum consumption of the module used is 130 mA. The measured consumption of the water flow sensor is 4 mA. The maximum current that can be conducted through the development board is 300 mA, and the board is designed so that the USB Vbus terminal and the 5 V terminals of the circuit are on the same bus. From this we can conclude that the entire circuit, including the sensor and the LoRa module, can be powered through the USB interface. However, the consumption must be optimized so that the circuit can run on a commercially available battery for as long as possible. Table 1 shows the current measured during operation of the microcontroller. Here, the microcontroller operated at a maximum operating clock of 96 MHz and without any power optimization. Data are given separately for each combination of components to make the optimization easier to track.

Table 1. Circuit current without optimization
Connected system components | Current [mA] | State
Microcontroller | 26.65 | Wait
Microcontroller | 26.88 | Event stop
Microcontroller + LoRa Module | 39.16 | Wait
Microcontroller + LoRa Module | 121.5 | Signal send
Microcontroller + LoRa Module + Sensor | 42.51 | Wait
Microcontroller + LoRa Module + Sensor | 125.7 | Signal send
As the flow sensor offers no possibility of optimization, the current flowing through it is listed separately in Table 2 and is simply added to the results at the end of each step. The first optimization step is to lower the processor clock to 48 MHz. Table 3 shows that reducing the operating clock decreased the current by about 11 mA, a reduction of slightly more than 40% in the consumption of the microcontroller.
Table 2. Current through the water sensor
Current [mA] | State
3.35 | Idle
4.03 | Flow

Table 3. Current with reduced microprocessor clock speed
Connected system components | Current [mA] | State
Microcontroller | 15.50 | Wait
Microcontroller | 15.91 | Event stop
Microcontroller + LoRa Module | 28.15 | Wait
As the LoRa module on the sensor platform is not used for receiving messages, there is no need to keep it constantly active. Fortunately, the module has a mode in which its radio transceiver is shut down. By changing the code on the microcontroller, an operating mode was introduced in which the radio transceiver is turned on only when necessary. With this change, the total current through the microcontroller and the LoRa module dropped to 17.7 mA in standby mode. The STM32F411 microcontroller has various energy-saving functions. One of them is a sleep state in which the processor clock is stopped completely and the processor reacts only to interrupts coming from external devices or clocks. As FreeRTOS was used in this work, the FreeRTOS tickless mode was used instead of sending the microprocessor to sleep directly [10]. In this mode, FreeRTOS suspends its periodic tick and puts the microprocessor to sleep. This lowers the current through the microcontroller and the LoRa module to 5.87 mA in standby mode, so that the total current through the entire circuit is only 9.22 mA in standby mode. The current measurements confirmed that a USB port can be used to power the entire circuit. In addition, several changes to the program code of the microprocessor lowered the current from 42.51 mA to 9.22 mA, a reduction of 78%. This is very important because the circuit spends almost all of its time waiting. With a portable USB charger (power bank) with a capacity of 10000 mAh (the most common value at the time of writing), approximately 40 days of autonomous sensor operation can be expected at this consumption. Radio signal acquisition showed very good results considering the power and position of the antenna. 
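The consumption figures reported in this section can be cross-checked with a short calculation. The sketch below uses only the standby currents quoted in the text. The runtime it prints is an idealised upper bound that assumes the full nominal capacity is usable and ignores transmit bursts; both simplifications are assumptions of this example, not statements from the paper.

```python
def reduction_pct(before_ma: float, after_ma: float) -> float:
    """Relative reduction in percent."""
    return 100.0 * (before_ma - after_ma) / before_ma


def ideal_runtime_days(capacity_mah: float, current_ma: float) -> float:
    """Runtime in days if the whole nominal capacity were available at this current."""
    return capacity_mah / current_ma / 24.0


# Standby currents taken from the text and tables [mA]
clock_saving = reduction_pct(26.65, 15.50)   # microcontroller alone, 96 MHz -> 48 MHz
total_saving = reduction_pct(42.51, 9.22)    # whole circuit, before vs. after optimization
days = ideal_runtime_days(10_000, 9.22)      # 10000 mAh power bank

print(f"clock reduction: {clock_saving:.1f} %")
print(f"total reduction: {total_saving:.1f} %")
print(f"ideal runtime:   {days:.1f} days")
```

The idealised runtime comes out at roughly 45 days. The approximately 40 days quoted in the text is lower, which would be consistent with conversion losses in the power bank and occasional transmissions, although the text does not state how its estimate was derived.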
This measurement indicates that, even without an extensive search for the ideal antenna position, a quite decent range can be achieved with a device that has the output power of an average home Wi-Fi system. The maximum measured distance was 3393 m, measured from ground level and without line of sight. There is also a large difference in the behavior of the LoRa radio protocol between urban and rural areas. While in an uninhabited area the range exceeded the manufacturer’s specifications, in places
with several residential buildings, the range dropped sharply. It can be concluded that LoRa LPWAN is an excellent solution for reporting adverse events in rural and remote areas. The smaller range in urban areas is easily compensated for by placing central transceivers more densely.
4.3 Future Research
Further power savings can be achieved if the stop or standby mode of the microcontroller is used instead of the sleep state. In stop mode, the CPU of the microcontroller is shut down, and in standby mode the memory is shut down as well. In these states, the microprocessor needs a little more time to wake up, and the code must take care to restore the initial settings of the microcontroller. Also, instead of the higher-performance STM32F4 series, a series of microcontrollers made specifically for low consumption can be chosen, e.g., the STM32L series. The range of the device, which proved to be very good even in these conditions, can be further improved by placing the antennas so that there is an unobstructed line of sight between the sensor antenna and the antenna of the central transceiver. It should be borne in mind that in practice both antennas will be stationary and it will be possible to adjust the antenna on the sensor platform side. If the monitored objects lie in approximately the same direction from the antenna, the signal only needs to cover a narrower area, so directional antennas, which allow greater range, can be used instead of omnidirectional ones.
5 Conclusion
This paper shows how today, with home-made construction and really modest financial expenses, it is possible to build a prototype device whose function was almost unimaginable only a decade ago. If the price of the central platform, which can be implemented on any standard computer, is neglected, less than HRK 250 was spent on the entire platform. Of course, the knowledge required to build the system and the time spent on development are incomparably greater. During the work, most attention and time was spent on developing the sensor platform as the most critical part of the system. Ultimately, a completely autonomous and reliable sensor platform was successfully designed and built, which, in addition to its basic function, had to serve as an intermediary for adjusting the LoRa module and as an instrument for measuring range. To achieve the longest possible autonomy of the sensor platform, the current through the platform was studied and the consumption of the microcontroller and the radio module was gradually reduced to almost one fifth of the original consumption while waiting. This was done entirely in software, by shutting down individual components of the system when they are not needed and quickly turning them on when the need arises. The operation of the entire system was finally tested using a signal generator, thus confirming the correct operation of all three parts of the system. In order to obtain
confirmation that the developed system can meet the requirement for radio range, the distance was measured at 92 different points, and the measurement covered an area of 21.5 km². The radio ranges measured in this paper fully confirmed that LoRa is a more than acceptable solution for water flow monitoring in houses whose owners do not live there for most of the year and which are located within a settlement or on its wider periphery. The results obtained show that such a prototype could be applied in practice even now, without major changes in implementation, only by moving the connections to a soldered board and placing the device in a suitable housing. Also, by using other types of sensors, the prototype can serve as a basis for collecting various other information from less frequently visited locations.
References
1. LoRa developers portal. https://lora-developers.semtech.com/
2. Slats, A.: A Brief History of LoRa: Three Inventors Share Their Personal Story at The Things Conference 2020 (2020). https://blog.semtech.com/a-brief-history-of-lora-three-inventors-share-their-personal-story-at-the-things-conference. Dec 2020
3. Semtech: LoRa™ Modulation Basics, Application note AN1200.22, 2015 (2020). https://semtech.my.salesforce.com/sfc/p/E0000000JelG/a/2R0000001OJa/2BF2MTeiqIwkmxkcjjDZzalPUGlJ76lLdqiv.30prH8. Dec 2020
4. LoRa Alliance: LoRaWAN Regional Parameters, 2020 (2020). https://lora-alliance.org/sites/default/files/2020-06/rp_2-1.0.1.pdf. Dec 2020
5. LoRa Alliance: LoRaWAN 1.1 Specification, 2020 (2020). https://lora-alliance.org/sites/default/files/2018-04/lorawantm_specification_-v1.1.pdf. Dec 2020
6. EBYTE E32-868T20D User manual (2021). https://www.ebyte.com/en/downpdf.aspx?id=132. Mar 2021
7. Noviello, C.: Mastering STM32. Leanpub (2018). https://leanpub.com/mastering-stm32. Dec 2020
8. Kurniawan, A.: Internet of Things Projects with ESP32. Packt Publishing (2019). ISBN 978-1-78995-687-0
9. Horowitz, P., Hill, W.: The Art of Electronics, 3rd edn. Cambridge University Press (2015)
10. Barry, R.: Mastering the FreeRTOS™ Real Time Kernel (2020). https://www.freertos.org/fr-content-src/uploads/2018/07/161204_Mastering_the_FreeRTOS_Real_Time_Kernel-A_Hands-On_Tutorial_Guide.pdf. Nov 2020
11. HiveMQ: MQTT & MQTT 5 Essentials e-book (2021). https://www.hivemq.com/download-mqtt-ebook/. Feb 2021
12. Mrsic, L., Zajec, S., Kopal, R.: Appliance of social network analysis and data visualization techniques in analysis of information propagation. In: Nguyen, N.T., Gaol, F.L., Hong, T.P., Trawiński, B. (eds.) ACIIDS 2019. LNCS (LNAI), vol. 11432, pp. 131–143. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-14802-7_11
13. Mrsic, L., Jerkovic, H., Balkovic, M.: Interactive skill based labor market mechanics and dynamics analysis system using machine learning and big data. In: Sitek, P., Pietranik, M., Krótkiewicz, M., Srinilta, C. (eds.) ACIIDS 2020. CCIS, vol. 1178, pp. 505–516. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3380-8_44
14. Intelligent Computing & Optimization: Conference proceedings ICO 2018. Springer, Cham (2018). ISBN 978-3-030-00978-6. https://www.springer.com/gp/book/9783030009786
15. Intelligent Computing and Optimization: Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019). Springer International Publishing. ISBN 978-3-030-33585-4. https://www.springer.com/gp/book/9783030335847
16. Intelligent Computing and Optimization: Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020). https://doi.org/10.1007/978-3-030-68154-8
An Analysis of AUGMECON2 Method on Social Distance-Based Layout Problems
Şeyda Şimşek1(✉), Eren Özceylan2, and Neşe Yalçın1
1 Industrial Engineering Department, Adana Alparslan Türkeş Science and Technology University, 01250 Adana, Turkey {ssimsek,nyalcin}@atu.edu.tr
2 Industrial Engineering Department, Gaziantep University, 27100 Gaziantep, Turkey [email protected]
Abstract. In the COVID-19 era, social distance has become a new source of concern for people. Because of the lack of preparedness for the pandemic, decision-makers have only a limited idea of how to allocate people according to social distance. It is essential to consider both placing as many individuals as possible in a particular area and minimizing the infection risk. The multi-objective nature of this new concern gives decision-makers the opportunity to solve the problem using enhanced methodologies. The AUGMECON2 method, one of the recent popular generation methods, is used to produce the exact Pareto sets for the problem. The size and time limitations of the problem have been examined, and recommendations have been made to decision-makers on the trade-off between the number of people and the infection risk.
Keywords: COVID-19 · Social distancing · Layout optimization · AUGMECON2 · Multi-objective optimization
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 381–390, 2022. https://doi.org/10.1007/978-3-030-93247-3_37
1 Introduction
Coronavirus disease 2019 (COVID-19), a novel disease, was discovered in Wuhan, China, at the end of 2019. This infectious disease is caused by a coronavirus. Because the virus has spread throughout the world, the disease has quickly become a global concern [1]. The virus spreads through droplets that exit from an infected person’s nose or mouth [2]. Besides droplets, infected people may also spread the virus through the air to others who come into contact with them. Respiratory diseases such as COVID-19, like many other diseases, are inextricably linked to close physical contact [3]. With regard to transmission via physical contact, social distancing may be defined as a combination of non-pharmaceutical measures used to prevent the transmission of a contagious disease. It is also known as physical distancing, and its goal is to keep a predetermined physical distance between people while limiting the likelihood of close contact [4]. Chu et al. [5] investigated more than 25,000 patients to determine the impact of the distance between patients and other persons. Their findings demonstrate the importance of using face masks and keeping a safe distance to prevent the virus from spreading. Also,
it is recommended that people wear masks and maintain a physical distance of at least one meter, or two meters if possible, between themselves and infected people. The fundamental benefit of distance is that it prevents SARS-CoV-2 infection and reduces transmission.
With the coronavirus pandemic, governments have recommended and even imposed social distancing measures through legislation to ensure the safety of their nations. Even though the measures are self-evident, there are no widely accepted norms for applying the social distance rule. As a result, public spaces such as restaurants, markets, offices and universities make their own decisions on how to apply social distance measures. To ensure that social distance rules are applied, the allocation of people based on social distance measures has recently been addressed in the literature. When allocating people according to social distance, two objectives spring to mind: ensuring safety and placing as many people as possible. Creating a safer environment by placing as many people as possible while ensuring the lowest possible virus load and infection risk is not a trivial task, because more people mean more possible sources of the virus.
The main purpose of this paper is to introduce a general approach in the form of a multi-objective optimization problem that maximizes the number of tables, each seated by only one person (which can also be thought of as maximizing the number of people), under social distance and minimizes the infection risk. These competing objectives encourage the use of multi-objective optimization techniques. Among these techniques, the generation methods are considered superior for this study in terms of computational speed, of providing Pareto sets that contain all possible solutions, and of the advantages they offer to decision-makers. One of the most used generation methods is the ε-constraint method, which has advanced versions such as the augmented ε-constraint method (AUGMECON) introduced by Mavrotas [6] and the AUGMECON2 method introduced by Mavrotas and Florios [7]. Due to its novelty and feasibility, the AUGMECON2 version of the method is applied in this study. To the best of the authors’ knowledge, this is the first time the method has been applied to an allocation problem that includes virus load, infection risk and social distance in a finite area.
The rest of the study is organized as follows: Sect. 2 gives a broad assessment of the literature on social distance-based layout optimization. Section 3 introduces a generic approach for multi-objective optimization problems and methods. The findings of the applications are presented in Sect. 4, and the conclusions and future roadmap are given in Sect. 5.
2 Literature Review
Even though the distance constraint is not a new concern, optimization based on social distance can be considered a novel topic. When the social distance constraint is taken into account, the spread of the virus and the infection risk must also be considered. In the related literature, these two aspects have been handled by some researchers (Table 1). Due to the huge decline in both passenger numbers and GDP associated with air travel [20], the studies [8–13] suggest various models for the air transportation sector. The allocation of passengers on the plane, the apron and the waiting queue are all
Table 1. A literature review about social distance-based layout problems.
Reference | Application area | Methodology | Performance metrics/goals
Milne et al. [8] | Air transportation | Agent based modeling and simulation | Aisle and window risk, window risk, seat interferences, boarding times/Assessing boarding methods
Salari et al. [9] | Air transportation | Mathematical modeling and simulation | Distance between seats and aisle and people, number of people/Maximizing the safe load under social distancing
Milne et al. [10] | Air transportation | Agent based modeling and simulation | Boarding time, seat interferences, infection risk/Comparing the boarding methods under social distancing
Cotfas et al. [11] | Air transportation | Agent based modeling and simulation | Boarding time, seat interferences, extra luggage storage duration, aisle seats and window seat risk/Comparing the boarding methods
Milne et al. [12] | Air transportation | Agent based modeling | Boarding time, virus spread, window risk, aisle risk/Evaluating boarding methods under health constraint
Pavlik et al. [13] | Air transportation | Mathematical modeling | Number of passengers, total risk/Minimizing transmission risk
Moore et al. [14] | Public transportation | Mathematical modeling and heuristics | Configuration of the vehicle, group based seat arrangement/Assigning people according to physical distancing
Kudela [15] | Artificial examples | Mathematical modeling and heuristics | Distance between points, number of the points, time/Best practicing of social distancing
Contardo and Costa [16] | Allocation in dining room | Mathematical modeling | The configuration of tables, the shape of room and tables, sitting sense of the customers/Finding the best way to place tables that are socially distant
Fischetti et al. [17] | Allocation in restaurants, beach, theater | Mathematical modeling | Distance between people, virus function related with distance/Maximizing the number of tables while minimizing the infection risk
Dündar and Karaköse [18] | Allocation in the university | Heuristics | Social distancing, surface area/Maximum allocation of seats, maximum of minimum of social distancing model
Ugail et al. [19] | Allocation in the university | Mathematical modeling | Position of doors, windows, dimension of physical space/Suggesting optimal designs under social distancing
significant challenges for this industry. Also, the problem of public transport including school buses and trains has been addressed by Moore et al. [14]. Aside from transportation models, every aspect of social life requires layout optimization based on social distance due to the lack of preparedness for the pandemic. Some studies [15–19] have attempted to solve this problem by allocating people under social distance constraints for their safety to different types of places such as universities, restaurants, beaches, etc.
3 Multi-objective Optimization on Social Distance
Multi-objective integer programming is an essential research area because many real-life situations need discrete representations by integer variables [21]. Solution algorithms for multi-objective mathematical programming may be classified into three categories: a priori methods, interactive methods, and a posteriori (generation) methods [22]. Among these techniques, the generation methods are attractive for this study because of their computational speed [7]. Furthermore, the generation methods have advantages such as giving all possible solutions (i.e., the Pareto sets) and providing decision-makers with a whole picture of a multi-objective problem. The generation methods, also known as a posteriori methods, include the most commonly used solution techniques, the ε-constraint method and the weighting method. Compared with the weighting method, the ε-constraint method has several advantages: it alters the feasible region and represents richer efficient sets, produces unsupported efficient solutions in multi-objective integer and mixed integer problems, does not need the objective functions to be scaled and thus eliminates the strong effects of scaling on the results, and controls the number of generated efficient solutions by altering the ranges of the objective functions [6]. The ε-constraint method treats the most significant objective as the objective function and the other objective(s) as constraint(s). Efficient solutions are then produced by changing the right-hand side of each constraint [23]. The AUGMECON method proposed by Mavrotas [6], the predecessor of AUGMECON2, is the augmented ε-constraint method. Its formulation for a classical maximization problem is given in Eq. (1).
\[
\begin{aligned}
\max\ & f_1(x) + eps \times \left( S_2/r_2 + S_3/r_3 + \dots + S_p/r_p \right) \\
\text{s.t.}\ & f_2(x) - S_2 = e_2 \\
& f_3(x) - S_3 = e_3 \\
& \dots \\
& f_p(x) - S_p = e_p \\
& x \in S, \quad S_i \in \mathbb{R}^{+}
\end{aligned}
\tag{1}
\]
Here, the $e_k$ values are the parameters for the right-hand side and the $r_k$ values are the ranges of the respective objective functions. The $S_k$ values are the surplus variables, and eps is a small number in the interval $[10^{-6}, 10^{-3}]$ [24]. The AUGMECON2 method introduced by Mavrotas and Florios [7] differs mainly in the objective function, as shown in Eq. (2).
\[
\max\ f_1(x) + eps \times \left( S_2/r_2 + 10^{-1} \times S_3/r_3 + \dots + 10^{-(p-2)} \times S_p/r_p \right)
\tag{2}
\]
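To make the role of the augmented objective in Eqs. (1) and (2) concrete, the following sketch applies it to a tiny, made-up bi-objective problem whose solutions are simply enumerated. It is not the GAMS implementation used in the paper: the candidate list, the grid size and the function names are hypothetical, and exhaustive search stands in for the integer programming solver.

```python
EPS = 1e-3  # small weight on the surplus term (the text gives 1e-6 to 1e-3)


def augmented_value(f1, slacks, ranges, augmecon2=True):
    """Objective of Eq. (2), or of Eq. (1) if augmecon2 is False, for objectives 2..p."""
    total = 0.0
    for k, (s, r) in enumerate(zip(slacks, ranges)):
        weight = 10.0 ** (-k) if augmecon2 else 1.0   # 1, 1e-1, 1e-2, ... in Eq. (2)
        total += weight * s / r
    return f1 + EPS * total


def solve(solutions, e2, r2):
    """Best (f1, f2) with f2 - S2 = e2 and S2 >= 0, found by exhaustive search."""
    feasible = [s for s in solutions if s[1] >= e2]
    return max(feasible, key=lambda s: augmented_value(s[0], [s[1] - e2], [r2]))


def pareto_front(solutions, grid_points=5):
    lo = max(solutions)[1]                 # f2 at the lexicographic optimum of f1
    hi = max(f2 for _, f2 in solutions)    # individual optimum of f2
    r2 = hi - lo
    front = []
    for k in range(grid_points + 1):
        best = solve(solutions, lo + k * r2 / grid_points, r2)
        if best not in front:
            front.append(best)
    return front


# Hypothetical (f1, f2) outcomes of a tiny problem in which both objectives are maximized.
candidates = [(10, 1), (10, 3), (8, 5), (6, 5), (4, 8)]
plain = max((s for s in candidates if s[1] >= 1), key=lambda s: s[0])  # plain e-constraint
augmented = solve(candidates, 1, 5)
front = pareto_front(candidates)
print(plain, augmented, front)
```

In this example the plain ε-constraint step can return the weakly efficient point (10, 1), whereas the surplus term steers the augmented version to (10, 3). With only two objectives, as in the model of this paper, Eqs. (1) and (2) coincide; the powers of ten matter only from the third objective onward.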
The model proposed in this study is derived from a recent study by Fischetti et al. [17], which was inspired by wind turbine allocation and applied that approach to layout optimization under the social distance constraint. Given a discrete set $V$ of candidate points, the binary variable $x_i$ in Eq. (8) equals 1 if a table (person) is allocated to point $i$ and 0 otherwise. The minimum and maximum numbers of tables (people) are imposed by the constraint in Eq. (5). The infection risk $I_{ij}$ is defined in Eq. (10) and depends on $d_{ij}$, the distance between people. In the study of Fischetti et al. [17], the objective function is the difference between the profit, which depends on the number of tables, and the total infection risk. The trade-off between these two terms motivated the problem addressed in this paper. Thus, the first objective is taken as maximizing the number of people, without any coefficient, and is adapted to the AUGMECON2 method as shown in Eq. (3). The objective of minimizing the total infection risk is then expressed as the constraint in Eq. (4), following the general approach of AUGMECON2.
\begin{align}
\max\ z = {} & \sum_{i \in V} x_i + eps \times (S_2/r_2) \tag{3} \\
\text{s.t.}\quad & \sum_{i \in V} w_i - S_2 = e_2 \tag{4} \\
& N_{\min} \le \sum_{i \in V} x_i \le N_{\max} \tag{5} \\
& x_i + x_j \le 1, \quad [i, j] \in E_1 \tag{6} \\
& \sum_{j \in V} I_{ij} x_j \le w_i + M_i (1 - x_i), \quad i \in V \tag{7} \\
& x_i \in \{0, 1\}, \quad i \in V \tag{8} \\
& w_i \ge 0, \quad i \in V \tag{9} \\
& I_{ij} \propto 1/d_{ij}^{3} \tag{10}
\end{align}
Since there must be a defined minimum distance between two people, the constraint in Eq. (6) enforces the social distance: two points that are closer than this distance cannot both be occupied. This distance is set to 3 m for the model applied in this study; the required distance may vary between countries and governments. The constraint in Eq. (7) deactivates the infection risk term of a point to which no person is allocated. Lastly, the constraint in Eq. (9) ensures that the infection risk is never less than zero.
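The structure of the model in Eqs. (3) to (10) can be illustrated on a toy instance by brute force. The sketch below is not the GAMS/CPLEX implementation used in the study and scales only to a handful of points. It assumes that each $w_i$ takes its smallest allowed value, so that the total risk of a layout is the sum of $1/d_{ij}^3$ over ordered pairs of occupied points with the proportionality constant set to 1. The four candidate points are hypothetical.

```python
from itertools import combinations
from math import dist

D_MIN = 3.0  # minimum distance between two occupied points, in metres (from the text)


def total_risk(layout):
    """Sum of w_i over occupied points, with w_i = sum_j 1 / d_ij**3 (constant assumed 1)."""
    return sum(1.0 / dist(p, q) ** 3 for p in layout for q in layout if p != q)


def feasible(layout):
    """Eq. (6): no two occupied points may be closer than D_MIN."""
    return all(dist(p, q) >= D_MIN for p, q in combinations(layout, 2))


def pareto_by_enumeration(candidates, n_min=1):
    """Return efficient (people, risk, layout) triples found by exhaustive search."""
    best = {}
    for n in range(n_min, len(candidates) + 1):
        for layout in combinations(candidates, n):
            if feasible(layout):
                risk = total_risk(layout)
                if n not in best or risk < best[n][0]:
                    best[n] = (risk, layout)
    front = []
    for n in sorted(best, reverse=True):
        # keep n only if it has strictly less risk than every larger head count kept so far
        if not front or best[n][0] < front[-1][1]:
            front.append((n, best[n][0], best[n][1]))
    return front[::-1]


# Hypothetical candidate points on a line (coordinates in metres).
points = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (6.0, 0.0)]
front = pareto_by_enumeration(points)
for people, risk, layout in front:
    print(people, round(risk, 4), layout)
```

Even this small instance shows the conflict discussed in the text: each additional person can only be added at the price of a higher total risk, and the point at 1.5 m is never part of a layout with three people because it violates the distance rule with two of the other points.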
4 Applications
In this section, the AUGMECON2 method is applied to the proposed model. The source codes [25] have been modified and then applied to the defined datasets. All applications are run on a computer with an Intel(R) Core(TM) i7-4702MQ CPU @ 2.20 GHz and 8 GB RAM under Windows 8.1, and are solved in the GAMS 34.3.0 environment using the CPLEX 20.1.0 solver. Three datasets of 150 points each are used as the coordinates of the potential allocation points. Dataset1 is taken from the study of Fischetti et al. [17]; its x-coordinates range from 6.8 to 14.9 and its y-coordinates from 8.4 to 23.4. Dataset2 and Dataset3 are created randomly. Their x-coordinates range from 1 to 12.9 and from 2 to 16.9, and their y-coordinates from 2 to 11.3 and from 4 to 12.9, respectively. Each dataset is considered and evaluated independently, and the distances between its points are irregular. All the datasets can be obtained from GitHub [26].

Table 2. Payoff tables and execution times for the Dataset2.
Input size | f1 (max f1) | f1 (min f2) | f2 (max f1) | f2 (min f2) | Time (sec.)
20 points | 3 | 1 | 0.12 | 0.0 | 1.63
30 points | 3 | 1 | 0.09 | 0.0 | 0.56
40 points | 3 | 1 | 0.09 | 0.0 | 0.71
50 points | 3 | 1 | 0.09 | 0.0 | 0.75
100 points | 8 | 1 | 0.49 | 0.0 | 37.26
150 points | 12 | 1 | 0.99 | 0.0 | 12,611.56
All applications are performed on six inputs of different sizes for each dataset. These inputs consist of the first 20, 30, 40, 50 and 100 of the 150 points, and all 150 points of each dataset. As an example, the payoff tables with respect to both objectives (denoted max $f_1 = z = \sum_{i \in V} x_i$ and min $f_2 = \sum_{i \in V} w_i$ for brevity) and the execution times for all inputs of Dataset2 are given in Table 2. The payoff table gives the individual optimum of each objective on its diagonal [7]. The payoff tables in Table 2 confirm the conflicting behavior of maximizing the number of people and minimizing the total infection risk: more people always resulted in a higher infection risk. At the same time, when the number of candidate points increased from 20 to 30 and from 30 to 40, the number of people stayed the same, but the total infection risk decreased for Dataset1 and Dataset2. This is because more candidate points offer more freedom in the allocation, which lowers the infection risk.
The trade-offs between the number of people and the total infection risk for that number of people can be seen from the Pareto frontiers. The model has been run for the 150 points of Dataset2, and five efficient points are obtained. They are depicted in Fig. 1, with the number of people on the horizontal axis and the total infection risk on the vertical axis. A decision-maker can easily evaluate these options and choose one of them. For this problem, the five efficient solutions, given as (number of people, total infection risk), are (12, 0.99), (11, 0.71), (10, 0.50), (8, 0.24) and (1, 0).
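The trade-off between consecutive efficient solutions can be quantified directly from the five points listed above. The short sketch below checks that none of them dominates another and computes the absolute and relative increase in risk between neighbours; the function names are illustrative only.

```python
# Efficient solutions reported for the 150-point instance of Dataset2: (people, total risk)
efficient = [(1, 0.0), (8, 0.24), (10, 0.50), (11, 0.71), (12, 0.99)]


def is_nondominated(s, others):
    """True if no other solution has at least as many people with no more risk."""
    return not any(o != s and o[0] >= s[0] and o[1] <= s[1] for o in others)


def tradeoffs(front):
    """For neighbouring solutions: (from, to, extra risk per extra person, relative rise)."""
    rows = []
    for (n0, r0), (n1, r1) in zip(front, front[1:]):
        per_person = (r1 - r0) / (n1 - n0)
        relative = (r1 - r0) / r0 if r0 > 0 else float("inf")
        rows.append((n0, n1, per_person, relative))
    return rows


all_efficient = all(is_nondominated(s, efficient) for s in efficient)
rows = tradeoffs(efficient)
for n0, n1, per_person, relative in rows:
    print(f"{n0:>2} -> {n1:>2}: +{per_person:.3f} risk per person, relative rise {relative:.0%}")
```

Going from 10 to 11 people raises the total risk by about 42%, and going from 11 to 12 people by about 39%.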
Fig. 1. A Pareto frontier for 150 points of the Dataset2.
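The five efficient points reported for the 150-point input of Dataset2 can be checked for mutual non-dominance with a few lines of code. The following is only a minimal sketch in Python, not part of the AUGMECON2 implementation; the function names `dominates` and `pareto_front` are illustrative assumptions, and only the five (people, risk) pairs come from the text.

```python
# Efficient solutions reported for the 150-point input of Dataset2:
# (number of people, total infection risk).
points = [(12, 0.99), (11, 0.71), (10, 0.50), (8, 0.24), (1, 0.0)]


def dominates(a, b):
    """True if a is at least as good as b in both objectives and differs from b.

    Objective 1 (people) is maximized, objective 2 (risk) is minimized.
    """
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates)]


front = pareto_front(points)

# Relative rise in total risk when moving from 11 to 12 people.
rise_11_to_12 = (0.99 - 0.71) / 0.71

print(front)
print(f"{rise_11_to_12:.1%}")
```

Under this check none of the five points dominates another, and the relative rise in risk from 11 to 12 people is about 39%.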
Table 2 also shows that the execution times are less than 1 min for the 20, 30, 40, 50, and 100 points. The execution times of Dataset1 and Dataset3 are similar to those of Dataset2. However, the execution time increases drastically beyond 100 points; an exponential increase is observed for all three datasets. The AUGMECON2 method also provides the allocation points of each efficient solution. Suppose that the difference in infection risk between 11 and 12 people is tolerated and the solution with 12 people is selected. For this selection, Fig. 2 gives a visual representation of the allocation points: blue points show all possible allocation points and red points show the points allocated in the selected solution for Dataset2. All allocated points are at least 3 m from each other. As a result, a total of 12 points out of 150 are selected by considering social distancing norms and infection risk.
Fig. 2. All possible allocation points and the allocated points for the Dataset2.
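The 3 m separation of the allocated points can be verified after the fact with a pairwise distance check. The sketch below is an illustration in Python under stated assumptions: the coordinates are hypothetical values in metres, not points from Dataset2, and the helper names are not taken from the study.

```python
import math
from itertools import combinations


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (needs >= 2 points)."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def respects_distance(points, min_gap=3.0):
    """True if every pair of allocated points is at least min_gap apart."""
    return min_pairwise_distance(points) >= min_gap


# Hypothetical coordinates in metres, NOT taken from Dataset2.
allocated = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
print(respects_distance(allocated))
```

The same check could be run on the red points of a selected solution once their coordinates are read from the dataset.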
5 Conclusions
The social distance measures that first appeared in our lives with COVID-19 prompted us to enact new rules. One of them is allocating people according to social distance measures. In this context, this study addresses placing people in a finite area while following social distance measures. COVID-19 is a recent disease, but social distance strategies are appropriate for any respiratory disease that is linked to close contact. The aim of this study is to present a general approach in the form of a multi-objective optimization problem that maximizes the number of people under social distancing and minimizes the infection risk. Among the many methods suitable for solving multi-objective problems, the AUGMECON2 method is used. The original model has been modified and integrated into the method. In the analysis, three datasets are taken into account and the results are evaluated. Time and size limitations are observed. The Pareto efficient sets are generated, and the possible efficient solutions are presented to the decision-maker as a whole picture. This multi-objective problem can be viewed through the eyes of two decision-makers. One is the owner of a place such as a restaurant, who wants to increase the number of customers, while the other is a customer, who wants to sit in a location that is as safe as possible. A method satisfying the preferences of both decision-makers has been investigated to handle the trade-off between these two conflicting objectives. The obtained results show that the overall infection risk increases by roughly 40% for each additional person, and a 39% rise from 11 to 12 people is regarded as tolerable by decision-makers. As a result, the conflicting structure of the problem is supported by the analysis results, which confirm that the more people are placed, the higher the infection risk. The AUGMECON2 method can be improved and has more recent versions such as the AUGMECON-R and A-AUGMECON2 methods. Therefore, a broad comparison of all AUGMECON versions is a further research direction that may provide broader perspectives on this problem. Due to the time and size limitations, the method is applied only to a limited area. Because of this, some suggestions are presented for further research. With advanced methods and heuristics, the problem may be extended and solved for bigger areas. In addition, the structure of the problem may be enriched by adding new objectives. The possible allocation points may include chairs, tables, or desks that seat more than one person. With further epidemiological research, products such as air conditioners and air cleaners, which can affect viruses in different ways, may also be located in places. Lastly, the distribution of possible points might be more regular or more irregular depending on the creativity and needs of the selected places.
References
1. Yu, Y., et al.: Patients with COVID-19 in 19 ICUs in Wuhan, China: a cross-sectional study. Crit. Care 24, 1–10 (2020)
2. Aldila, D., et al.: A mathematical study on the spread of COVID-19 considering social distancing and rapid assessment: the case of Jakarta, Indonesia. Chaos Solitons Fractals 139, 1–22 (2020)
3. Sun, C., Zhai, Z.: The efficacy of social distance and ventilation effectiveness in preventing COVID-19 transmission. Sustain. Cities Soc. 62, 1–10 (2020)
4. Moosa, I.A.: The effectiveness of social distancing in containing Covid-19. Appl. Econ. 52, 6292–6305 (2020)
5. Chu, D.K., Akl, E.A., Duda, S., Solo, K., Yaacoub, S., Schünemann, H.J.: Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. Lancet 395, 1973–1987 (2020)
6. Mavrotas, G.: Effective implementation of the ε-constraint method in multi-objective mathematical programming problems. Appl. Math. Comput. 213, 455–465 (2009)
7. Mavrotas, G., Florios, K.: An improved version of the augmented ε-constraint method (AUGMECON2) for finding the exact pareto set in multi-objective integer programming problems. Appl. Math. Comput. 219, 9652–9669 (2013)
8. Milne, R.J., Delcea, C., Cotfas, L.A., Ioanaş, C.: Evaluation of boarding methods adapted for social distancing when using apron buses. IEEE Access 8, 151650–151667 (2020)
9. Salari, M., Milne, R.J., Delcea, C., Kattan, L., Cotfas, L.A.: Social distancing in airplane seat assignments. J. Air Transp. Manag. 89, 1–14 (2020)
10. Milne, R.J., Cotfas, L.A., Delcea, C., Craciun, L., Molanescu, A.G.: Adapting the reverse pyramid airplane boarding method for social distancing in times of COVID-19. PLoS ONE 15, 1–26 (2020)
11. Cotfas, L.A., Delcea, C., Milne, R.J., Salari, M.: Evaluating classical airplane boarding methods considering COVID-19 flying restrictions. Symmetry 12, 1–26 (2020)
12. Milne, R.J., Delcea, C., Cotfas, L.A.: Airplane boarding methods that reduce risk from COVID-19. Saf. Sci. 134 (2021). https://doi.org/10.1016/j.ssci.2020.105061
13. Pavlik, J.A., Ludden, I.G., Jacobson, S.H., Sewell, E.C.: Airplane seating assignment problem. Serv. Sci. 13, 1–18 (2021)
14. Moore, J.F., Carvalho, A., Davis, G.A., Abdulhasan, Y., Megahed, F.M.: Seat assignments with physical distancing in single-destination public transit settings. IEEE Access 9, 42985–42993 (2021)
15. Kudela, J.: Social distancing as p-dispersion problem. IEEE Access 8, 149402–149411 (2020)
16. Contardo, C., Costa, L.: On the optimal layout of a dining room in the era of COVID-19 using mathematical optimization. http://arxiv.org/abs/2108.04233
17. Fischetti, M., Fischetti, M., Stoustrup, J.: Safe distancing in the time of COVID-19. Eur. J. Oper. Res. (2021). https://doi.org/10.1016/j.ejor.2021.07.010
18. Dundar, B., Karakose, G.: Seat assignment models for classrooms in response to Covid-19 pandemic. J. Oper. Res. Soc. 1–13 (2021). https://doi.org/10.1080/01605682.2021.1971575
19. Ugail, H., et al.: Social distancing enhanced automated optimal design of physical spaces in the wake of the COVID-19 pandemic. Sustain. Cities Soc. 68 (2021). https://doi.org/10.1016/j.scs.2021.102791
20. ICAO, Effects of Novel Coronavirus (COVID-19) on Civil Aviation: Economic Impact Analysis, Economic Development Air Transport Bureau. https://www.icao.int/sustainability/Documents/COVID-19/ICAO_Coronavirus_Econ_Impact.pdf
21. Özlen, M., Azizoğlu, M.: Multi-objective integer programming: a general approach for generating all non-dominated solutions. Eur. J. Oper. Res. 199, 25–35 (2009)
22. Nikas, A., Fountoulakis, A., Forouli, A., Doukas, H.: A robust augmented ε-constraint method (AUGMECON-R) for finding exact solutions of multi-objective linear programming problems. Oper. Res. Int. J. (2020). https://doi.org/10.1007/s12351-020-00574-6
23. Mavrotas, G.: Generation of efficient solutions in multiobjective mathematical programming problems using GAMS. Effective Implementation of the ε-Constraint. https://www.researchgate.net/publication/228612972
24. Mavrotas, G., Florios, K.: AUGMECON 2: a novel version of the ε-constraint method for finding the exact pareto set in multi-objective integer programming problems. https://www.gams.com/modlib/adddocs/epscmmip.pdf
25. GAMS Source Code for AUGMECON2. https://www.gams.com/latest/gamslib_ml/libhtml/gamslib_epscmmip.html
26. Datasets for applications. https://github.com/Seydase/Datasets.git
An Intelligent Information System and Application for the Diagnosis and Analysis of COVID-19
Atif Mehmood, Ahed Abugabah(✉), Ahmad A. L. Smadi, and Reyad Alkhawaldeh
College of Technological Innovation, Zayed University, Abu Dhabi, UAE
[email protected]
Abstract. The novel coronavirus spread across the world at the start of 2020, and millions of people have been infected with COVID-19. At the start, the availability of coronavirus test kits was a challenge. Researchers analyzed the situation and produced COVID-19 detection systems based on X-ray scans. Artificial intelligence (AI) based systems produce better results in terms of COVID-19 detection. However, many AI-based models suffer from overfitting, which directly degrades model performance. In this study, we introduce a CNN-based technique for classifying normal, pneumonia, and COVID-19 cases. In the proposed model, we use batch normalization to regularize the model and achieve promising results on three binary classification tasks. The proposed model attains 96.56% accuracy for COVID-19 vs. Normal classification. Finally, we compare our model with other deep learning-based approaches and find that it outperforms them.
Keywords: COVID-19 · CNN · Batch normalization · Classification
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 391–396, 2022. https://doi.org/10.1007/978-3-030-93247-3_38
1 Introduction
The novel coronavirus has spread to more than 200 countries and put an unpredictable load on healthcare systems. COVID-19 directly affects the lungs and disturbs the upper respiratory system. It was first detected in Wuhan, China, and within a short time it spread widely, with a direct effect on lung tissue. At the start, coronavirus cases increased exponentially and reached a million cases around the world. Once infected with COVID-19, a patient may show various symptoms, including fever, coughing, and respiratory sickness. Severe cases can result in pneumonia, trouble breathing, multi-organ failure, and even death [1]. Due to the sudden outbreaks, the healthcare systems of many developed countries broke down, and hospitals required more ventilators. In this critical situation, many countries announced lockdowns, did not allow people to leave their homes, and strictly banned community gatherings. In the fight against COVID-19, it is critical to screen infected people effectively so that confirmed patients can be isolated and treated as soon as possible after being identified [2]. The test is performed on patient respiratory specimens. It has been shown that the lungs of patients suffering from COVID-19 exhibit various visual characteristics, for example ground-glass opacities, which can be used to discriminate between COVID-19 infected persons and non-infected individuals. According to scientists, a method that relies on chest radiography has the potential to be a helpful tool in the diagnosis, measurement, and follow-up of COVID-19 patients. Because the complicated anatomical patterns of lung involvement can change in extent and appearance over time, the correctness of chest X-ray (CXR) identification of COVID-19 infection relies heavily on radiographic expertise [3]. Because sub-specialty trained thoracic radiologists are scarce, it is difficult to provide appropriate interpretation of complicated chest examinations, particularly in developing countries, where general radiologists and physicians occasionally review chest imaging. There are significant benefits to using a chest radiology image-based detection mechanism over a standard approach [4]. Recently, many cutting-edge methodologies have been applied to real-world problems [5, 6]. An image-based approach has the advantages of being quick, analyzing more than one case at the same time, having better availability, and, most importantly, being highly beneficial in hospitals with a limited number of testing kits and facilities.
Given the relevance of radiology in the global healthcare system and the widespread availability of radiology imaging devices, radiography-based approaches are becoming increasingly accessible [7]. In artificial intelligence (AI), deep learning is considered a subset of machine learning. Its algorithms are inspired by the structure of the human brain and are built from units called artificial neurons. Deep learning techniques, in particular convolutional neural networks (CNNs), have been reported to outperform humans in many tasks such as computer vision and classification [8]. In recent research, deep learning-based approaches produced different results for detecting pneumonia and COVID-19. The main reason for adopting deep learning-based approaches is that they do not require handcrafted features. Conventional machine learning techniques rely on handcrafted features, which can reduce model performance and take extra resources and time [9]. Wang et al. [10] developed a deep learning-based approach for classifying normal, pneumonia, and COVID-19 cases. Apostolopoulos et al. [4] used pre-trained models that achieved between 93.48% and 98.75% accuracy for binary classification. Researchers have also tested seven different deep learning-based approaches; using small data samples for all techniques, they attained 80.23% to 90.19% accuracy for classifying normal and COVID-19 patients. Wang and Xia [11] introduced a new CNN-based architecture for distinguishing between normal, pneumonia, and COVID-19 cases. They used more than 10,000 images in their experiments and achieved 92.4% accuracy. The researchers in [12] developed another CNN-based technique applied to chest X-ray images. They used a pre-trained model, specifically the Xception architecture pre-trained on the ImageNet database, and achieved 89.6% average accuracy on multi-class classification. Alazab et al. [13] developed three different techniques for the classification of COVID-19 X-ray scans and attained 90% to 97% classification performance. Researchers have also used five pre-trained CNN models, including Inception, ResNet 101, ResNet 152, and InceptionV3; in their experiments, they considered three binary classification tasks with 5-fold cross-validation. Sethy et al. [14] designed another model combining a CNN and a support vector machine (SVM) and reported that the combination with ResNet50 produced the best results. The most significant factor contributing to the success of AI-based solutions in the medical field is the automatic extraction of features from the input data samples. However, deep learning-based approaches still face many issues that decrease classification performance on COVID-19 scans. The primary issue with many models is overfitting, which prevents them from producing their best results. In this study, we address this issue by regularizing the model so that it can achieve good classification results. To this end, we use batch normalization in the proposed model to avoid overfitting.
2 Methodology
The proposed model consists of three major stages. First, the data samples are pre-processed. Second, the more valuable features are extracted from the processed data. Finally, these features are used to classify normal and COVID-19 patients. We designed a CNN-based technique that includes a number of convolutional layers and pooling layers together with batch normalization. The flow chart of the proposed model is shown in Fig. 1, and Fig. 2 shows the details of the proposed model.
Fig. 1. The proposed model flow chart includes data collection to final classification results.
2.1 Data Preprocessing
We acquired the input data samples from the Kaggle database and applied our model to three binary classification tasks. CNN-based approaches need a large amount of data to reduce overfitting. To address this, we used data augmentation to extend the set of X-ray scans. The augmentation parameters include flipping, zoom, brightness, and shifting.
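The augmentation operations named above can be illustrated on a tiny grayscale image. This is only a sketch in plain Python under stated assumptions: it is not the Keras pipeline used in the study, the function names and parameter values (one-pixel shift, brightness factor 1.2) are hypothetical, and zoom is left out for brevity.

```python
def flip_horizontal(image):
    """Mirror each row of a 2-D grayscale image (list of lists)."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift the image right by `pixels` columns, padding the left with `fill`."""
    width = len(image[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[:width - pixels] for row in image]


def adjust_brightness(image, factor):
    """Scale intensities by `factor` and clip to the 8-bit range 0..255."""
    return [[max(0, min(255, round(v * factor))) for v in row] for row in image]


def augment(image):
    """Return the original image plus three simple variants."""
    return [image,
            flip_horizontal(image),
            shift_right(image, 1),          # assumed shift of one pixel
            adjust_brightness(image, 1.2)]  # assumed brightness factor


# Toy 2x3 "scan"; real X-ray scans are much larger.
scan = [[10, 20, 30],
        [40, 50, 60]]
variants = augment(scan)
print(len(variants))
```

Each original scan yields several training samples in this way, which is how augmentation enlarges a small dataset.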
2.2 CNN Model
In the proposed model, the convolutional layers extract local features from the input samples. These layers have weights and biases that are learned during training. Within a convolutional layer, the weights of each filter are shared across all spatial positions of the input. For dimensionality reduction, we used max-pooling layers. When a CNN-based approach is trained from scratch, the parameters to be set include the activation function and the learning rate, together with batch normalization [15]. The proposed model uses 12 convolutional layers, 4 max-pooling layers, and three fully connected layers, as shown in Fig. 2.
Fig. 2. Proposed CNN-based model with normalization technique.
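Batch normalization, which the model relies on for regularization, can be shown for a single feature over one mini-batch. The sketch below uses the standard training-time formula in plain Python; it is an illustration under stated assumptions, not the Keras layer used in the experiments, and the values of `gamma`, `beta`, and `eps` are assumed defaults.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift.

    gamma and beta are the learnable scale and shift; eps avoids division by zero.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Hypothetical activations of one feature for a mini-batch of four samples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])
```

After this step the feature has approximately zero mean and unit variance within the mini-batch, which keeps the input distribution of the next layer stable during training.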
3 Results and Analysis
In the experiments, we used the Keras library on a workstation with 32 GB RAM. We used 1290 COVID-19 scans, 1946 normal scans, and 2154 pneumonia scans. For classification, we extracted features from all data samples. The data were split into 80% for training and 20% for testing.

Table 1. Performance of the proposed CNN-based approach on three binary classification tasks.
Binary classes            Accuracy (%)   Sensitivity (%)   Specificity (%)
COVID-19 vs. Normal       96.56          94.73             97.08
COVID-19 vs. Pneumonia    93.39          91.89             93.58
Normal vs. Pneumonia      92.41          90.46             93.31
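The three metrics in Table 1 follow from the confusion matrix of each binary task. The sketch below shows the standard definitions in Python; the counts are hypothetical and chosen only for illustration, since the confusion matrices behind Table 1 are not reported.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity (in %) from confusion-matrix counts."""
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)  # true-positive rate
    specificity = 100.0 * tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only; not the paper's confusion matrix.
acc, sens, spec = binary_metrics(tp=90, fn=10, tn=95, fp=5)
print(f"{acc:.2f} {sens:.2f} {spec:.2f}")
```

Sensitivity measures how many positive cases are found, while specificity measures how many negative cases are correctly rejected, so the two can differ noticeably even when accuracy is high.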
The results of the proposed model are given in Table 1. Our CNN-based approach produced its best results for COVID-19 vs. Normal classification, attaining 96.56% accuracy. For the remaining two binary tasks, COVID-19 vs. Pneumonia and Normal vs. Pneumonia, we achieved 93.39% and 92.41% accuracy, respectively. Table 2 compares the proposed model with other deep learning models; the proposed model produced promising results.
Table 2. Proposed model comparison with other approaches.
Methods               Accuracy (%)
Ozturk et al. [3]     87.02
Khan et al. [8]       89.60
Hemdan et al. [16]    90
Proposed model        95.56
4 Conclusion
COVID-19 cases are still increasing daily. This situation calls for a proper computer-aided diagnosis (CAD) system based on deep learning that can detect COVID-19 in time. In this study, we introduced a CNN-based approach combined with batch normalization for the classification of COVID-19. The proposed model extracts the more useful features and produces promising results on three binary classification tasks. The data samples were acquired from the Kaggle database. We attained accuracies of 96.56%, 93.39%, and 92.41% on the three binary classification tasks, respectively.
Funding. This research is supported by Zayed University, Office of Research.
References
1. Mahase, E.: Coronavirus: COVID-19 has killed more people than SARS and MERS combined, despite lower case fatality rate (2020)
2. Liu, Y., Gayle, A.A., Wilder-Smith, A., Rocklöv, J.: The reproductive number of COVID-19 is higher compared to SARS coronavirus. J. Travel Med. (2020)
3. Ozturk, T., Talo, M., Yildirim, E.A., Baloglu, U.B., Yildirim, O., Acharya, U.R.: Automated detection of covid-19 cases using deep neural networks with x-ray images. Comput. Biol. Med. 121, 103792 (2020)
4. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 43(2), 635–640 (2020)
5. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing & Optimization, vol. 866. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00979-3
6. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing and Optimization: Proceedings of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO 2019), vol. 1072. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4
7. Wang, D., et al.: Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. JAMA 323(11), 1061–1069 (2020)
8. Khan, A.I., Shah, J.L., Bhat, M.M.: CoroNet: a deep neural network for detection and diagnosis of covid-19 from chest x-ray images. Comput. Methods Programs Biomed. 196, 105581 (2020)
9. Lalmuanawma, S., Hussain, J., Chhakchhuak, L.: Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: a review. Chaos Solitons Fractals 139, 110059 (2020)
10. Wang, L., Lin, Z.Q., Wong, A.: Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images. Sci. Rep. 10(1), 1–12 (2020)
11. Wang, H., Xia, Y.: ChestNet: a deep neural network for classification of thoracic diseases on chest radiography. arXiv preprint arXiv:1807.03058 (2018)
12. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
13. Alazab, M., Awajan, A., Mesleh, A., Abraham, A., Jatana, V., Alhyari, S.: Covid-19 prediction and detection using deep learning. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 12, 168–181 (2020)
14. Sethy, P.K., Behera, S.K.: Detection of coronavirus disease (covid-19) based on deep features (2020)
15. Mehmood, A., et al.: A transfer learning approach for early diagnosis of Alzheimer’s disease on MRI images. Neuroscience 460, 43–52 (2021)
16. El-Din Hemdan, E., Shouman, M.A., Karar, M.E.: COVIDX-Net: a framework of deep learning classifiers to diagnose covid-19 in x-ray images. arXiv e-prints pp. arXiv–2003 (2020)
Hand Gesture Recognition Based Human Computer Interaction to Control Multiple Applications
Sanzida Islam(✉), Abdul Matin, and Hafsa Binte Kibria
Department of Electrical and Computer Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
Abstract. Human Computer Interaction (HCI) aims at systems in which humans can interact with the computer more naturally and efficiently. The main aim is to eliminate commonly used controllers such as the mouse, keyboard, and pointers, which act as a barrier between humans and computers. This research provides a method for detecting hand gestures using computer vision techniques in order to control various applications in real time. The proposed method detects all skin-colored objects in the captured frames and then detects the face using a Haar-based classifier. The detected face is blocked out. The number of fingers is detected by the convexity defect approach, and then the movement of the hand is tracked. These are the features of the hand gesture recognition system. The system does not require any dataset and is therefore simpler to develop. After the gestures are recognized, they are translated into actions. Twenty commands are generated from the hand gestures and sent to the computer as keyboard input. With this method, multiple applications, such as a video player, music player, PDF reader, or slideshow presentation, and in general any application that takes input from the keyboard, can be controlled with this single system. The system can be used for different purposes such as human-robot communication, e-learning, and touch-less interaction with the computer.
Keywords: Computer vision · Hand gesture recognition · Convexity defect · Computer application control
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 397–406, 2022. https://doi.org/10.1007/978-3-030-93247-3_39
1 Introduction
In this digital era, the computer is a part of our daily life, so the interaction between humans and computers should be as natural as possible. Input devices such as the mouse, keyboard, joystick, and pointers act as a barrier between humans and the computer. Gesture-based input can remove this barrier and make the interaction more natural and easier. Gestures can be generated from the motion of any part of the human body, such as the face, head, eyes, or hands [1].
Hand gestures are used in everyday interaction among humans. They are used very frequently because they are simple and expressive at the same time. Hand gesture recognition is used for creating user interfaces, for example for home appliances and medical systems [2]. It can also be used for performing mouse operations [3]. Hand gestures can be of two types: static or dynamic [4]. A static gesture is a single shape of the hand, whereas a dynamic gesture is a series of hand postures, such as those obtained from a video. The proposed method uses some of the most common natural gestures to give instructions to the computer, such as moving up, down, left, or right, along with the finger count. This work aims to present a real-time system for hand gesture recognition based on detecting the number of fingers and their movement. These gestures are converted to inputs that control multiple desktop applications. In this system, the input image is acquired from the webcam, and some pre-processing is done to reduce noise. After the processing is completed, the number of fingers is detected from the region of interest. Then the movement of the hand is tracked, and finally the gesture is recognized and instructions are generated. In this way, various applications can be controlled using the predefined gestures.
2 Literature Review
The basic steps in a gesture recognition system are image acquisition, processing, feature extraction, and gesture recognition. Researchers have implemented many different methods depending on their application. Each method has its pros and cons in terms of time requirement, simplicity, cost, accuracy, and efficiency. In [5], a dynamic hand gesture recognition system is developed to control the VLC media player. This system contains a central computation module that performs image segmentation using skin detection and the approximate median technique. The motion direction of the hand region is used as a feature, and a decision tree is used as the classification tool. The average accuracy of this system is around 75%. This system can only control the VLC media player, whereas our proposed system can control multiple applications because it sends commands through the keyboard. In another paper [6], the skin region is detected by a skin color model based on hue. A segmentation algorithm is developed to separate the hand from the face. The least squares method is used to fit the trajectory of the hand's center of gravity. The angle and direction of the hand movement are used for recognizing four gestures (left, right, up, and down). Another paper [7] presented a system to control an industrial robot with hand gestures. This system uses a convexity defect approach to count the fingers [8] and gives commands according to predefined instructions. The instructions are sent to the robot via serial communication to perform only four operations (left, right, forward, and backward). In [9], a hand gesture recognition system is presented that can recognize seven gestures and launch different applications using them. This system uses a convex hull and convexity defect approach for feature extraction [8] and also uses a Haar cascade for classifying hand gestures that do not expose fingers (palm and fist). The proposed method uses the convexity defect approach to detect the number of fingers and tracks the movement of the hand in four directions. It can thus generate more unique gestures than the five finger counts alone.
3 Proposed Architecture of HGR System
The proposed system consists of several steps: image acquisition via webcam; RGB to HSV conversion; skin color detection; face elimination using a Haar cascade; noise elimination; skin segmentation; thresholding; binary image enhancement using the morphological transformations erosion and dilation, and Gaussian blur; feature extraction using the contour, convex hull, and convexity defects; counting the number of fingers in front of the webcam; and tracking the hand movement. After all of these steps are completed, the command to be given to the chosen application is decided. Figure 1 shows the block diagram of the proposed system.
Fig. 1. Block diagram of the hand gesture recognition system.
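The first stages of this pipeline, RGB to HSV conversion, skin-color masking, and morphological clean-up, can be sketched without any imaging library. The Python code below is a simplified illustration under stated assumptions: the HSV skin range (`h_max`, `s_min`, `s_max`, `v_min`) is a hypothetical choice, not the mask used by the authors; the 3x3 window is assumed; and the Gaussian blur and face-blocking steps are omitted.

```python
import colorsys


def skin_mask(rgb_image, h_max=0.14, s_min=0.2, s_max=0.7, v_min=0.35):
    """Binary mask: 1 where a pixel's HSV value lies in an ASSUMED skin range."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_skin = h <= h_max and s_min <= s <= s_max and v >= v_min
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask


def _window(mask, y, x):
    """Values of the 3x3 window around (y, x); cells outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    return [mask[j][i] if 0 <= j < rows and 0 <= i < cols else 0
            for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]


def erode(mask):
    """Keep a pixel only if its whole 3x3 window is set (removes extra pixels)."""
    return [[int(all(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is set (fills missing parts)."""
    return [[int(any(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


# Toy 2x2 frame: two skin-like pixels, one blue pixel, one white pixel.
frame = [[(224, 172, 105), (0, 0, 255)],
         [(255, 255, 255), (224, 172, 105)]]
print(skin_mask(frame))
```

Applying `erode` and then `dilate` to a mask removes isolated noise pixels while restoring the size of larger regions such as the hand, which is the order of operations the paper describes.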
3.1 Skin Detection and Face Elimination
The image was acquired from the webcam in the form of frames. Each frame was converted to HSV after applying a Gaussian blur. Skin-colored objects were then detected in the frame using a mask. Next, the face was detected using the Haar cascade classifier and blocked by a black rectangle on the frame [10]. The face was excluded because it is one of the biggest skin-colored objects in the frame. The only skin-colored object then remaining in the frame is the hand, as shown in Fig. 2.
Fig. 2. Hand detection and face elimination
3.2 Noise Removal and Image Enhancement
After the skin color is detected, morphological image processing is applied to remove the remaining noise. First, erosion is applied, which removes extra pixels around the hand. Second, dilation is applied, which adds pixels to the missing parts of the hand. Finally, a Gaussian blur is applied to further reduce noise in the frame.
3.3 Thresholding
Thresholding is used to segment the image and obtain a binary image. Each pixel's intensity is compared with a threshold value: if the intensity is greater than the threshold, the pixel is replaced with a white pixel, otherwise with a black pixel. The output therefore contains only the skin-colored object in white. Figure 3 shows the threshold image of the hand.
Fig. 3. Threshold image of the hand
3.4 Feature Extraction
After these steps, the image is ready for feature extraction, which has three stages in the proposed method.
Contour. The first step is to find the contour of the hand. The contour is the curve that joins the boundary points of the hand having the same intensity or color. These points are found from the change in intensity between neighboring pixels. The contour is found from the threshold image formed in the previous stage. In Fig. 4, the green curve outlining the hand is the contour. If there is more than one skin-colored object, the contour is drawn for the biggest object in the frame. If there are multiple people in front of the camera, the contour is drawn around the hand that appears largest, which depends on how far each person sits from the camera: the nearest person's hand appears biggest, and the contour is drawn around it.
Convex Hull. The convex hull is a polygon that bounds all the contour points. It surrounds all the white pixels of the binary image. The red polygon in Fig. 4 represents the convex hull of the hand. It determines the fingertip locations.
Fig. 4. Contour and Convex hull of hand.
Convexity Defect. The difference between the contour and the convex hull is called the convexity defect. Defects are regions that lie inside the convex hull but are not part of the main object. In Fig. 5, the straight lines represent the convexity defects. Each convexity defect has three components: a start point, an end point, and a far point. The yellow dots represent far points. The number of fingers is determined from these points, as discussed in the next section.
Fig. 5. Convexity defect.
3.5 Classification
After the convexity defect points are found, the number of fingers is counted. Hand gestures are classified by the number of fingers and their direction of movement. For every defect in the gesture shown at a given moment, the cosine rule is used to find the angle at the far point between the lines drawn to it from the start point and the end point, i.e., the convex points or fingertips. A triangle like the one in Fig. 6 is formed by the convex points and the far point, as shown in Fig. 7:
\[ B^2 = A^2 + C^2 - 2AC\cos\gamma \tag{1} \]
Fig. 6. A triangle.
Fig. 7. A triangle is formed with a start, end, and far points.
From this equation, the angle can be calculated:
\[ \gamma = \cos^{-1}\left(\frac{A^2 + C^2 - B^2}{2AC}\right) \tag{2} \]
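The angle computation can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the sample coordinates are assumptions, A and C are taken as the sides from the far point to the start and end points, B as the side joining start and end, and the finger count follows the paper's rule of one more finger than the number of defects whose angle is below 90°.

```python
import math


def defect_angle(start, end, far):
    """Angle gamma (degrees) at the far point, from the cosine rule."""
    a = math.dist(far, start)   # side A
    c = math.dist(far, end)     # side C
    b = math.dist(start, end)   # side B, opposite gamma
    if a == 0 or c == 0:
        return 180.0            # degenerate triangle: treat as not a finger gap
    cos_gamma = (a * a + c * c - b * b) / (2 * a * c)
    cos_gamma = max(-1.0, min(1.0, cos_gamma))  # guard against rounding error
    return math.degrees(math.acos(cos_gamma))


def count_fingers(defects, max_angle=90.0):
    """Count defects with gamma below max_angle and add one."""
    valid = sum(1 for s, e, f in defects if defect_angle(s, e, f) < max_angle)
    return valid + 1


# Hypothetical (start, end, far) triples in pixel coordinates.
narrow_gap = ((0, 10), (10, 10), (5, 0))   # sharp valley between two fingers
shallow_dip = ((0, 1), (10, 1), (5, 0))    # wide dip, not a finger gap
fingers = count_fingers([narrow_gap, shallow_dip])
```

Only the narrow gap qualifies here, so the sketch reports two fingers. Note that with this rule a frame with no qualifying defect is still reported as one finger.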
If the angle is smaller than 90°, it is considered a defect. Thus, the number of fingers is found as
\[ \text{fingers} = \text{defects} + 1 \tag{3} \]
After the fingers are counted, the movement of the center of the hand is tracked (up, down, left, or right). This is done by tracking the change between the starting and ending positions of the hand in the X and Y coordinates across frames. Let the initial position of the center pixel be (x1, y1) and the position after 5 consecutive frames be (x2, y2). If x2 is greater than x1, the hand has moved to the right, and vice versa. The same rule applies to movement along the y axis. Since there can be some unintentional movement of the hand, a movement is considered only if it is greater than 5 cm. In this way, each of the five finger counts can be combined with four directions of movement, which generates 20 commands.
3.6 Gesture Recognition and Action Generation
After a gesture is recognized, the corresponding instruction is given to the application via the keyboard. After one action is performed, a delay of 5 s is applied before another command is given. Since many applications take commands from the keyboard, all of them can be controlled with this system. Slide presentations, music players, video players, image viewers, PDF readers, and many other applications can be operated with these 20 gestures.
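The step from a recognized gesture to a keyboard action can be sketched as a lookup table. The sketch below is an assumption-laden illustration, not the authors' code: the key names are copied from the video-player mapping in Table 1, the 40-pixel threshold is a hypothetical stand-in for the 5 cm limit, image rows are assumed to grow downward, and the function returns the key name instead of sending a real keystroke.

```python
MIN_SHIFT_PX = 40        # hypothetical pixel equivalent of the paper's 5 cm limit
COMMAND_DELAY_S = 5.0    # delay between two commands, as stated in the text

# (number of fingers, movement) -> keyboard action, after Table 1.
VIDEO_PLAYER_KEYS = {
    (1, "right"): "F",
    (2, "left"): "Home/Left",
    (2, "right"): "End/Right",
    (3, "any"): "M",
    (4, "up"): "Page Up",
    (4, "down"): "Page Down",
    (5, "any"): "Space",
}


def direction(start, end, min_shift=MIN_SHIFT_PX):
    """Classify the hand-center displacement between two (x, y) positions."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if max(abs(dx), abs(dy)) <= min_shift:
        return None                      # too small: treated as unintentional
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"    # assumption: y grows downward in the image


def key_for(fingers, move, table=VIDEO_PLAYER_KEYS):
    """Look up the keyboard action; 'any' entries match every movement."""
    key = table.get((fingers, move))
    if key is None:
        key = table.get((fingers, "any"))
    return key


def ready(now_s, last_command_s, delay_s=COMMAND_DELAY_S):
    """True if enough time has passed since the previous command."""
    return last_command_s is None or now_s - last_command_s >= delay_s


move = direction((320, 240), (400, 250))
key = key_for(2, move)
```

Keeping the mapping in a dictionary means that a second application only needs a second table, which matches the paper's point that one system can drive every keyboard-controlled application.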
4 Results
The hand gesture recognition system has been implemented successfully to control different applications running on a computer. The proposed system can use 20 hand gestures in total: each of the five finger counts can be combined with one of four directions of movement (up, down, left, and right). Figure 8 shows the output image, in which skin color is detected and the face is blocked using a Haar cascade. The threshold image is shown in the right corner, and the contour and the convex hull are drawn on this binary image of the hand. The number of fingers is then calculated and shown as text in the frame. A background whose color matches the skin color would be detected as skin, which is why a white or other non-skin-colored background is used.

Table 1. Hand gestures used to control video player/music player etc.
No. of fingers | Hand movement | Keyboard action | Function
1 | Right | F | Toggle fullscreen
2 | Left | Home/Left | Backward/Previous
2 | Right | End/Right | Forward/Next
3 | Any | M | Mute/Unmute
4 | Up | Page Up | Volume up
4 | Down | Page Down | Volume down
5 | Any | Space | Play/Pause
Fig. 8. Skin color detection, face elimination, thresholding, and finger detection.
Table 1 shows some of the hand gestures, keyboard actions, and their corresponding functions that were used to control applications such as VLC media player and music players. Figures 9, 10, 11, 12 and 13 show different gestures controlling the VLC media player (volume up/down, forward/backward, play/pause, etc.).
Fig. 9. Move forward when 2 fingers move to the right.
Fig. 10. Increase the volume when 4 fingers move upward.
Fig. 11. Decrease the volume when 4 fingers move downward.
Fig. 12. Mute when 3 fingers are shown.
Fig. 13. Pause/play when 5 fingers are shown.
Table 2 shows some of the hand gestures, keyboard actions, and their corresponding functions that were used to control applications such as PDF readers, slide presentations, and photo viewers. Figure 14 shows a slide change using a hand gesture in Microsoft PowerPoint.
Fig. 14. Presentation slide change with a hand gesture (five fingers).
There are still many gestures left that can be used to give more commands to these applications. Because these applications share common keyboard instructions such as Page Up, Page Down, Home, and End, they do not need separate systems to control them: this single system can control all applications that take commands from the keyboard. Table 3 shows the recognition rate of each gesture. All the gestures are recognized quickly and accurately. The overall accuracy is 97.8% in good lighting conditions. The lighting environment strongly affects the overall performance because it affects the outcome of skin detection. In low lighting conditions, the system is unable to identify the color of the skin because the pixel color is too dark, whereas in bright lighting conditions the whole surface of the hand is well defined.
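The paper does not state how the overall accuracy of 97.8% is obtained from the per-gesture rates in Table 3. As a quick check, and only as an assumption about the method, the unweighted mean of the five reported rates reproduces that figure, and so does the mean weighted by the number of tests (to one decimal place):

```python
# Values copied from Table 3: recognition rate (%) and number of tests
# (up/down, left/right) for each finger count.
rates = {1: 99, 2: 99, 3: 95, 4: 98, 5: 98}
tests = {1: (80, 90), 2: (85, 88), 3: (90, 70), 4: (80, 110), 5: (85, 90)}

# Unweighted mean of the five per-gesture rates.
mean_rate = sum(rates.values()) / len(rates)

# Mean weighted by how many times each finger count was tested.
total_tests = sum(sum(pair) for pair in tests.values())
weighted_rate = sum(rates[k] * sum(tests[k]) for k in rates) / total_tests

print(round(mean_rate, 1), round(weighted_rate, 1), total_tests)
```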
Table 2. Hand gestures used to control pdf reader/slide presentation/photo viewer etc.
No. of fingers | Hand movement | Keyboard action | Function
1 | Up | Ctrl + Plus | Zoom in
1 | Down | Ctrl + Minus | Zoom out
2 | Left | Home/Left | Previous
2 | Right | End/Right | Next
3 | Any | M | Mute
4 | Up | Page Up | Scroll up
4 | Down | Page Down | Scroll down
5 | Any | Space | Next
Table 3. Hand gestures recognition rate.
No. of fingers | No. of tests (Up/Down) | No. of tests (Left/Right) | Recognition rate
1 | 80 | 90 | 99%
2 | 85 | 88 | 99%
3 | 90 | 70 | 95%
4 | 80 | 110 | 98%
5 | 85 | 90 | 98%

5 Conclusions and Future Work
From the successful experiment, it can be asserted that the proposed system recognizes the gestures properly in real time. It can also control multiple applications, such as VLC media player, PowerPoint presentations, PDF readers, and the Chrome browser, with the classified gestures. As the performance of the hand gesture recognition system shows, users can open a video or a book in an application and control it with hand gestures. In the future, a CNN could be used instead of skin-color-based segmentation, which would reduce the problems caused by low lighting and by other skin-colored objects appearing in front of the camera. With some modifications, this system can be useful for training humanized social robots or for touchless interaction between humans and computers.
Towards Energy Savings in Cluster-Based Routing for Wireless Sensor Networks
Enaam A. Al-Hussain and Ghaida A. Al-Suhail
Department of Computer Engineering, University of Basrah, Basrah, Iraq
Abstract. Wireless Sensor Networks (WSNs) are mainly composed of a number of Sensor Nodes (SNs) that gather data from their physical surroundings and transmit it to the Base Station (BS). These sensors, however, have several limitations, including limited memory, limited computational capability, relatively limited processing capacity, and, most crucially, limited battery power. Given these restricted resources, clustering techniques are mainly utilized to reduce the energy consumption of WSNs and consequently enhance their performance. The Low Energy Adaptive Clustering Hierarchy (LEACH) protocol serves as a good benchmark for clustering techniques in WSNs. Although LEACH conserves the energy of sensor nodes, its energy efficiency is still considerably compromised by unpredictable and fast power draining. Therefore, this paper focuses on how the LEACH protocol may be used effectively in environmental monitoring systems to address issues of energy consumption, efficiency, stability, and throughput in a realistic simulation environment. The realistic performance analysis and parameter tuning were carried out using the OMNET++/Castalia Simulator to serve as a baseline for future developments.
Keywords: WSNs, LEACH, Clustering, Energy efficiency, OMNET, Castalia
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 407–416, 2022. https://doi.org/10.1007/978-3-030-93247-3_40

1 Introduction
Recently, wireless sensor networks (WSNs) have been regarded as a significant research area due to their critical involvement in a variety of applications. Wireless sensor nodes collect data, analyze it for optimization, and then send it to the sink via a network of intermediary nodes. The network of these nodes as a whole constitutes the wireless sensor network, which is capable of organizing data and transmitting it to the requester (sink) [1]. Meanwhile, energy efficiency is still a critical problem in the design of WSN routing protocols because of resource constraints and the non-rechargeability of sensor nodes [2, 3]. Notably, clustering is a widely used approach for managing the topology of WSNs, since it may significantly enhance the network's performance. It organizes nodes into groups according to predefined criteria such as ensuring QoS, optimizing resource requirements, and balancing network load. The leader node that manages each cluster is called the Cluster Head (CH). This node is responsible for collecting data from cluster members (CMs) and transmitting it to the Base Station. Clustering techniques eliminate the need for resource-constrained nodes to transfer data directly to gateways (sinks), a practice that results in energy depletion, inefficient resource utilization, and interference.
Numerous studies on energy efficiency and data collection for cluster-based routing algorithms have been conducted [4–7]. Most of these strategies consist of two phases: (i) the Setup phase and (ii) the Steady-State phase. The first phase involves the selection and formation of CHs, as well as the assignment of a TDMA schedule to member nodes by the CH [8]. Meanwhile, the latter phase is responsible for transmitting the sensed data from the member nodes to their CHs in the TDMA slots allocated by the CH during the setup phase. Then, the CHs collect the data from the CMs and transfer it to the Base Station. Several protocols, namely LEACH, PEGASIS, TEEN, APTEEN, and HEED [9–12], serve as the primary hierarchical routing protocols in WSNs. Each has numerous variants that are adapted to certain applications. Typically, Sensor Nodes (SNs) consume far more energy during data transmission than during data processing. As a result, it is critical to minimize redundant sensed data transmission to the BS through the efficient deployment of Cluster Heads (CHs) in a network. Hence, it is important to evaluate the routing protocol in major aspects and scenarios to guarantee the real-world design of WSNs and ensure optimal environment simulation for further improvement utilizing a variety of optimization methods.
In this paper, the LEACH protocol is evaluated as a good benchmark for single-hop clustering algorithms. Numerous scenarios are presented to evaluate the overall energy efficiency and throughput. Moreover, in order to find the typical values for each scenario, several parameters are considered, including the optimal CH percentage and the packets received by the Sink (BS) located in various locations under various node densities and data rates. Extensive simulation demonstrates that as the node density increases for the same area size, the network's energy consumption decreases, which extends the lifetime of the WSN. Additionally, it is observed that when the CH percentage is optimal, the energy consumption of a network is minimal. However, when the CH percentage of a network exceeds the optimal value, energy consumption increases, significantly reducing the network's lifetime.
The rest of this paper is structured as follows. The literature review is addressed in Sect. 2. Section 3 describes the LEACH protocol in detail. Section 4 discusses the network model. Section 5 presents and discusses the simulation results. Finally, Sect. 6 draws the conclusions.
2 Related Works
The Low Energy Adaptive Clustering Hierarchy (LEACH) protocol [13] is one of the most well-known protocols. It manages energy consumption through adaptive clustering and serves as a good benchmark for clustering routing protocols in WSNs and MANETs. Within LEACH, the nodes in the network field are grouped into clusters. Each cluster has a single leader node identified as the cluster head (CH), and this node is selected at random. Moreover, while the LEACH protocol conserves the energy of sensor nodes, its energy efficiency is likely impacted by random and fast energy dissipation, which is increased by the unequal distribution of nodes among clusters and the time restriction imposed by the TDMA MAC protocol [13–15]. In the LEACH protocol, the CHs are randomly assigned to operate as relay nodes for data transmission; afterward, the cluster heads exchange roles with regular nodes so that all nodes spend a uniform amount of energy. This approach extends the lifetime of nodes while decreasing the energy consumption of transmission.
Numerous studies have recently examined the routing and energy consumption challenges related to the LEACH protocol by modifying the mathematical models to increase overall performance in a variety of ways [16, 17]. Meanwhile, intelligent algorithms [18–22] are also used as a viable strategy for lowering the energy consumption of WSNs and extending the network's lifetime. Furthermore, other researchers have stressed the critical role of the Fuzzy Logic System (FLS) in the decision-making process for CH efficiency in WSNs [23]. All these studies emphasize the predefined protocol with specific parameters that affect the routing efficiency of the optimized LEACH protocol. Such parameters include the sensor node's lifetime, the total number of packets received, the transmission latency, and scalability with the number of sensor nodes. Nevertheless, most works evaluated their proposed protocols in a virtual environment without examining the effect of the original protocol's parameters on the network's efficiency. Thus, it is critical to evaluate the routing protocol in major aspects and scenarios using realistic simulation environments such as the Castalia and OMNET++ simulators. This technique ensures that WSNs are designed for the actual world environment and provides a realistic implementation for further development of the LEACH protocol and its versions (LEACH-C, M-LEACH, etc.) using various optimization techniques.
3 Low Energy Adaptive Clustering Hierarchy Protocol
LEACH is a pioneering WSN clustering routing protocol. Its major purpose is to enhance energy efficiency through random CH selection. LEACH operates in rounds that consist of two phases: the Set-Up phase and the Steady-State phase. During the set-up phase, clusters are constructed and a cluster head (CH) is elected for each cluster. Meanwhile, during the steady-state phase, the data is sensed, aggregated, compressed, and transmitted to the base station.
i. Set-Up Phase: The Set-Up phase involves the selection and construction of CHs, as well as the assignment of a TDMA schedule to member nodes.
1. Cluster Head Selection: Each node takes part in the CH selection process by randomly generating a value between 0 and 1. If the random number generated by the SN is smaller than the threshold value T(n), the node becomes a CH; otherwise, it is considered a CM and waits for ADV messages to join the nearby CH. Equation 1 is used to find the value of T(n).
\[ T(n) = \begin{cases} \dfrac{P}{1 - P\,(r \bmod 1/P)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
where P is the percentage of CHs, which is used at the beginning of each round (starting at time t) such that the expected number of CH nodes for this round is K; r is the current round number; and G is the set of nodes that have not served as CH in the last 1/P rounds. For a network of N nodes,
\[ P = K/N \tag{2} \]
2. Cluster Formation: Once the CHs are elected, they broadcast ADV messages to the rest of the sensors using the CSMA MAC protocol. Non-CH nodes must keep their receivers on throughout the Set-Up phase to hear all CHs' ADV messages. After this phase is complete, each sensor determines which cluster it belongs to based on the RSSI value. Each sensor node (SN) then transmits a JOIN-REQ message to its corresponding CH using CSMA.
3. Schedule Creation: Each CH node generates a TDMA schedule based on the number of JOIN-REQ messages received. The schedule is broadcast back to the cluster's nodes to inform them when they can transmit.
ii. Steady-State Phase: The steady-state or transmission phase is where environmental reports are communicated from the network field. During this phase, each sensor node transmits its data to the CH during its assigned time slot (intra-cluster communication), while each CH aggregates the data from its CMs and sends it to the BS (inter-cluster communication).
The key advantages and limitations of the LEACH protocol are summarized in Table 1.
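The set-up phase described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Castalia implementation used in the paper: all function names are hypothetical, membership in G is approximated by "has not been a CH during the last 1/P rounds", Euclidean distance stands in for the RSSI comparison, and the ADV/JOIN-REQ message exchange over CSMA is not modeled.

```python
import math
import random


def leach_threshold(p, r, in_g):
    """T(n) from Eq. (1): P / (1 - P * (r mod 1/P)) for nodes in G, else 0."""
    if not in_g:
        return 0.0
    period = round(1 / p)  # 1/P rounds, e.g. 20 when P = 0.05
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, p, r, last_ch_round, rng):
    """Each node draws a number in [0, 1) and becomes CH if it is below T(n)."""
    period = round(1 / p)
    heads = []
    for n in node_ids:
        # Assumption: n is in G if it was not a CH during the last 1/P rounds.
        in_g = n not in last_ch_round or r - last_ch_round[n] >= period
        if rng.random() < leach_threshold(p, r, in_g):
            heads.append(n)
            last_ch_round[n] = r
    return heads


def form_clusters(positions, heads):
    """Attach every non-CH node to its nearest CH (distance stands in for RSSI)."""
    clusters = {h: [] for h in heads}
    if not heads:
        return clusters
    for n, pos in positions.items():
        if n in clusters:
            continue
        nearest = min(heads, key=lambda h: math.dist(pos, positions[h]))
        clusters[nearest].append(n)
    return clusters


def tdma_schedule(members):
    """Give each member that joined the cluster one slot index per frame."""
    return {m: slot for slot, m in enumerate(members)}


rng = random.Random(42)  # fixed seed so the sketch is repeatable
positions = {n: (rng.uniform(0, 100), rng.uniform(0, 100)) for n in range(100)}
heads = elect_cluster_heads(list(positions), p=0.05, r=0, last_ch_round={}, rng=rng)
clusters = form_clusters(positions, heads)
schedules = {h: tdma_schedule(ms) for h, ms in clusters.items()}
```

With P = 0.05 the threshold starts at 0.05 in round 0 and rises to 1 in round 19, so every node that is still in G is forced to serve as CH by the end of each block of 1/P = 20 rounds.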
4 Network Model
The following criteria are considered when describing the network model based on the proposed protocol:
1. Sensor nodes are uniformly distributed across an M × M area of interest, and throughout the process, all nodes and the BS remain stationary (non-mobile).
2. Each sensor node is capable of sensing, aggregating, and transmitting data to and from the base station (BS) and other sensors (i.e., acts as a sink node).
3. The network's nodes are non-rechargeable and have homogeneous initial energy.
4. To ensure optimal performance, the Sink Node (BS) is positioned in the center of the network field. The communication links between the nodes are assumed to be symmetrical. As a result, for packet transmission, the data rate and energy consumption between any two nodes are symmetrical.
5. The nodes operate in power control mode, with the output power determined by the receiving distance between them.
Table 1. Advantages and limitations of the LEACH protocol.
Advantages
▪ The clustering technique used by the LEACH protocol results in decreased communication between the sensor network and the BS, extending the network's lifetime
▪ The CH utilizes a data aggregation technique to reduce associated data on a local level, resulting in a significant reduction in energy consumption
▪ Each sensor node has a reasonable chance of becoming the CH and subsequently a member node. This maximizes the lifetime of the network
▪ By utilizing TDMA scheduling, intra-cluster collisions are avoided, extending the battery life of sensor nodes
Limitations
▪ Expansion of the network may result in a trade-off between the energy distances of a CH and a BS
▪ Due to the random number principle, nodes do not resurrect to become CHs, which further reduces their energy efficiency
▪ No consideration is made of heterogeneity in terms of energy, computational capabilities, and link reliability
▪ The TDMA approach imposes constraints on each frame's time slot
5 Simulation Results and Performance Analysis
This section discusses the performance evaluation of LEACH. The LEACH protocol is examined for a network of 100 sensor nodes uniformly distributed over a 100 × 100 m² area. The BS is positioned in the center of the sensor field. All nodes have an initial energy of 3 J. Moreover, a round time of 20 s is used in our scenarios, with a maximum simulation time of 300 s. All data messages have the same size, and the slot time is set to 0.5 in all simulation scenarios. An overview of the simulation parameters is shown in Table 2.
Table 2. Simulation parameters.
Parameter | Value
Network size | 100 × 100 m²
No. of nodes | 100
No. of clusters | 5
Location of BS | 50 × 50 m
Node distribution | Uniform
BS mobility | Off
Energy model | Battery
Application ID | Throughput test
Initial energy | 3 J
Simulation time | 300 s
Round time | 20 s
Packet header size | 25 Bytes
Data packet size | 2000 Bytes
Bandwidth | 1 Mbps
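Two quantities follow directly from the values in Table 2 and may help when reading the results: the number of LEACH rounds that fit in one simulation run, and the time needed to put one data packet on the air. The sketch below is a back-of-the-envelope calculation only; it assumes that the header is added to the data packet size and ignores any MAC or physical-layer overhead that the simulator adds.

```python
# Values copied from Table 2.
SIM_TIME_S = 300
ROUND_TIME_S = 20
HEADER_BYTES = 25
PAYLOAD_BYTES = 2000
BANDWIDTH_BPS = 1_000_000  # 1 Mbps

# Number of complete rounds in one simulation run.
rounds = SIM_TIME_S // ROUND_TIME_S

# Transmission time of one packet (header + payload) at the nominal bandwidth.
airtime_s = (HEADER_BYTES + PAYLOAD_BYTES) * 8 / BANDWIDTH_BPS

print(rounds, airtime_s)
```

Under these assumptions a run covers 15 rounds, and one packet occupies the channel for about 16 ms.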
5.1 Performance Evaluation of LEACH Protocol
In this section, numerous factors are considered when evaluating the LEACH protocol, including the number of nodes, the CH percentage, and the area size. The protocol's performance is quantified in terms of the total energy consumed by sensor nodes during each round for data processing and communication. Reliability is another metric, evaluated by the total number of received data packets.
Experimental Case I
Figures 1 and 2 depict the effect of node density (number of nodes per m²) and area size on energy consumption, where 50, 100, and 200 sensor nodes are uniformly distributed across 100 × 100 m² and 200 × 200 m² areas, respectively. Each node has an initial energy of 3 J, with a CH percentage of 5%. If the CH percentage remains constant but the network's node density increases, the number of CHs in the network increases in proportion to the node density. The energy consumption of nodes is minimal at CH = 5% for the 100 × 100 m² network with 100 nodes (5 CHs selected), and minimal at 200 nodes (10 CHs selected) for the 200 × 200 m² network. This is because as the coverage area increases, the node consumes more energy transmitting the sensed information to the sink with the fewest CHs possible.
Fig. 1. Total energy consumption
Fig. 2. Total energy consumption
Figure 1 shows that when the CH percentage is optimal, the energy consumption of a network becomes minimal. However, when the CH percentage of a network exceeds the optimal value, energy consumption increases, significantly reducing the network's lifetime. Therefore, it is important to choose the optimal CH percentage to avoid extra power consumption by the sensor nodes.
Experimental Case II
Figures 3 (a–d) illustrate the effect of node density (number of sensor nodes per m²), area size, and packet rate, for different CH percentages, on the total number of packets received at the sink. The network is configured as in Table 3:
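Table 3. Network configuration.
Area (m²) | Node density (nodes/m²) | No. of nodes | CH percentage | Packet rate (packets/s/node)
100 × 100 | 0.002 | 20 | 5%, 8%, 10% | 1, 3
100 × 100 | 0.006 | 60 | 5%, 8%, 10% | 1, 3
100 × 100 | 0.01 | 100 | 5%, 8%, 10% | 1, 3
200 × 200 | 0.002 | 80 | 5%, 8%, 10% | 1, 3
200 × 200 | 0.006 | 240 | 5%, 8%, 10% | 1, 3
200 × 200 | 0.01 | 400 | 5%, 8%, 10% | 1, 3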
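The node densities in Table 3 and the expected number of CHs for a given CH percentage follow from simple arithmetic. The short sketch below reproduces them; the function names are illustrative and not taken from the simulator.

```python
def node_density(n_nodes, side_m):
    """Nodes per square metre for a square field of side side_m."""
    return n_nodes / (side_m * side_m)


def expected_cluster_heads(n_nodes, ch_percent):
    """Expected number of CHs, K = P * N, with P given in percent."""
    return n_nodes * ch_percent / 100


# (side in m, number of nodes) pairs copied from Table 3.
configs = [(100, 20), (100, 60), (100, 100), (200, 80), (200, 240), (200, 400)]
for side, n in configs:
    chs = [expected_cluster_heads(n, pct) for pct in (5, 8, 10)]
    print(side, n, round(node_density(n, side), 3), chs)
```

For example, 100 nodes at 5% give 5 expected CHs and 200 nodes give 10, which are the values quoted in Experimental Case I.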
(a) packet rate = 1 packet/sec/node.
(b) packet rate = 3 packet/sec/node.
(c) packet rate = 1 packet/sec/node.
(d) packet rate = 3 packet/sec/node.
Fig. 3. (a–d): The effect of node density, area size, and the packet rates with CHs percentage on the total number of packets received at the sink
In Figs. 3 (a–d), the obtained results illustrate that increasing the packet rate decreases the network's packet reception rate because of increased CH congestion. A higher packet rate enables source sensor nodes to relay the sensed data more quickly to their CHs during their assigned time slots. As a result, each CH receives more packets from its associated sensor nodes than it forwards to the sink, and congestion arises in the WSN. The sensor buffer then begins to overflow, which increases packet loss and lowers the rate at which packets are received in the WSN.
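The congestion argument can be made concrete with a toy queue model of a single CH. This is a deliberately simplified sketch and not a reproduction of the Castalia experiments: the cluster size, forwarding capacity, and buffer size are hypothetical numbers, time advances in one-second steps, and channel effects are ignored.

```python
def delivery_ratio(members, rate_pps, forward_pps, buffer_size, seconds):
    """Fraction of generated packets that the CH manages to forward to the sink."""
    queue = 0
    generated = 0
    delivered = 0
    for _ in range(seconds):
        arrivals = members * rate_pps           # packets offered by the members
        generated += arrivals
        accepted = min(arrivals, buffer_size - queue)
        queue += accepted                       # the rest overflows and is lost
        sent = min(queue, forward_pps)          # CH forwards at a fixed capacity
        queue -= sent
        delivered += sent
    return delivered / generated


# Hypothetical cluster: 19 members, CH forwards 30 packets/s, 32-packet buffer.
low_rate = delivery_ratio(members=19, rate_pps=1, forward_pps=30,
                          buffer_size=32, seconds=100)
high_rate = delivery_ratio(members=19, rate_pps=3, forward_pps=30,
                           buffer_size=32, seconds=100)
```

In this toy setting all packets are delivered at 1 packet/s/node, while at 3 packets/s/node the offered load exceeds the forwarding capacity, the buffer stays full, and the delivery ratio drops.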
6 Conclusions and Discussion
The Low Energy Adaptive Clustering Hierarchy (LEACH) protocol is evaluated with respect to several factors, including node density, CH percentage, packet rate, and area size. The findings show that when the CH percentage remains constant but the network's node density increases, the number of CHs in the network increases in proportion to the number of nodes. Moreover, when the CH percentage is optimal, the energy consumption of a network is minimal. However, when the CH percentage of a network exceeds the optimal value, energy consumption increases, significantly reducing the network's lifetime. Therefore, it is important to choose the optimal CH percentage to avoid extra power consumption by the sensor nodes. The energy consumption of nodes is minimal at CH = 5% for the 100 × 100 m² network with 100 nodes (5 CHs selected), and minimal at 200 nodes (10 CHs selected) for the 200 × 200 m² network. This is because as the coverage area increases, the node consumes more energy transmitting the sensed information to the sink with the fewest CHs possible. As the number of CHs increases, the amount of energy consumed is reduced proportionately. In addition, the obtained results illustrate that increasing the packet rate can decrease the network's packet reception rate because of increased CH congestion. A higher packet rate enables source sensor nodes to relay the sensed data more quickly to their CHs during their assigned time slots, so each CH receives more packets from its associated sensor nodes than it forwards to the sink. As a result, congestion arises in the WSN and the sensor buffer begins to overflow. Packet loss therefore becomes high, and the rate at which packets are delivered in the WSN drops significantly.
For future work, fuzzy logic systems and intelligent algorithms such as the FPA, GWO, ACO, and ABC algorithms can be utilized to improve the routing strategy in the LEACH protocol. Additionally, multi-hop routing techniques can also be considered for optimal monitoring system design.
References
1. Priyadarshi, R., Gupta, B., Anurag, A.: Deployment techniques in wireless sensor networks: a survey, classification, challenges, and future research issues. J. Supercomput. 76(9), 7333–7373 (2020). https://doi.org/10.1007/s11227-020-03166-5
2. Banđur, Đ, Jakšić, B., Banđur, M., Jović, S.: An analysis of energy efficiency in Wireless Sensor Networks (WSNs) applied in smart agriculture. Comput. Electron. Agric. 156, 500–507 (2019)
3. Kalidoss, T., Rajasekaran, L., Kanagasabai, K., Sannasi, G., Kannan, A.: QoS aware trust based routing algorithm for wireless sensor networks. Wireless Pers. Commun. 110(4), 1637–1658 (2019). https://doi.org/10.1007/s11277-019-06788-y
4. Ketshabetswe, L.K., Zungeru, A.M., Mangwala, M., Chuma, J.M., Sigweni, B.: Heliyon 5, e01591 (2019)
5. Mann, P.S., Singh, S.: Energy-efficient hierarchical routing for wireless sensor networks: a swarm intelligence approach. Wireless Pers. Commun. 92(2), 785–805 (2016). https://doi.org/10.1007/s11277-016-3577-1
6. Fanian, F., Rafsanjani, M.K.: Cluster-based routing protocols in wireless sensor networks: a survey based on methodology. J. Netw. Comput. Appl. 142, 111–142 (2019)
7. Singh, H., Bala, M., Bamber, S.S.: Taxonomy of routing protocols in wireless sensor networks: a survey. Int. J. Emerg. Technol. 11, 63–83 (2020)
8. Rostami, A.S., Badkoobe, M., Mohanna, F., Keshavarz, H., Hosseinabadi, A.A.R., Sangaiah, A.K.: Survey on clustering in heterogeneous and homogeneous wireless sensor networks. J. Supercomput. 74, 277–323 (2018)
9. Al-Shaikh, A., Khattab, H., Al-Sharaeh, S.: Performance comparison of LEACH and LEACH-C protocols in wireless sensor networks. J. ICT Res. Appl. 12, 219–236 (2018)
10. Khedr, A.M., Aziz, A., Osamy, W.: Successors of PEGASIS protocol: a comprehensive survey. Comput. Sci. Rev. 39, 100368 (2021)
11. Asqui, O.P., Marrone, L.A., Chaw, E.E.: Evaluation of TEEN and APTEEN hybrid routing protocols for wireless sensor network using NS-3. In: Rocha, Á., Ferrás, C., Montenegro Marin, C.E., Medina García, V.H. (eds.) ICITS 2020. AISC, vol. 1137, pp. 589–598. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40690-5_56
12. Ullah, Z.: A survey on Hybrid, Energy Efficient and Distributed (HEED) based energy efficient clustering protocols for wireless sensor networks. Wirel. Pers. Commun. 112(4), 2685–2713 (2020). https://doi.org/10.1007/s11277-020-07170-z
13. Kwon, O.S., Jung, K.D., Lee, J.Y.: WSN protocol based on leach protocol using fuzzy. Int. J. Appl. Eng. Res. 12, 10013–10018 (2017)
14. Lee, J.S., Teng, C.L.: An enhanced hierarchical clustering approach for mobile sensor networks using fuzzy inference systems. IEEE Internet Things J. 4, 1095–1103 (2017)
15. Amutha, J., Sharma, S., Sharma, S.K.: Strategies based on various aspects of clustering in wireless sensor networks using classical, optimization and machine learning techniques: Review, taxonomy, research findings, challenges and future directions. Comput. Sci. Rev. 40, 100376 (2021)
16. Basavaraj, G.N., Jaidhar, C.D.: H-LEACH protocol with modified cluster head selection for WSN. In: International Conference on Smart Technologies for Smart Nation (SmartTechCon), pp. 30–33. IEEE (2017)
17. Cui, Z., Cao, Y., Cai, X., Cai, J., Chen, J.: Optimal LEACH protocol with modified bat algorithm for big data sensing systems in Internet of Things. J. Parallel Distrib. Comput. 132, 217–229 (2019)
18. Devika, G., Ramesh, D., Karegowda, A.G.: Swarm intelligence-based energy-efficient clustering algorithms for WSN: overview of algorithms, Analysis, and Applications. In: Swarm Intelligence Optimization, pp. 207–261 (2020)
19. Tamtalini, M.A., El Alaoui, A.E.B., El Fergougui, A.: ESLC-WSN: a novel energy efficient security aware localization and clustering in wireless sensor networks. In: 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), pp. 1–6. IEEE (2020)
20. Sharma, N., Gupta, V.: Meta-heuristic based optimization of WSNs energy and lifetime-a survey. In: 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 369–374. IEEE (2020)
21. Yuvaraj, D., Sivaram, M., Mohamed Uvaze Ahamed, A., Nageswari, S.: An efficient Lion optimization based cluster formation and energy management in WSN Based IoT. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019. AISC, vol. 1072, pp. 591–607. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_58
22. Mitiku, T., Manshahia, M.S.: Fuzzy logic controller for modeling of wind energy harvesting system for remote areas. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019. AISC, vol. 1072, pp. 31–44. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_4
23. Al-Husain, E., Al-Suhail, G.: E-FLEACH: an improved fuzzy based clustering protocol for wireless sensor network. Iraqi J. Electr. Electron. Eng. 17, 190–197 (2021)
Utilization of Self-organizing Maps for Map Depiction of Multipath Clusters

Jonnel Alejandrino1(✉), Emmanuel Trinidad2, Ronnie Concepcion II3, Edwin Sybingco1, Maria Gemel Palconit1, Lawrence Materum1, and Elmer Dadios3

1 Department of Electronics and Computer Engineering, De La Salle University, 2401 Taft Avenue, 1004 Manila, Philippines
{jonnel_alejandrino,edwin.sybingco,maria_gemel_palconit,lawrence.materum}@dlsu.edu.ph
2 Department of Electronics Engineering, Don Honorio Ventura State University, 2001 Bacolor, Philippines
[email protected]
3 Department of Manufacturing Engineering and Management, De La Salle University, 2401 Taft Avenue, 1004 Manila, Philippines
{ronnie.concepcion,elmer.dadios}@dlsu.edu.ph
Abstract. Clustering of multipath components (MPC) simplifies the analysis of the wireless environment to produce the channel impulse response, which leads to an effective channel model. Automatic clustering of the MPC has been utilized as a replacement for the traditional manual approach. The arbitrary nature of MPC that interact with the surrounding environment still challenges wireless researchers to utilize algorithms that are fitted to the measured data of the channel. Visualization plays a considerable part in enhancing the clustering process by helping to infer knowledge from the dataset and the clustering results. Hence, combining the automatic and manual approaches enhances the process, leading to efficient and accurate extraction of the clusters using visualization. The Self-Organizing Map (SOM) has been proven helpful in aiding clustering and visualization in different fields, and the two can be combined to form a hybrid system for clustering problems. This paper investigates the effectiveness of SOM in visualizing the MPC extracted from the COST2100 channel model (C2CM) and in visualizing the clustering tendencies of the dataset.

Keywords: Clustering · Multipath components (MPC) · Visualization · Self-organizing maps
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 417–426, 2022. https://doi.org/10.1007/978-3-030-93247-3_41

1 Introduction

Wireless devices grow exponentially in number, along with the demand for higher data rates, reliability, and massive connections, which imposes stringent standards to be met by wireless designers [1]. These demands can be met by exploiting the spatial domain of the wireless system by employing multiple-input multiple-output (MIMO) antenna systems. In designing and simulating a radio system, channel models are used to represent the environment to assess the effect of the multiple obstructions in the
propagated signals. Highly efficient wireless communication systems are first evaluated by simulating the environment using channel models, which lessens the need to build the systems for testing. Cluster-based channel models have been proposed for data analysis with low complexity by clustering the measured values to obtain the channel impulse response (CIR). Clustering can be seen as an unsupervised class of machine learning (book clustering) due to the absence of target labels at the output function. Clustering arranges the dataset based on the similarity of features. Many methods have been used to cluster wireless multipaths. The techniques are based on algorithms that extract significant features of the MPC. Studies and measurements in channel modeling have concluded that MPC arrive in clusters [2]. This phenomenon is the basis of cluster-based channel models. Many measurement campaigns have been proposed to extract the clustering structure of MPC. The traditional approach uses manual identification of clusters by visual means [3]. The manual approach can be practical if little data is gathered. However, in the case of multipaths, especially in urban environments, the manual approach becomes tedious, and clusters are hard to distinguish visually because data points overlap. Another drawback of manual clustering is its subjective nature, which may lead to different interpretations. The MIMO angular features of each MPC result in high-dimensional data, with typically 5 to 7 dimensions needed to evaluate each MPC. Czink et al. [4] proposed a framework for automatically clustering the MPC using k-means, which improves on the manual approach. The automatic framework led to rich developments and investigations of different algorithms for clustering MPCs.
However, clustering algorithms pose problems such as initialization and the determination of the optimal number of clusters. Furthermore, automatic clustering lessens the subjective nature of the process but limits the physical interpretation of the clustered MPC data. In addition, human-in-the-loop processes have been proposed in the literature [5, 6], where human intervention in different parts of the clustering process has been reported to be useful. Interactivity allows human domain knowledge to be applied to interpret the clustering result or the data's inherent structure effectively. For inferring knowledge beforehand, visualization is an efficient tool to represent and reveal hidden structures of data [7]. In the past decade, relevant techniques have been proposed to overcome some drawbacks of using SOM. In [8], the SOM is modified to consider the winning frequency of each neuron in the map, and the quantization and topographic errors are used to evaluate the modified SOM. Furthermore, to reduce the randomness of the initial weights, an augmentation that considers the selection of weights is studied in [9]. SOM has also been used to find the number of clusters [10]. A protocol-independent approach that utilizes SOM clustering to reduce computational costs was proposed in [11]. In a two-step approach, the initial values from the SOM were used as input to the K-means clustering algorithm to reduce the initial-value problem of the latter [12]. The number of clusters is usually predetermined in many clustering algorithms. With the approach proposed in [13], the number of clusters was derived from the topology based on a polar SOM.
Lastly, an unsupervised image classification problem is evaluated in [14] using a modified two-layer SOM, regarded as a Deep SOM (DSOM); its extension, E-DSOM, is shown to have results comparable to autoencoders. The above-mentioned literature briefly shows that SOM is of significant aid in different processes of unsupervised learning problems, with different techniques proposed to aid the visualization of datasets.

The rest of this paper is organized as follows. Section 2 briefly reviews related works in the clustering process of MPCs and some techniques that aid the SOM. Section 3 presents the dataset used and the procedure of the SOM. The results are presented and analyzed in Sect. 4. Finally, the paper is concluded in Sect. 5.
2 Multipath Clustering

Besides a model from wireless sensor networks based on ANN and SVM [2], many channel models have been standardized and used in testing wireless systems. Cluster-based channel models have caught the attention of researchers, who have taken different approaches to clustering MPCs. A middle-ground approach is proposed in [15] to combine automatic and manual clustering of the MPC. Through the use of MIMO, a hybrid data acquisition model has been proposed by Alejandrino et al. [16], and the angular properties of the MPC have also been extracted. To date, no algorithm consistently outperforms the others, owing to the stochastic nature of the measured data, which varies from one environment to another. The clustering of the MPC still poses a challenge to wireless system engineers. A channel model for an indoor scenario with two additional clustering metrics, namely the MPC length and the arrival interval, was proposed in [17]. Adding the introduced parameters to the clustering domain increases the dimensionality of the clustering process. The work in [18] introduces a score-fusion technique using five cluster validity indices (CVI) to obtain the optimal number of clusters in a simulated urban environment, where the MPC data is fed into the K-means algorithm. A comparison of clustering algorithms is shown in [19], where K-Power-Means is seen to have the more accurate performance. A comparative study of spectral and signal clustering with an AI-based approach is presented in [20], where the proposed algorithm shows a significant increase in proportion to the increasing number of clusters. Several techniques have been developed to visualize data, from the traditional scatter plot to more sophisticated techniques such as the application-based cluster and connectivity-specific routing protocol [21]. Dimensionality reduction techniques have been utilized to project the MPC and visualize it in a scatterplot matrix.
Reducing the dimension of high-dimensional data sets lowers the computational cost and provides a clearer picture through several visualizations. Different algorithms have been a broad subject of research because of the vast number of measurement campaigns, which differ from one environment to another. The visualization of the measured data and clustered results has been limited to traditional 2D and 3D scatterplots. Hence, this paper aims to apply the SOM to visualize all the parameters in a topology-preserving map.
3 SOM and Dataset

This section describes the advantages of SOM in clustering applications and the acquisition of the dataset. SOM is utilized in the proposed visualization because its approach relates more closely to multipath clustering than that of an artificial neural network. SOM also has the potential to visualize the clustering and mapping of topology-based components of multipath communication. Data are acquired through a series of models and manipulation of components. The topology is described by configuring the standard map through its indicated neurons. The model vectors are initialized and shifted towards the best-matching neuron for the given input vector. The developed SOM serves as a map that captures and stores the dataset to match the neighboring groups of points. The SOM algorithm, the cluster-based model used, and the data acquisition are elaborated below.

3.1 SOM
SOM has been widely used in different fields for clustering and exploratory data analysis [23, 24]. Its learning is competitive and cooperative, as opposed to the error-correcting nature of other artificial neural networks. Connections of the weights are based on the number of features n of the dataset vector, as shown in Fig. 1. The SOM proposed by Teuvo Kohonen, also called the Kohonen network, is an unsupervised competitive neural network that aims to represent high-dimensional patterns on a lattice, with a topology that is usually rectangular or hexagonal. The structure dictates the number of neighbors of each neuron: four for rectangular and six for hexagonal structures. SOM can also be seen as a topology-based clustering method that provides a mapping that aids the visualization of cluster structures in the dataset [25]. One example is an agricultural application of wireless connectivity [26]. The neighborhood function preserves the topology, and the data points can be projected into 2D space, where they can be easily visualized. SOM offers two advantages in clustering and visualization: first, the clustering structure of the dataset can be visualized; second, the distribution of the data can be visualized [25].

Fig. 1. SOM architecture [22]

Essentially, the learning process of the SOM consists of the competition for the best matching unit (BMU), the cooperation among neighboring neurons, and the weight update. Initializing a map means describing its topology, in which the number of neurons is specified. The model vectors are also initialized; during training, the model vectors of the BMU (winner neuron) c and its neighbors are moved towards the input vector x_i. The initialized SOM can be seen as a net that captures the structure of the dataset, organizing itself to match and approach the neighboring points. The SOM algorithm can be summarized as follows [24].

1. Initialize the weights $w_i$ (the model vectors $m_i$ below) with random values.
2. Find the winning neuron $c$ at time $t$ using the Euclidean norm:
$$c = \arg\min_i \{\lVert x(t) - m_i(t) \rVert\}, \quad \text{where } x = [x_1, \ldots, x_M] \in \mathbb{R}^M$$
3. Update the weights of the winner neuron and its neighbors using:
$$m_i(t+1) = m_i(t) + h_{ci}(t)\,[x(t) - m_i(t)]$$
where $t$ is the learning step and $h_{ci}(t)$ is the neighborhood function.

To evaluate the performance of the map, the quantization error Qe and the topographic error Te are used [8]. Several neighborhood functions can be utilized, of which the Gaussian function is the most common. The unified distance matrix (U-matrix) proposed by Ultsch [24] is used as a visualization technique to show the boundaries between clusters of data in the SOM. This is achieved by calculating the Euclidean distance between neighboring neurons to reveal the local structures of the data. The visualization uses color-coding schemes to show the distance between neurons. The hits or size markers represent the distribution of data over the neurons.

3.2 COST2100 Channel Model
Cluster-based channel models have gained attention over the past decade. The European Cooperation in Science and Technology (COST) has developed the C2CM [21], which covers the parameters of MPC in the azimuth and elevation angles. In addition, the stochastic nature of MIMO channels can be reproduced alongside the multi-link and single-link properties. The extracted MPC features are stacked as a vector containing parameters such as the azimuth of arrival and departure, the elevation of arrival and departure, the delay, and the power of each MPC. The vector can be represented as $x = [\tau, \theta_{AOA}, \phi_{AOA}, \theta_{AOD}, \phi_{AOD}]$. Hence, the measurements can be stacked into a matrix X corresponding to one snapshot of the measurements. The 5-dimensional feature vector can be normalized, transformed, and fed to clustering algorithms. The semi-urban non-line-of-sight scenario for a single link is used in this study due to
the large number of MPC produced in one snapshot. The extracted data consists of 1500 MPC, each with the corresponding features: azimuth and elevation of arrival and departure, delay, and relative power. However, the relative power is omitted as a feature in the visualization process. Each MPC also has a corresponding cluster ID that serves as the ground truth for validation. The dataset of one snapshot consists of 20 clusters with corresponding distributions.
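The stacking and normalization of the MPC features can be illustrated with a minimal Python sketch. It is not the authors' MATLAB procedure: the paper does not state which normalization was applied, so the z-score scaling, the function name `zscore_normalize`, and the synthetic values standing in for the C2CM output are all assumptions made for illustration. Angles are treated as ordinary real values here, which ignores their circular nature.

```python
import numpy as np

def zscore_normalize(X):
    """Scale every feature (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant features
    return (X - mean) / std

# Synthetic stand-in for one snapshot (assumption): 1500 MPCs, 5 features.
# Column 0 is the delay; columns 1-4 are the four angles of arrival/departure.
rng = np.random.default_rng(0)
n_mpc = 1500
delay = rng.uniform(0.0, 1e-6, n_mpc)             # placeholder delays in seconds
angles = rng.uniform(-np.pi, np.pi, (n_mpc, 4))   # placeholder angles in radians
X = np.column_stack([delay, angles])              # one row per MPC

X_norm = zscore_normalize(X)
print(X_norm.shape)
```

Scaling matters because the delay and the angles have very different numeric ranges, and the Euclidean norm used by the SOM would otherwise be dominated by the feature with the largest range.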
4 Visualization and Analysis

The SOM Toolbox is used to implement the experiment in this paper, as suggested in [23]. The experiment is conducted as follows and is depicted in Fig. 2. The dataset serves as input in MATLAB, followed by the initialization of the map structure. The dataset is first projected at initialization, alongside the U-matrix and the corresponding parameters.
Fig. 2. Procedure of the experiment
The first step is to initialize the dataset and the topology of the map. Random initialization is projected in Fig. 3, where the scattered points show the untrained SOM. In the proposed topology of the map, the number of neurons in the grid is set equal to the number of MPC to provide better resolution. For the training process, the batch algorithm is preferred over sequential training for computational efficiency. Rough tuning computes the global structure of the map, and fine-tuning then explores the neighborhoods in the map. Experimentally, the number of iterations was varied among 30, 500, 1000, and 1500. The 1500-iteration run has larger errors than the 1000-iteration run, which indicates overtraining of the map. Hence, the map with the lower topographic and quantization errors, trained with 1000 steps for both tuning phases, is used; its quantization error is Qe = 0.501 and its topographic error is Te = 0.37. After the setting with the lowest error is found, the U-matrix is presented alongside the parameters in Fig. 4, where the features are also seen to be organized according to their weights. The U-matrix and hits are also computed and shown in Fig. 5. Visual inspection of the map shows the boundaries between clusters, where brighter colors indicate a larger distance between neurons and darker colors indicate neurons closer to each other. Figure 5 also visualizes the distribution of data points over the nodes, and it can be observed that the boundaries have empty hits, colored in red. A larger indicator means more MPC in that node. In addition, the hits agree with the number of clusters in the ground-truth data, which consists of 20 clusters.
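The training loop and the two error measures can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SOM Toolbox implementation used in the paper: it applies the sequential update rule of Sect. 3.1 rather than the batch algorithm, uses a rectangular grid, and the linear decay of the learning rate and neighborhood radius, the function names, and the random test data are all hypothetical choices. The quantization error is taken as the mean distance between each sample and its BMU, and the topographic error as the fraction of samples whose first and second BMUs are not adjacent on the grid, which are the usual definitions.

```python
import numpy as np

def train_som(X, rows, cols, n_steps=2000, alpha0=0.5, seed=0):
    """Sequential SOM training on a rectangular grid (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Grid coordinates of every neuron, shape (rows * cols, 2)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    M = rng.random((rows * cols, d))           # step 1: random model vectors
    sigma0 = max(rows, cols) / 2.0             # assumed initial neighborhood radius
    for t in range(n_steps):
        frac = t / n_steps
        alpha = alpha0 * (1.0 - frac)            # assumed linear learning-rate decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed linear radius decay
        x = X[rng.integers(n)]                   # pick one random input vector
        c = np.argmin(np.linalg.norm(M - x, axis=1))  # step 2: winner neuron (BMU)
        d2 = np.sum((grid - grid[c]) ** 2, axis=1)    # squared grid distance to BMU
        h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood h_ci(t)
        M += h[:, None] * (x - M)                     # step 3: weight update
    return M, grid

def quantization_error(X, M):
    """Mean Euclidean distance between each sample and its BMU."""
    dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def topographic_error(X, M, grid):
    """Fraction of samples whose two best units are not grid neighbors."""
    dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    first, second = order[:, 0], order[:, 1]
    # On a rectangular grid, adjacent neurons have Manhattan distance 1
    grid_dist = np.abs(grid[first] - grid[second]).sum(axis=1)
    return float(np.mean(grid_dist > 1))

# Small random example (placeholder data, not the C2CM dataset)
rng = np.random.default_rng(1)
X = rng.random((200, 5))
M, grid = train_som(X, rows=6, cols=6)
qe = quantization_error(X, M)
te = topographic_error(X, M, grid)
print(f"Qe = {qe:.3f}, Te = {te:.3f}")
```

Values produced by this sketch are not comparable with the Qe and Te reported above, since the data, map size, and training schedule differ. The pairwise distance array grows with the number of samples times the number of neurons, so a map with as many neurons as MPC would need the distances computed in chunks.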
Fig. 3. Random initialization of the SOM
Fig. 4. SOM after rough and fine tuning
The use of SOM in visualizing data takes all the features of each MPC into account, since the connection of the input vector to the nodes is dictated by the number of parameters. However, the iterative process of reducing the errors and the number of training steps may be reduced by utilizing variants of SOM techniques, which are considered for future investigation.
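The U-matrix and hit counts discussed in this section can be computed along the following lines. The Python sketch below uses a simplified U-matrix that stores, for each neuron of a rectangular grid, the mean Euclidean distance to its four-connected neighbors; the SOM Toolbox produces a finer matrix with entries between neurons as well, so this is an assumption made for brevity. The function names and the tiny example map are hypothetical, and the grid is assumed to contain at least two neurons.

```python
import numpy as np

def u_matrix(M, rows, cols):
    """Mean distance from each neuron to its 4-connected grid neighbors."""
    W = M.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(W[r, c] - W[rr, cc]))
            U[r, c] = np.mean(dists)  # large value = likely cluster boundary
    return U

def hit_counts(X, M, rows, cols):
    """Number of samples whose best matching unit is each neuron."""
    dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
    bmu = dists.argmin(axis=1)
    return np.bincount(bmu, minlength=rows * cols).reshape(rows, cols)

# Tiny 2 x 2 map: the top row sits at the origin, the bottom row at (1, 1)
M = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
U = u_matrix(M, rows=2, cols=2)
hits = hit_counts(X, M, rows=2, cols=2)
print(U)
print(hits)
```

Neurons with high U-matrix values and zero hits correspond to the empty boundary nodes described above, while groups of neurons with low values and many hits indicate clusters.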
Fig. 5. HITS projected in the U-matrix
5 Conclusion

Visualization of measured data has been widely used in revealing the cluster structure of data. In this paper, the SOM is utilized to analyze its performance in visualizing the clusters of MPC. The laborious process and subjective nature of the manual approach can be overcome using the SOM visualization of the MPCs, in which the U-matrix reveals cluster boundaries effectively. As the wireless propagation environment becomes more complex, such visualization assistance can be of great use in showing cluster structure for CIR extraction. By visualizing the data, knowledge can be inferred before utilizing algorithms that can increase computational costs. In addition, visualization can also be used to evaluate clustering results, and the advantages of both the automatic and manual approaches can be combined efficiently.
References

1. Series, M.: Minimum Requirements Related to Technical Performance for IMT-2020 Radio Interface(s) Report 2410-0 (2017)
2. Alejandrino, J., Concepcion II, R., Lauguico, S., Palconit, M.G., Bandala, A., Dadios, E.: Congestion detection in wireless sensor networks based on artificial neural network and support vector machine. In: 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), pp. 1–6. IEEE (2020)
3. Oestges, C., Clerckx, B.: Modeling outdoor macrocellular clusters based on 1.9-GHz experimental data. IEEE Trans. Vehicular Technol. 56(5), 2821–2830 (2007)
4. Czink, N., Cera, P., Salo, J., Bonek, E., Nuutinen, J., Ylitalo, J.: A framework for automatic clustering of parametric MIMO channel data including path powers. In: Vehicular Technology Conference, pp. 1–5. IEEE (2006)
5. Keim, D.A.: Information visualization and visual data mining. Trans. Visual. Comput. Graph. 8(1), 1–8 (2002)
6. Concepcion, R., II., dela Cruz, C.J., Gamboa, A.K., Abdulkader, S.A., Teruel, S.I., Macaldo, J.: Advancement in computer vision, artificial intelligence and wireless technology: a crop phenotyping perspective. Int. J. Adv. Sci. Technol. 29(6), 7050–7065 (2020)
7. Chen, W., Guo, F., Wang, F.: A survey of traffic data visualization. Trans. Intell. Transp. Syst. 16(6), 2970–2984 (2015)
8. Chaudhary, V., Ahlawat, A., Bhatia, R.S.: An efficient self-organizing map learning algorithm with winning frequency of neurons for clustering application. In: 3rd International Advance Computing Conference (IACC), pp. 672–067. IEEE (2013)
10. Mishra, M., Behera, H.: Kohonen self organizing map with modified K-means clustering for high dimensional data set. Int. J. Appl. Inf. Syst. 2(3), 34–39 (2012)
11. Alejandrino, J., et al.: Protocol-independent data acquisition for precision farming. J. Adv. Comput. Intell. Intell. Inf. 25(4), 397–403 (2021)
12. Wang, H., Yang, H., Xu, Z., Zheng, Y.: A clustering algorithm use SOM and K-means in intrusion detection. In: International Conference on E-Business and E-Government, pp. 1281–1284 (2010)
13. Xu, L., Chow, T., Ma, E.: Topology-based clustering using polar self-organizing map. Trans. Neural Netw. Learn. Syst. 26(4), 798–808 (2015)
14. Wickramasinghe, C.S., Amarasinghe, K., Manic, M.: Deep self-organizing maps for unsupervised image classification. IEEE Trans. Indust. Inf. 15(11), 5837–5845 (2019)
15. Materum, L., Takada, J., Ida, I., Oishi, Y.: Mobile station spatio-temporal multipath clustering of an estimated wideband MIMO double-directional channel of a small urban 4.5 GHz microcell. EURASIP J. Wirel. Commun. Netw. 2009, 1–16 (2009)
16. Alejandrino, J., Concepcion, R., Almero, V.J., Palconit, M.G., Bandala, A., Dadios, E.: A hybrid data acquisition model using artificial intelligence and IoT messaging protocol for precision farming. In: 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), pp. 1–6. IEEE (2021)
17. Li, J., Ai, B., He, R., Yang, M., Zhong, Z., Hao, Y.: A cluster-based channel model for massive MIMO communications in indoor hotspot scenarios. Trans. Wirel. Commun. 18(8), 3856–3870 (2019)
18. Moayyed, M.T., Antonescu, B., Basagni, S.: Clustering algorithms and validation indices for mmWave radio multipath propagation. In: Wireless Telecommunications Symposium (WTS), pp. 1–7. IEEE (2019)
19. Teologo, A.: Cluster-wise Jaccard accuracy of KPower means on multipath datasets. Int. J. Emerg. Trends Eng. Res. 7, 203–208 (2019)
20. Ladrido, J.M., Alejandrino, J., Trinidad, E., Materum, L.: Comparative survey of signal processing and artificial intelligence based channel equalization techniques and technologies. Int. J. Emerg. Trends Eng. Res. 7(9), 31–322 (2019)
21. Alejandrino, J., Concepcion, R., Lauguico, S., Flores, R., Bandala, A., Dadios, E.: Application-based cluster and connectivity-specific routing protocol for smart monitoring system. In: 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), pp. 1–6. IEEE (2020)
22. Palamara, F., Piglione, F., Piccinin, N.: Self-organizing map and clustering algorithms for the analysis of occupational accident databases. Saf. Sci. 49(8), 1215–1230 (2011)
23. Kohonen, T.: Essentials of the self-organizing map. Neural Netw. 37, 52–65 (2013)
24. Krak, I., Barmak, O., Manziuk, E., Kulias, A.: Data classification based on the features reduction and piecewise linear separation. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019. AISC, vol. 1072, pp. 282–289. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_28
25. Shieh, S.-L., Liao, I.-E.: A new approach for data clustering and visualization using self-organizing maps. Expert Syst. Appl. 39(15), 11924–11933 (2012)
26. Concepcion, R.S., II., et al.: Adaptive fertigation system using hybrid vision-based lettuce phenotyping and fuzzy logic valve controller towards sustainable aquaponics. J. Adv. Comput. Intell. Intell. Inf. 25(5), 610–617 (2021)
Big Data for Smart Cities and Smart Villages: A Review

Tajnim Jahan, Sumayea Benta Hasan, Nuren Nafisa, Afsana Akther Chowdhury, Raihan Uddin, and Mohammad Shamsul Arefin(✉)

Chittagong University of Engineering and Technology, Chittagong, Bangladesh
[email protected]
Abstract. Cities urgently need to become smarter in order to handle large-scale urbanization and to find new ways to manage complexity, increase efficiency, and improve quality of life. As urbanization progresses, urban management faces a series of challenges under the new circumstances. The smart city, a modern form of municipal development, has flourished gradually with the rapid development of new intelligent technologies. Big data technology provides important support for the construction and improvement of smart cities. Therefore, by reviewing forty papers, this research presents the characteristics of smart cities as well as smart villages, analyzes the application of big data technologies in smart city and smart village design, and reports findings that researchers can use for further research.

Keywords: Big Data · Internet of Things · Smart city · Smart village · Smart sensors · Cloud computing · Networks
1 Introduction

Owing to recent progress in Information and Communication Technology, the idea of the Smart City has become a promising means of improving the quality of everyday urban life. Technology has been exploited to improve access to public transport, traffic management, water optimization, and power delivery, and to enhance law enforcement services, schools, hospitals, etc. These activities generate a large amount of data for analytical purposes. Cities and urban regions provide a different way of life than rural areas. Established entities are identifying this flourishing area of cross-disciplinary practice, in which conventional, technical, financial, and political factors keep evolving, thus offering opportunities to further refine the idea of the smart city [1]. For sustainability, it is a demanding task to conduct a study and make a proposal covering extensive city areas as well as differences within society [7]. Smart objects are being extended to support and enable a wider range of solutions based on cellular architecture and wireless sensor networks. A few examples of smart objects in everyday life are smartphones, watches, tablets, clinical devices, smart TVs and cars, security networks, building automation sensors, and access control systems [10].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 427–439, 2022. https://doi.org/10.1007/978-3-030-93247-3_42
In Smart Villages, existing and new networks and services are improved as a result of upgraded telecommunication technologies, innovations, and better use of knowledge for the benefit of inhabitants and businesses. One of the most important findings about the framework of the countryside concerns its dynamism. Another concerns digital architecture and skills [5]. Some components are needed to build a smart city or village. Figure 1 shows the basic components of smart cities. Big data analysis deals with an amount of data so immense that traditional data processing applications cannot capture, process, and present the results in a suitable period [7]. Big data encompasses a broad variety of data types, including structured, unstructured, and semi-structured data. Using big data for smart cities and villages is a very challenging task because planning a smart city is a balancing act. Some of the challenges include data mobility, data security, data integrity, volume of data, cost, and data validity. It is difficult to define because there are so many parts and components to consider.
Fig. 1. Components of Smart City
Basically, big data technologies are advanced software that incorporates data sharing, data mining, data storage, and data visualization. The broad term covers data and data frameworks and includes the tools and techniques used to investigate and transform the data. ‘Ahab’, a distributed, cloud-based stream processing architecture, has been proposed in [11]; it offers a unified way for operators to better understand and improve the managed infrastructure in response to data from the underlying architecture resources, i.e., the connection of IoT devices, the runtime edge configuration, and the implementation environment of the overall application. In [22], time-series analysis has been used in urban studies, at the same time considering
diverse environmental impacts. Time-series analysis helps in understanding the past, which in turn helps to anticipate the future. In [39], a NoSQL database has been used; NoSQL covers a wide range of distinct database technologies that are being developed for designing modern applications. A non-relational (NoSQL) database that provides a mechanism for storage and retrieval of data has been depicted. In [19], the authors developed a multi-agent method using data mining to address the tasks of gathering and processing sensor data. The reviewed papers use different technologies for different purposes in big data platforms, such as APIs, Navitia.io, CitySDK, SPARQL queries, R programming, predictive analytics, and blockchain. This paper reviews forty papers on sustainable smart cities and smart villages that include applications of big data. We first discuss the methodology used to prepare this review, then describe the collected papers, and finally present an overview of their findings.
2 Methodology

A systematic review is a survey of the evidence on a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant primary research. It may be a review of the details of previous research. The systematic review procedure below gives guidelines for conducting the survey and extracting the findings in an orderly way.

2.1 Phase 1 - Planning
This section describes how the relevant papers were selected (which databases, which search terms, and which inclusion/exclusion criteria). The literature was sourced from reputable publishers such as Springer, ACM, MDPI, Elsevier, IEEE Access, Wiley, and Taylor & Francis. The following search terms were used in this review: “big data in smart city”, “big data in urban area”, “big data in sustainable smart city”, “big data analysis in smart cities”, “big data framework for smart community”, “smart cities and smart villages research”.

2.2 Phase 2 - Conducting
This section focuses on how the papers were checked. The papers were strictly examined for reliability and validity before being accepted as the final sample for review. The chosen papers were carefully considered to ensure that they meet the aim of the research.

2.3 Phase 3 - Reporting
After this strict examination, 40 relevant research papers were chosen for review. The papers are grouped into five categories for orderly evaluation: concept-based, framework-based, web-analysis-based, data-analysis-based, and technology-based research. The categorized evaluation identifies the contributions, working process, and flaws of each piece of research.
3 Paper Collection

This section discusses the collection of research papers for our work. Many papers have been published on big data in smart cities and villages, but the available research found by our search starts from the year 2014. Hence, our review starts with papers from 2014. Few papers were available for 2014, 2015, or 2016 individually, so these years were merged. Finally, we gathered forty papers in the five categories mentioned in Sect. 2.3, over the time spans 2014–2016, 2017, 2018, and 2019–2020. Table 1 shows the papers considered for those years at a glance.
Table 1. Evolutions of big data for smart cities and smart villages

| Category | 2014–2016 | 2017 | 2018 | 2019–2020 |
|---|---|---|---|---|
| Conceptual research | Vision and paradigms [13]; Transformations of studies [32] | Sustainability Union [16]; Digital economy [35] | Research in Europe [8]; Systematic review [28]; Synergistic Approach [24]; Monitoring system [22] | Urban Health [15]; Big Rhetoric in Toronto’s [36]; Case Study Shanghai [14]; Development [5] |
| Framework based research | Cloud-based [11]; Reasoning from attractors [1] | Knowledge based [18]; IoT for sustainability [21]; Analytics framework [25]; Smart Urban Planning [27] | IoT data analytics [2]; Decision management [17]; Data mining [26]; Redefining Smart City [23]; IoT and Big Data [40] | Smart road monitoring [19]; API deployment [37] |
| Data analysis based research | SmartSantander testbed [10] | Integrating multisource [31] | Data quality perspective [29] | Multimedia data [30]; Transit-oriented development [34] |
| Web analysis based research | Big Data platform [39] | - | - | Enhancing pedestrian mobility [38] |
| Technology based research | Connected communities [4]; Data analysis [3] | Healthcare data processing [20]; Multisource geospatial [33] | Processing and analysis [12] | AI and Big Data [9]; University Campus [7]; Cities and Villages Research [6] |
4 Detailed Review of Papers
This section focuses on the contribution, dataset, implementation and evaluation of each paper. Table 2 represents the conceptual research where authors have presented their thoughts on smart city using big data.

Table 2. Conceptual research

| Paper title | Contribution | Dataset | Evaluation |
|---|---|---|---|
| Vision and paradigms [13] | Integrated IoT ecosystems vision | DBpedia dataset | Development of novel services |
| Transformations of studies [32] | Transformation on urban studies | OSM dataset, POI data, GPS records, travel survey | Transformations of urban studies in China |
| Research in Europe [8] | Conceptual boundaries | Smart cities and villages research | Research has been promoted |
| Urban Health [15] | Advances beyond infrastructure | Temperature, humidity, wind speed and direction, rainfall, health | Sharpen the efficiency and livability |
| Sustainability Union [16] | Acknowledge the obstacles | Examines different definitions | Sustainability in technological endeavors |
| Systematic Review [28] | Data harvesting and mining processes | Historical, survey and local demographics | Data produced, stored, mined, and visualized |
| Synergistic Approach [24] | Intelligence to a city UPTS is given | World Population growth between 1950–2100 | Improve data gathering related to the UPTS |
| Monitoring System [22] | Integrated Environmental Monitoring System | Temperature, relative humidity, CO, SO2, UV index, and noise | Demonstrate effects on environment |
| Big Rhetoric in Toronto’s [36] | Benjamin Bratton’s Stack Theory | Google LLC, personal, governance, environmental | Stephen Graham’s phenomenon of ubiquitous computerized matrix |
| Digital economy [35] | Applications in digital economy | Concepts of 13 papers | Papers concept |
| Case Study Shanghai [14] | Overcome the issue of Check-in distribution | 71 green parks locations | In-depth experimental analysis |
| Development [5] | Accelerated the community | Energy, mobility, waste | Deconstructed popular features |
In [14], the authors carried out an in-depth experimental analysis of check-in habits across seven districts of Shanghai, using intensity maps and norms derived from LBSN data. After processing the data collected from Weibo, 71 green park locations were selected. The authors focused on the issue of the check-in distribution of visitors across the various green spaces.
The authors of [5] examined popular features of smart villages and cities, sustainability, and community, and then brought them together to strengthen the value of community-centred development (CCD) in applied development projects, products, and services. A new community-centred approach is suggested to emphasize that sustainable living cannot be achieved through technological solutions alone. The authors of [36] considered data from Sidewalk Labs, a sister company of Google LLC, covering personal, governance, and environmental data, and introduced Benjamin Bratton’s Stack Theory as an approach for conceptualizing the logic of smart cities in particular and digital capitalism more generally. The research in [15] argues for extending the notion of Big Data beyond infrastructure to include urban health, providing a more comprehensive set of data that may lead to knowledge about how the community connects with the city and how this relates to the theme of urban health. In [22], the researchers presented an evidence-based case study demonstrating the effects of several factors: the outdoor data calibration is not highly rigorous; the cost of the approach is acceptable for collecting on-site data but may need to be reduced for mass production in a scaled-up project; and the battery life of the device was generally 6 to 8 h.

Table 3 presents the framework based research. In [19], the authors developed a multi-agent approach for collecting and processing sensor data and proposed a convergent model for processing big sensor data. The model comprises three layers built on fog, mobile, and cloud computing technologies. The authors of [37] developed a structure that ensures the interoperability of historical, real-time, and online energy data for district energy management, providing energy information that facilitates prosumption operations.

The authors of [2] presented a multi-tier fog computing model with an analytics service for smart city applications, in an environment of Raspberry Pi and Spark. The multi-tier model comprises ad-hoc fogs with opportunistic computing resources and dedicated fogs with dedicated computing resources. The authors of [17] established a framework with embedded Big Data analytics (BDA) that serves two main purposes. First, it facilitates the exploitation of urban Big Data (UBD) in planning, modeling, and maintaining smart cities. Second, it employs BDA to manage and process massive UBD in order to improve the quality of urban services. In [40], the researchers proposed a framework that uses the Hadoop ecosystem together with Spark to process large volumes of data. A large body of previous studies on smart cities and sustainable cities is brought together in [26], including research at a more conceptual, theoretical, and overarching level.
Table 3. Framework based research

| Paper title | Contribution | Dataset | Evaluation |
|---|---|---|---|
| Cloud-based [11] | Able to autonomously optimize application | User API helps to provide data | Able to autonomously optimize |
| IoT data analytics [2] | Multi-tier fog computing model | Five datasets | QoS aware resource management schemes |
| Reasoning from attractors [1] | Introduce the principle of attractors as a novel paradigm | Attractor types, SCN context | Multidimensionality in identifying multifunctional communities |
| Decision management [17] | Facilitate exploitation | Traffic, parking lots, pollution, water consumption | Integrate data normalizing and filtering techniques |
| Knowledge based [18] | Defined an architecture | Mobility, energy, health, food, education, weather forecast | Architecture and solution have been presented |
| Smart road monitoring [19] | Convergent model | Traffic flow, number of road accidents, temperature indicators | Multi-agent approach developed |
| IoT for sustainability [21] | Augmenting the informational landscape | Environment, waste-water management, traffic, transport, buildings, energy, mobility | Brings a large number of previous studies |
| Analytics framework [25] | Use profits of data to progress the life quality | Sensors, detectors, GPS, chip cards, social media | Build sustainable city |
| Data mining [26] | Enhanced insights and enables better-informed decision-making | Selects 14 research studies | Intends to develop, illustrate, then discuss a systematic architecture |
| Smart Urban planning [27] | Real-time rational decision making then managing user-centric events | Data from different projects | Data filtration as well as normalization are utilized |
| Redefining Smart City [23] | Dimensions of culture, metabolism, and governance | Data set from 2004 and 2018 | Redefined paradigm |
| API deployment [37] | Explores the role of APIs | MySQL, Couch and meta data generated | Structure ensures interoperability |
| IoT and Big Data [40] | Build up the smart city | Wireless gadgets, climate and water, activity datasets | Enormous measure of information |
Table 4. Data analysis based research

| Paper title | Contribution | Dataset | Evaluation |
|---|---|---|---|
| SmartSantander testbed [10] | Correlates into temperature, traffic, seasons and the working days | SmartSantander | Analyzed even when bursts behaviors are present |
| Data quality perspective [29] | Ensures potentiality of data | IEEE research papers | Develops to ensure data quality actions |
| Multimedia data [30] | Processing and management of multimedia data | CCTV surveillance, New York, ICT data sets | System extracts meaningful information |
| Integrating multi-source [31] | Actual land use in big cities | Footprint, taxi dataset, We Chat, and a POI and street view | Precise delineating information |
| Transit-oriented development [34] | Investigates transit-oriented development attributes | OpenStreetMap, retrieves a 2017 point of interest, obtains bikesharing | Currency and fineness of spatiotemporal grain |
Table 4 presents the data analysis based articles. The authors of [34] investigated how big and open data (BOD) can be exploited to examine the connections between transit-oriented development (TOD) attributes and the quantitative outcomes of TOD. By exploiting BOD, the study reconfirms the existence of tradeoffs across different metro station areas, such as the net ratio of frequent riders versus increased metro ridership. Metro station areas with 30% or 50% car-priority streets were assessed using walkability. Compared with conventional data, BOD has the advantages of currency and fineness of spatiotemporal grain. The survey in [30] covers the processing and management of big multimedia data collected from smart city applications, discussing machine learning approaches such as SDL and UDL, APIs, operating systems including Linux and iOS, and cloud computing. The work in [31] provides detailed descriptive knowledge of actual land use in Tianhe District, China, and shows how multi-source big data can reveal actual land use in a city. The authors' method is particularly helpful for urban planning, as it allows planners to identify actual land uses at the building level in large cities of China and of other rapidly developing countries.

Table 5 presents the web analysis based articles. The study in [38] provides a foundational argument for applying IoT technologies to enhance pedestrian mobility, using an understanding of pedestrian movements to inform future infrastructure development. Microsoft Excel and Power BI were used for static analysis as well as advanced interactive visualization and analysis. The study also addresses pedestrian flow in the Melbourne CBD and draws a clear link to the trends identified by analyzing the pedestrian data.
Table 5. Web analysis based research

| Paper title | Contribution | Dataset | Evaluation |
|---|---|---|---|
| Big Data platform [39] | Fill the gap between big data platform | Real time data sets and SmartSantander Testbed | Handle both historical and real time data |
| Enhancing pedestrian mobility [38] | Pedestrian flow in the Melbourne CBD | 53 Specific locations | Foundational argument |
CiDAP, a new platform for realizing big data in smart cities, is presented in [39]. It can handle historical as well as real-time data and is flexible with respect to different scales of data, although issues such as data and system security are not addressed. The system has been deployed and integrated with an ongoing IoT experimental testbed, and it provides a valuable example for designers of future smart city platforms, filling the gap between what a big data platform looks like at a high level and how it must be realized.

Table 6 presents the technology based research. By encouraging cross-disciplinary scientific debate on multifaceted challenges, the special issue in [6] offers a useful overview of the most recent developments in the multifaceted and frequently overlapping fields of smart cities and smart villages research. The authors deliver an integrated discussion of the major issues and challenges in this research, including soft issues such as happiness, well-being, security, and safety, while outlining directions for future research. The proposal in [7] combines technologies such as IoT, Hadoop, and Cloud Computing in a traditional university campus, primarily through data acquired by the Internet of Things. Its distributed and multilevel analysis could be a strong starting point for finding a reliable and effective solution for implementing an intelligent environment based on sustainability. In [12], the authors examined the Internet of Things, Cloud Computing, Big Data, and sensor technologies with the aim of finding their common operations and combining them. They also proposed new methods for collecting and managing sensor data in a smart building operating in an IoT environment. The study in [33] was the first to present an iterative gravity model of urban buildings and population.

The authors built a multi-scale population model to downscale census data and obtained a high-accuracy population map at a fine spatial resolution of 25 m. In [20], the authors proposed the PRIMIO model, which performs VM migration by accounting for user mobility and cloudlet computational load, and estimated the rate of resource over-provisioning caused by VM migration, allowing the whole system to use computing resources optimally. The VM migration problem is addressed by jointly considering user mobility and the provisioned VM resources in the cloud.
Table 6. Technology based research

| Paper title | Contribution | Dataset | Evaluation |
|---|---|---|---|
| Connected communities [4] | TreSight, for smart tourism and sustainable cultural heritage | OpenDataTrentino regarding points of interest, weather, typical restaurants | Context-aware solution |
| Data analysis [3] | Hierarchical Fog Computing architecture | Sensor network | Employing advanced machine learning algorithms |
| AI and Big Data [9] | Offers theoretical value | Organizations’ websites, policy documents, and newspaper articles | Integrated SIS technologies |
| Processing and analysis [12] | Combine four aforementioned technologies and functionality | Temperature, movement, light and moisture | Find common operations and combine |
| University Campus [7] | Implementation of an intelligent environment | Consumption of drinks in examination seasons, areas with the highest population density | Facilitate management |
| Healthcare data processing [20] | Model of joint VM migration, includes Ant Colony Optimization | User mobility and cloudlet server load | Utilize computing resources optimally |
| Cities and Villages Research [6] | Overview of the most recent developments | Selected 15 research studies | Discussion for the key issues and challenges |
| Integrating multisource [33] | Iterative model | Population density, land cover, road, real-time Tencent user density (RTUD) | Evaluate equitable standard living areas in terms of census units |
5 Discussion
In this review, we have examined the concept of smart cities from the perspective of different kinds of data and studied various concepts, data processing techniques, and frameworks. After reviewing forty papers, we observed that no established concept or framework exists for building a smart village, owing to many constraints. Notably, the main problems facing rural areas are poverty, low levels of education, and limited access to technology. Because smart village research is still new, researchers can focus on making villages smarter in the future by exploring these issues and challenges. A smart village should be equipped with strong interconnections between existing and new smart technologies that are able to communicate with one another.
6 Conclusion
The most important purpose of a smart city is to improve the lives of its population by providing a sustainable environment at minimum expense. Doing so requires several capabilities, e.g., data collection and processing. Big data technology has provided efficient support for the betterment of smart cities. Smart cities nevertheless face many challenges. Technologies are updated day by day, and as a result issues of data conflict, security, privacy, and authenticity arise. To address these issues and build reliable systems for smart areas, researchers have to work more carefully, as research on smart cities and smart villages is expected to remain a trend in the coming years. This paper reviewed forty research papers, categorized into five groups and covering the years 2014 to 2020, and summarized the authors' contributions to the evaluation of big data in smart cities and smart villages.
References
1. Ianuale, N., Schiavon, D., Capobianco, E.: Smart Cities, Big Data, and Communities: reasoning from the viewpoint of attractors. IEEE Access 4, 41–47 (2016). https://doi.org/10.1109/ACCESS.2015.2500733
2. He, J., Wei, J., Chen, K., Tang, Z., Zhou, Y., Zhang, Y.: Multitier fog computing with large-scale IoT data analytics for Smart Cities. IEEE Internet Things J. 5(2), 677–686 (2018). https://doi.org/10.1109/JIOT.2017.2724845
3. Tang, B., Chen, Z., Hefferman, G., Wei, T., He, H., Yang, Q.: A hierarchical distributed fog computing architecture for Big Data analysis in Smart Cities. ACM (2015). https://doi.org/10.1145/2818869.2818898
4. Sun, Y., Song, H., Jara, A.J., Bie, R.: Internet of Things and Big Data analytics for smart and connected communities. IEEE Access 4, 766–773 (2016). https://doi.org/10.1109/ACCESS.2016.2529723
5. Zavratnik, V., Podjed, D., Trilar, J., Hlebec, N., Kos, A., Duh, E.S.: Sustainable and community-centred development of Smart Cities and Villages. Sustainability 12(10), 3961 (2020). https://doi.org/10.3390/su12103961
6. Visvizi, A., Lytras, M.D.: Sustainable Smart Cities and Smart Villages research: rethinking security, safety, well-being, and happiness. Sustainability 12(1), 215 (2019). https://doi.org/10.3390/su12010215
7. Villegas-Ch, W., Palacios-Pacheco, X., Luján-Mora, S.: Application of a Smart City model to a traditional University Campus with a Big Data architecture: a sustainable Smart Campus. Sustainability 11(10), 2857 (2019). https://doi.org/10.3390/su11102857
8. Visvizi, A., Lytras, M.: It’s not a fad: Smart Cities and Smart Villages research in European and global contexts. Sustainability 10(8), 2727 (2018). https://doi.org/10.3390/su10082727
9. Mark, R., Anya, G.: Ethics of using Smart City AI and Big Data: the case of four large European cities. ORBIT J. 2(2), 1–36 (2019). https://doi.org/10.29297/orbit.v2i2.110
10. Jara, A.J., Genoud, D., Bocchi, Y.: Big Data for Smart Cities with KNIME a real experience in the SmartSantander testbed. Intell. Technol. Appl. Big Data Analyt. 45(8), 1145–1160 (2014). https://doi.org/10.1002/spe.2274
11. Vogler, M., Schleicher, J.M., Inzinger, C., Dustdar, S.: Ahab: a cloud-based distributed Big Data analytics framework for the Internet of Things. Big Data Cloud Things 47(3), 443–454 (2016). https://doi.org/10.1002/spe.2424
12. Plageras, A.P., Psannis, K.E., Stergiou, C., Wang, H., Gupta, B.B.: Efficient IoT-based sensor BIG Data collection–processing and analysis in smart buildings. Future Gen. Comput. Syst. 82, 349–357 (2018). https://doi.org/10.1016/j.future.2017.09.082
13. Petrolo, R., Loscrì, V., Mitton, N.: Towards a smart city based on cloud of things, a survey on the smart city vision and paradigms. Emerg. Telecommun. Technol. 28(1), e2931 (2015)
14. Liu, Q., et al.: Analysis of green spaces by utilizing Big Data to support Smart Cities and environment: a case study about the city center of Shanghai. ISPRS Int. J. Geo-Inf. 9(6), 360 (2020). https://doi.org/10.3390/ijgi9060360
15. Allam, Z., Tegally, H., Thondoo, M.: Redefining the use of Big Data in Urban Health for increased liveability in Smart Cities. Smart Cities 2(2), 259–268 (2019). https://doi.org/10.3390/smartcities2020017
16. Kudva, S., Ye, X.: Smart Cities, Big Data, and sustainability union. Big Data Cognit. Comput. 1(1), 4 (2017). https://doi.org/10.3390/bdcc1010004
17. Silva, B., et al.: Urban planning and Smart City decision management empowered by real-time data processing using Big Data analytics. Sensors 18(9), 2994 (2018). https://doi.org/10.3390/s18092994
18. Badii, C., Bellini, P., Cenni, D., Difino, A., Nesi, P., Paolucci, M.: Analysis and assessment of a knowledge based Smart City architecture providing service APIs. Future Gen. Comput. Syst. 75, 14–29 (2017). https://doi.org/10.1016/j.future.2017.05.001
19. Finogeev, A., Finogeev, A., Fionova, L., Lyapin, A., Lychagin, K.A.: Intelligent monitoring system for smart road environment. J. Ind. Inf. Integr. 15, 15–20 (2019). https://doi.org/10.1016/j.jii.2019.05.003
20. Islam, M., Razzaque, A., Hassan, M.M., Nagy, W., Song, B.: Mobile cloud-based big healthcare data processing in Smart Cities. IEEE Access 5, 11887–11899 (2017). https://doi.org/10.1109/ACCESS.2017.2707439
21. Bibri, S.E.: The IoT for smart sustainable cities of the future: an analytical framework for sensor-based big data applications for environmental sustainability. Sustain. Cities Soc. 38, 230–253 (2018). https://doi.org/10.1016/j.scs.2017.12.034
22. Wong, M., Wang, T., Ho, H., Kwok, C., Keru, L., Abbas, S.: Towards a Smart City: development and application of an improved integrated environmental monitoring system. Sustainability 10(3), 623 (2018). https://doi.org/10.3390/su10030623
23. Allam, Z., Newman, P.: Redefining the Smart City: culture, metabolism and governance. Smart Cities 1(1), 4–25 (2018). https://doi.org/10.3390/smartcities1010002
24. Lucas, C.M., de Mingo López, L., Blas, N.G.: Natural computing applied to the underground system: a synergistic approach for Smart Cities. Sensors 18(12), 4094 (2018). https://doi.org/10.3390/s18124094
25. Abbad, H., Bouchaib, R.: Towards a Big Data Analytics Framework for Smart Cities. ACM (2017). https://doi.org/10.1145/3175628.3175647
26. Bibri, S.E., Krogstie, J.: The Big Data deluge for transforming the knowledge of smart sustainable cities: a data mining framework for urban analytics. ACM (2018)
27. Babar, M., Arif, F.: Smart urban planning using Big Data analytics based Internet of Things. Future Gen. Comput. Syst. 77, 65–76 (2017). https://doi.org/10.1145/3123024.3124411
28. Moustaka, V., Vakali, A., Anthopoulos, L.G.: A systematic review for Smart City Data analytics. ACM Comput. Surv. 51(5), 1–41 (2019). https://doi.org/10.1145/3239566
29. Baldassarre, M.T., Caballero, I., Caivano, D., Garcia, B.R., Piattini, M.: From big data to smart data: a data quality perspective. ACM (2018). https://doi.org/10.1145/3281022.3281026
30. Usman, M., Jan, M.A., He, X., Chen, J.: A survey on big multimedia data processing and management in smart cities. ACM Comput. Surv. 52(3), 1–29 (2019). https://doi.org/10.1145/3323334
31. Niu, N., et al.: Integrating multi-source big data to infer building functions. Int. J. Geograph. Inf. Sci. (2017). https://doi.org/10.1080/13658816.2017.1325489
32. Long, Y., Liu, L.: Transformations of urban studies and planning in the big/open data era: a review. Int. J. Image Data Fusion 7(4), 295–308 (2016). https://doi.org/10.1080/19479832.2016.1215355
33. Yao, Y., et al.: Mapping fine-scale population distributions at the building level by integrating multisource geospatial big data. Int. J. Geograph. Inf. Sci. (2017). https://doi.org/10.1080/13658816.2017.1290252
34. Zhou, J., Yang, Y., Webster, C.: Using big and open data to analyze transit-oriented development: new outcomes and improved attributes. J. Am. Plan. Assoc. 86(3), 364–376 (2020). https://doi.org/10.1080/01944363.2020.1737182
35. Tan, K.H., Ji, G., Lim, C.P., Tseng, M.-L.: Using big data to make better decisions in the digital economy. Int. J. Prod. Res. 55(17), 4998–5000 (2017). https://doi.org/10.1080/00207543.2017.1331051
36. Tierney, T.F.: Big Data, big rhetoric in Toronto’s Smart City. Archit. Cult. 7(3), 351–363 (2019). https://doi.org/10.1080/20507828.2019.1631062
37. Jnr, B.A., Petersen, S.A., Ahlers, D., Krogstie, J.: API deployment for big data management towards sustainable energy prosumption in smart cities-a layered architecture perspective. Int. J. Sustain. Energy 39(3), 263–289 (2019). https://doi.org/10.1080/14786451.2019.1684287
38. Carter, E., Adam, P., Tsakis, D., Shaw, S., Watson, R., Ryan, P.: Enhancing pedestrian mobility in Smart Cities using Big Data. J. Manag. Analyt. 7(2), 173–188 (2020). https://doi.org/10.1080/23270012.2020.1741039
39. Cheng, B., Longo, S., Cirillo, F., Bauer, M., Kovacs, E.: Building a big data platform for smart cities: experience and lessons from Santander. IEEE Access (2015). https://doi.org/10.1109/BigDataCongress.2015.91
40. Yadav, P., Vishwakarma, S.: Application of Internet of Things and Big Data towards a Smart City. IEEE Access (2018). https://doi.org/10.1109/IoT-SIU.2018.8519920
A Compact Radix-Trie: A Character-Cell Compressed Trie Data-Structure for Word-Lookup System
Rahat Yeasin Emon and Sharmistha Chanda Tista
Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattagram 4349, Bangladesh

Abstract. A string word is a sequence of characters. Storing a word-list in memory calls for an efficient data structure that reduces space complexity. The trie-tree is a popular word-lookup data structure whose lookup time complexity is O(l), where l is the length of the searched word. The array-based trie-tree, which has linear searching time complexity, is memory inefficient because it holds many unused character-cells. A trie-tree based on dynamic data structures (e.g., linked-list, binary search tree) compresses character-cells through word prefix sharing. This paper proposes a more compressed, space-efficient trie-tree for storing and searching word-lists. It introduces a new empty-node property (a node obtains its data from another trie-node), which reduces the number of character-cells required. The proposed trie data structure needs very few character-cells. In our experiments, when the proposed data structure is used to represent any dictionary word-list, 99.95% of character-cells are compressed and 99.90% of trie-nodes are empty.

Keywords: Data structure, Trie, Radix-trie/PATRICIA-trie data structures, Character-cells, Space complexity, Word-lookup
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 440–449, 2022. https://doi.org/10.1007/978-3-030-93247-3_43

1 Introduction and Background
A tree is a special type of data structure, defined as a hierarchical collection of nodes. It has one root node and several hierarchical subtree nodes. Each node has a value or key and a list of child nodes. A node at the bottom of the hierarchy that has no children is called a leaf node.

The trie-tree [1] is a widely used word-lookup data structure whose time complexity is O(l). It is used in many computer-science applications, such as dictionary management [6–8], automatic word suggestion [9], spell-checking [10, 11], pattern matching [12, 13], IP-address searching [14–16], natural language processing, data mining [17, 18], database systems, compilers, computer networks, and text compression.

A trie-tree is a character-wise tree in which string words are stored in a tree-like manner. The key of a node is a single character, and the number of child nodes of a node is at most the size of the alphabet of the corresponding word-list. For example, the English alphabet has 26 letters, so when an English word-list is represented in a trie-tree, the value of each node is one of the 26 English letters and each node has at most 26 child nodes.
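The node layout described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `TrieNode`, `insert`, and `contains` are hypothetical, and the sketch assumes lowercase English words only.

```python
ALPHABET_SIZE = 26  # assumption: lowercase English letters only


class TrieNode:
    def __init__(self):
        # One slot per letter; None marks a child that does not exist.
        self.children = [None] * ALPHABET_SIZE
        self.is_word_end = False


def insert(root, word):
    node = root
    for ch in word:
        i = ord(ch) - ord("a")
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word_end = True


def contains(root, word):
    # Follows at most len(word) child links, hence the O(l) lookup.
    node = root
    for ch in word:
        node = node.children[ord(ch) - ord("a")]
        if node is None:
            return False
    return node.is_word_end


root = TrieNode()
for w in ["tree", "trie", "tank", "work", "wood"]:
    insert(root, w)
```

A lookup such as `contains(root, "trie")` walks one child link per character, which is where the O(l) bound comes from.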
Fig. 1. Building process (word insertion) of trie-tree
Fig. 1 depicts the insertion of the word-list ‘tree’, ‘trie’, ‘tank’, ‘work’, and ‘wood’ into a trie-tree. If a dynamic data structure (e.g., a linked-list) is used to represent the child-list of a node, the trie-tree compresses data through prefix sharing. For example, Fig. 1(e) is the trie-tree for the words ‘tree’, ‘trie’, ‘tank’, ‘work’, and ‘wood’. These five words have 20 character cells in total, but the trie-tree needs only 15 character cells to represent them, so 25% of the character cells are compressed. The proposed methodology aims to improve this character-cell compression capability.

The array-based trie-tree has a word-lookup time complexity of O(l). Because its lookup time complexity is very low, the array-based trie-tree has historically been used widely [2–5], but it has a high memory requirement. Fig. 2 depicts the implementation and memory requirement of the array-based trie-tree for the word ‘tree’: building it takes 104 cells in total, of which 4 are used and the remaining 100 are unused.

PATRICIA or Radix-trie [2], sometimes called a compact prefix-tree, is a space-optimized representation of the native trie. Radix-trie saves space by merging each single-parent node (a node that has only one child) with its child node.
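The cell counts quoted above can be checked with a short sketch. It assumes a dictionary-based child list as a stand-in for the linked-list representation, and the helper name `count_trie_cells` is hypothetical.

```python
def count_trie_cells(words):
    """Count character cells in a trie with dynamic (dict-based) child lists."""
    root = {}
    cells = 0
    for word in words:
        node = root
        for ch in word:
            if ch not in node:
                node[ch] = {}
                cells += 1  # a new cell is needed only where no prefix is shared
            node = node[ch]
    return cells


words = ["tree", "trie", "tank", "work", "wood"]
raw_cells = sum(len(w) for w in words)                 # cells without sharing
shared_cells = count_trie_cells(words)                 # cells with prefix sharing
saving = 100 * (raw_cells - shared_cells) / raw_cells  # percentage compressed

# Array-based child lists, following the count in the text:
# four 26-slot arrays for the word 'tree', of which only 4 slots are used.
array_cells = 26 * len("tree")
unused_cells = array_cells - len("tree")
```

Under these assumptions the sketch reproduces the figures in the text: 20 raw cells, 15 shared cells, a 25% saving, and 104 array slots of which 100 stay unused.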
Fig. 2. Implementation of trie-tree of word ‘tree’ (array-based child-list)
Fig. 3. Radix-trie (from native trie-tree)
In Fig. 3, single-parent nodes are merged with their child nodes. This compact representation has a minimal number of trie-nodes, which improves space and search-time complexity. However, a Radix-trie can contain many nodes that hold identical data, which makes it memory inefficient.
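The merging of single-child nodes can be illustrated with a small sketch. It is not taken from the paper: the nested-dictionary representation, the end-of-word marker `END`, and the function names are assumptions made for illustration, and the word-list is the one used for Fig. 1.

```python
END = "$"  # hypothetical end-of-word marker


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def compress(node):
    """Merge every node that has exactly one child with that child."""
    merged = {}
    for label, child in node.items():
        # Extend the label while the child has a single successor and no word ends there.
        while len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        merged[label] = compress(child)
    return merged


def count_nodes(node):
    return sum(1 + count_nodes(child) for label, child in node.items() if label != END)


words = ["tree", "trie", "tank", "work", "wood"]
plain = build_trie(words)
radix = compress(plain)
plain_nodes = count_nodes(plain)
radix_nodes = count_nodes(radix)
```

For this word-list the plain trie has 15 nodes and the merged form has 8 (`t`, `r`, `ee`, `ie`, `ank`, `wo`, `rk`, `od`). The number of stored characters is unchanged; only the node count drops.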
An ASCII character-cell consumes 8 bits and a Unicode character-cell consumes 16 bits of memory. A data structure that has a minimal number of character-cells is therefore space-efficient. This paper presents a new character-cell compressed trie-tree for storing and searching string words. In our experiments, the proposed trie compressed 99.95% of character cells when representing any dictionary word-list.
2 Proposed Trie
2.1 Character Path Node and Maximum Prefix Matched Node
The sequence of characters traversed from the root to reach a node in a trie-tree is termed the character-path of that node. For a searched word, the character-path-node with the longest matching prefix in the trie is termed the maximum prefix-matched node (Fig. 4).
Fig. 4. Maximum prefix matched node
In the trie of Fig. 4, when searching for the word ‘become’ the maximum matched prefix is ‘be’, and when searching for ‘begat’ it is ‘beg’. In these cases the ‘be’ character-path-node and the ‘begin’ character-path-node are the maximum prefix-matched nodes of the search strings ‘become’ and ‘begat’, respectively.
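The search for a maximum prefix-matched node might be sketched as below. The trie contents are an assumption (the words in Fig. 4 are not listed in the text, so a hypothetical radix-style trie holding ‘begin’ and ‘bet’ is used), and the function names are illustrative only.

```python
def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def max_prefix_match(root, word):
    """Return (character path of the deepest matched node, matched prefix of word)."""
    node, path, matched = root, "", 0
    while matched < len(word):
        rest = word[matched:]
        for label, child in node.items():
            common = common_prefix_len(label, rest)
            if common:
                break
        else:
            break  # no child shares a character with the rest of the word
        path += label
        matched += common
        if common < len(label):
            break  # the node is only partially matched, so the search stops here
        node = child
    return path, word[:matched]


# Hypothetical trie: node 'be' with children 'gin' and 't'.
trie = {"be": {"gin": {}, "t": {}}}
become = max_prefix_match(trie, "become")
begat = max_prefix_match(trie, "begat")
```

With this assumed trie, ‘become’ stops at the ‘be’ node with matched prefix ‘be’, and ‘begat’ stops at the ‘begin’ node with matched prefix ‘beg’, mirroring the example in the text.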
2.2 Algorithm for Proposed Trie – Data Entry Procedure

Algorithm – Procedure 1: Data entry in the proposed-trie node
Procedure enter_data_to_node(Node node, String word_substring)
    Check or create character_path_node of word_substring
    If such character-path-node exists or is possible to create then
        Put node.refererenceCharacterPathNode = that character_path_node
        Put node.data = null
    Else
        Put node.data = word_substring
        Put node.refererenceCharacterPathNode = null
End procedure
Fig. 5. Proposed-trie data-entry process
Fig. 5(a) represents a trie-tree of the words ‘road’, ‘abandon’, ‘abroad’, and ‘about’, where each node’s entry data is shown beside the node. In Fig. 5(b), node-1 (‘road’), node-2 (‘ab’), and node-3 (‘andon’) store their entry data internally. In Fig. 5(c), the entry data ‘road’ of node-4 already exists in the trie-tree (as node-1), so node-4 points to the ‘road’ character-path node as its data node. In Fig. 5(d), node-5 creates the ‘out’ character-path node and points to that node as its data node.
2.3 Algorithm for Proposed Trie – Insert Word in Trie-tree
Algorithm – Procedure 2: Insert word to proposed-trie
Procedure insert_word_to_proposed_trie(String word)
    Go to maximum prefix-matched-node of word
    If maximum prefix-matched-node is not found then
        Create new_node(), which is a child of root-node
        Put new_node.data = word
        Put new_node.referenceCharacterPathNode = null
        Put root.childList.add(new_node)
    Else if maximum prefix-matched-node is found then
        Put current_node = maximum prefix-matched-node of entry word
        Check matched prefix and unmatched suffixes between current_node and entry string word
        Split current_node as:
            Put current_node.data = matched_prefix
            Create two new nodes as new_node1() and new_node2()
            Put enter_data_to_node(new_node1, unmatched_suffix_of_current_node)
            Put enter_data_to_node(new_node2, unmatched_suffix_of_entry_word)
End Procedure
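The string handling behind the split step of Procedure 2 can be illustrated in isolation. The sketch below only computes the matched prefix and the two unmatched suffixes. It does not model the character-path reference lookup of Procedure 1, and the function name `split_on_common_prefix` is an assumption made for the example.

```python
def split_on_common_prefix(node_data, word):
    """Return (matched_prefix, unmatched_suffix_of_node, unmatched_suffix_of_word)."""
    k = 0
    while k < min(len(node_data), len(word)) and node_data[k] == word[k]:
        k += 1
    return node_data[:k], node_data[k:], word[k:]


# Inserting 'abroad' when the maximum prefix-matched node holds 'abandon'.
prefix, node_suffix, word_suffix = split_on_common_prefix("abandon", "abroad")
print(prefix, node_suffix, word_suffix)
```

In the procedure, the current node would keep the matched prefix, and each of the two suffixes would then be passed to enter_data_to_node for a new child node.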
Fig. 6 depicts the node-splitting process of the proposed trie, and Fig. 7 is a graphical representation of the word insertion process. Fig. 7(a) depicts an empty trie-tree, which starts with a root node. In Fig. 7(b), a new word ‘abandon’ is inserted into the empty trie. The root node creates a child node (node-1); because it is a child of the root, it stores the data ‘abandon’ internally. In Fig. 7(c) and Fig. 6, a new word ‘abroad’ is inserted. The ‘abandon’ node is the maximum prefix-matched node of the word ‘abroad’, with matched prefix ‘ab’. Node-1 is therefore split and creates two child nodes (node-2 and node-3). Node-1 holds the matched prefix ‘ab’, node-2 holds ‘road’ (the unmatched suffix of the entry word), and node-3 holds ‘andon’ (the unmatched suffix of the ‘abandon’ node). In Fig. 7(g), a new word ‘road’ is inserted into the proposed trie. The ‘road’ character-path node already exists, so we only need to put a word-ending sign in node-4. This character-path node lookup property reduces the number of character cells in the proposed trie-tree. Fig. 8 depicts the proposed trie for the words ‘road’, ‘abroad’, ‘abandon’, ‘injury’, ‘inboard’, ‘board’, and ‘juryboard’. These words have a total of 44 character cells, while the proposed trie data structure requires 22 character cells to store them. The character-cell compressed ratio is therefore 50%, and the empty-node ratio is 40%.
Fig. 6. Insert new word and split node
Fig. 7. Word insertion process of proposed-trie (graphical presentation)
Fig. 8. Proposed-trie
3 Experimental Results
3.1 Compaction of Character-cell of Proposed Data Structure
The following table shows the character-cell requirements of various data sets using the proposed compact-trie.
Table 1. Proposed data structure character-cell compaction.
Data set                 Total character-cells   Proposed-trie character-cells   Compressed-ratio
20,000 English words     135,418                 33                              99.98%
466,544 English words    4,396,422               106                             99.99%
18,622 French words      134,303                 56                              99.96%
26,280 German words      201,903                 76                              99.96%
112,940 Bangla words     846,296                 154                             99.98%
23,066 Hindi words       139,322                 237                             99.82%
In our first data set, 20,000 dictionary words have 135,418 character cells. The third column shows that the proposed compact-trie needs only 33 character cells to represent this large data set, so 135,385 (135,418 – 33) character cells are compressed and the character-cell compressed ratio is 99.98%. In Table 1, the compressed ratio ranges from 99.82% to 99.99% across the data sets, i.e., it is close to 99.95%.
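The compressed ratio in Table 1 follows from (total cells – proposed-trie cells) / total cells. The sketch below recomputes it from the table values. The dictionary layout and the function name are assumptions for illustration. The recomputed percentages agree with the reported ones to within about 0.01 percentage points, and the small differences are presumably due to rounding.

```python
# (total character cells, proposed-trie character cells, reported ratio in %)
TABLE_1 = {
    "20,000 English words": (135_418, 33, 99.98),
    "466,544 English words": (4_396_422, 106, 99.99),
    "18,622 French words": (134_303, 56, 99.96),
    "26,280 German words": (201_903, 76, 99.96),
    "112,940 Bangla words": (846_296, 154, 99.98),
    "23,066 Hindi words": (139_322, 237, 99.82),
}


def compressed_ratio(total_cells, kept_cells):
    """Percentage of character cells removed by the compact representation."""
    return 100.0 * (total_cells - kept_cells) / total_cells


for name, (total, kept, reported) in TABLE_1.items():
    print(f"{name}: computed {compressed_ratio(total, kept):.4f}%, reported {reported}%")
```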
3.2 Comparison of Radix-trie and Proposed Compact-trie (Node Requirement)
Table 2 shows the experimental results for the node requirements of the radix-trie and the proposed compact-trie.
Table 2. Node requirement of radix-trie and proposed compact-trie.
Data set                 Radix-trie total nodes     Proposed total nodes   Proposed empty nodes   Proposed non-empty nodes   Empty-nodes percentage
20,000 English words     23,525 (all non-empty)     25,922                 25,894                 28                         99.89%
466,544 English words    596,084 (all non-empty)    626,689                626,625                64                         99.98%
For the first data set, the radix-trie has 23,525 non-empty nodes to store 20,000 dictionary words, whereas the proposed compact-trie has 25,922 nodes to represent the same data set. Of these, 25,894 are empty nodes and 28 (25,922 – 25,894) are non-empty, so the empty-nodes percentage is 99.89%. For both data sets, the empty-nodes percentage of the proposed compact-trie is close to or above 99.90% (99.89% and 99.98%).
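The figures in Table 2 also show the trade-off between the two structures: the proposed trie uses somewhat more nodes than the radix-trie, but almost all of them are empty. The sketch below derives both quantities from the table values. The variable and function names are illustrative, and the node-overhead percentage is a derived quantity that the paper does not report.

```python
# (radix-trie nodes, proposed-trie total nodes, proposed-trie empty nodes)
TABLE_2 = {
    "20,000 English words": (23_525, 25_922, 25_894),
    "466,544 English words": (596_084, 626_689, 626_625),
}


def empty_node_percentage(total_nodes, empty_nodes):
    """Share of nodes in the proposed trie that store no characters."""
    return 100.0 * empty_nodes / total_nodes


def node_overhead_percentage(radix_nodes, proposed_nodes):
    """Extra nodes used by the proposed trie relative to the radix-trie."""
    return 100.0 * (proposed_nodes - radix_nodes) / radix_nodes


for name, (radix, total, empty) in TABLE_2.items():
    print(name,
          "non-empty:", total - empty,
          f"empty: {empty_node_percentage(total, empty):.2f}%",
          f"node overhead: {node_overhead_percentage(radix, total):.2f}%")
```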
4 Conclusion
This paper has introduced an improved, character-cell compressed trie-tree for word lookup systems. We have introduced a new empty-node property to the trie-tree. In the experimental results, about 99.90% of the proposed trie nodes are empty. These empty nodes reduce the character-cell requirement to a large extent. For the dictionary word lists tested, the proposed trie compresses almost 99.95% of the character cells. An ASCII character consumes 8 bits of memory, and a 16-bit Unicode character consumes 16 bits. The proposed data structure reduces space complexity by reducing the number of character cells. Because the proposed trie compresses the character cells of a word list to a large extent, the methodology could also be used as a text compression algorithm. Based on the proposed compact radix-trie, we plan to publish a text compression algorithm in future work.
References
1. Fredkin, E.: Trie memory. Commun. ACM 3, 490–499 (1960)
2. Morrison, D.R.: PATRICIA—practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)
3. Askitis, N., Sinha, R.: HAT-trie: a cache-conscious trie-based data structure for strings. In: Proceedings of the 30th Australasian Conference on Computer science, pp. 97–105 (2007)
4. Heinz, S., Zobel, J., Williams, H.: Burst tries. ACM Trans. Inf. Syst. 20, 192–223 (2002)
5. Hanandeh, F., Alsmadi, I., Akour, M., Daoud, E.: KP-trie algorithm for update and search operations. Int. Arab J. Inf. Technol. 13(6) (2016)
6. Parmar, P., Kumbharana, C.K.: Implementation of trie structure for storing and searching of English spelled homophone words. Int. J. Sci. Res. Publ. 7(1) (2017)
7. Ferrández, A., Peral, J.: MergedTrie: efficient textual indexing. PLOS ONE 14, e0215288 (2019)
8. Aoe, J.-I., Morimoto, K., Sato, T.: An efficient implementation of trie structures. Softw. Pract. Exp. 22(9), 695–721 (1992)
9. Boo, V.K., Anthony, P.: A data structure between trie and list for auto completion. In: Lukose, D., Ahmad, A.R., Suliman, A. (eds.) KTW 2011. CCIS, vol. 295, pp. 303–312. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32826-8_31
10. Bhaire, V.V., Jadhav, A.A., Pashte, P.A., Magdum, P.G.: Spell checker. Int. J. Sci. Res. Publ. 5(4) (2015)
11. Xu, Y., Wang, J.: The adaptive spelling error checking algorithm based on trie tree. In: 2nd International Conference on Advances in Energy, Environment and Chemical Engineering (AEECE) (2016)
12. Deng, D., Li, G., Feng, J.: An efficient trie-based method for approximate entity extraction with edit-distance constraints. In: 2012 IEEE 28th International Conference on Data Engineering (2012)
13. Baeza-Yates, R.A., Gonnet, G.: Fast text searching for regular expressions or automaton searching on tries. J. ACM 43(6), 915–936 (1996)
14. Lim, H., Yim, C., Swartzlander, E.E.: Priority tries for IP address lookup. IEEE Trans. Comput. 59(6), 784–794 (2010)
15. Nilsson, S., Karlsson, G.: Ip-address lookup using LC-tries. IEEE J. Select. Areas Commun. 17(6), 1083–1092 (1999)
16. Thair, M., Ahmed, S.: Tree-combined trie: a compressed data structure for fast IP address lookup. Int. J. Adv. Comput. Sci. Appl. 6(12) (2015)
17. Qu, J.-F., Liu, M.: A fast algorithm for frequent itemset mining using Patricia* structures. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2012. LNCS, vol. 7448, pp. 205–216. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32584-7_17
18. Savnik, I., Akulich, M., Krnc, M., Škrekovski, R.: Data structure set-trie for storing and querying sets: theoretical and empirical analysis. PLOS ONE 16(2), e0245122 (2021)
Digital Twins and Blockchain: Empowering the Supply Chain
Jose Eduardo Aguilar-Ramirez1, Jose Antonio Marmolejo-Saucedo1(B), and Roman Rodriguez-Aguilar2
1 Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498, 03920 Ciudad de México, Mexico [email protected]
2 Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana, Augusto Rodin 498, 03920 Ciudad de México, Mexico [email protected]
Abstract. Industry 4.0 is here, and it has arrived with very promising new technologies that can foster supply chain management across industries. In this paper we review multiple sources to identify the main characteristics of Digital Twin and Blockchain technologies and how they can work together to fulfill the needs of the supply chain. We identify some advantages and disadvantages that must be properly analyzed before adopting this approach in any business. Many applications behind these new benefits are still in development, but we believe these two technologies have great potential.
Keywords: Digital Twin · Blockchain · Supply chain · ERP · Digitalization
1 Introduction
Technology has had a huge impact on the development of the human race. Industry 1.0 was led by steam machines surpassing human capacity; Industry 2.0 by the introduction of electricity in factories and Henry Ford’s assembly line; and Industry 3.0 by the development of computer automation and information technology (IT). Now we are facing Industry 4.0, led by the Internet of Things (IoT), Artificial Intelligence (AI), and computer-based algorithms such as machine learning, all connected to display data in real time for decision making. In this new type of industry, where everything is connected and digitalized, sharing data in real time for better decision making while maintaining data integrity throughout the supply chain is essential. That is where Digital Twins (DT) and Blockchain (BC) come in. A Digital Twin lets you replicate any physical object or system in a digital environment, where you can run multiple tests as well as monitor the current state of the object. This technology relies heavily on the IoT and on the connectivity and digitalization of all components. Blockchain, on the other hand, is a technology that partners well with digital twins in terms of data integrity and storage. This paper briefly describes how these two technologies complement each other and explains their benefits and some limitations.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 450–456, 2022. https://doi.org/10.1007/978-3-030-93247-3_44
Fig. 1. Document type available in Scopus database
2 Research Methodology
Since our goal is to analyze the benefits and limitations of digital twins combined with blockchain technology, we focused our search mainly on these key words: Digital Twins and Blockchain. Table 1 shows the number of papers currently available in the Scopus database.
Table 1. Number of documents per key words available in Scopus database
Key words                                        Documents
Blockchain                                       23,937
Digital Twins                                    6,997
Digital Twins and Blockchain                     101
Digital Twins and Blockchain and supply chain    16
The information came mainly from the following types of documents: articles, conference papers, and book chapters. Other sources were used to obtain some real-life examples. The distribution of document types is presented in Fig. 1.
J. E. Aguilar-Ramirez et al.
Fig. 2. Documents available by year
Results for Digital Twins combined with Blockchain increased over time, indicating growing interest in the topic, as shown in Fig. 2. In conclusion, the integration of DT and BC with the supply chain returned the fewest search results, which highlights that this area needs more research.
2.1 Literature Review
Digital Twin. DT is a relatively new kind of technology. The “twin” concept dates back to NASA’s Apollo program of the 1960s and 1970s, which used two identical physical space vehicles so that one could mirror the conditions of the other during the mission (Boschert and Rosen 2016). Although this first approach did not include a digital representation, it clearly showed great benefits. The scope of the term has changed over the years. It was first described as a prototype that could mirror real conditions for simulation (Boschert and Rosen 2016), and also as a tool to assist in product life cycle management (Huang et al. 2020). Ultimately, it also refers to the digital representation of a physical counterpart that lets companies manage the life cycle of their products or even the supply chain (Dietz and Pernul 2020). Nowadays, many companies use DT to simulate and test different circumstances without stopping or delaying daily activities (Felipe Bustamante and Singh 2020). DT provides real-time visibility into what is happening with the physical asset. This enables the past, present, and future performance of the asset to be tracked and, in combination with BC, guarantees the integrity of the information by recording every transaction of the asset (Raj 2021).
Blockchain. The concept was introduced in 2008 by Satoshi Nakamoto. BC is a data structure whose blocks are chained to each other in sequential order. The technology is considered tamper-proof because of its public, cryptographically secured ledger: any modification is registered in a block, and each block is connected to the previous and the next block, creating a “blockchain” (Zhong and Huang 2020). This security function of BC fulfills different security needs across the supply chain. These capabilities have already been identified by Enterprise Resource Planning (ERP) vendors, who are currently trying to integrate them into their ERP systems (Parikh 2018). Sometimes information must be shared with partners outside the company. Since connections between the ERP systems of different partners are either not allowed or must be verified by another entity, information does not flow as efficiently as it could; with the new model of decentralized platforms based on blockchain, communication efficiency will increase (Sokolov and Kolosov 2021). ERP and BC also have something in common: both store information from multiple areas of the business and both share that information with other areas. The difference lies in the accessibility of this data and in how it is stored: BC stores it in a decentralized way, while ERP stores everything in one place (Haddara et al. 2021). BC also has a powerful tool named smart contracts. Smart contracts allow a natural flow of decisions in the supply chain without the need for a central authority to check whether the conditions to continue the flow have been met (Borowski 2021). The terms under which smart contracts work must be settled with experts from the same company or between different companies (Nielsen et al. 2020). Boundaries must be applied to limit access to contracts and prevent them from being modified, together with full transparency about who altered them (Putz et al. 2021).
2.2 Integrating DT and BC
Since a Digital Twin is a digital representation of a physical entity, it is not limited to one object but can also mirror a whole system with multiple individual entities (Hemdan and Mahmoud 2021; Dietz and Pernul 2020). For example, we can create a DT of a building and, by connecting that DT with that of another building on the same block, create a system describing the whole block, and so on. Following this logic, we could create a digital twin of a city. By combining both technologies, the storage and transmission of the valuable information that the DT of the system carries would be safe and restricted thanks to the public, cryptographically secured ledger of BC (Hasan et al. 2020; Wang et al. 2021). When multiple DTs are connected with each other, a centralized system requires a huge amount of information and storage. With BC technology integrated, a decentralized system can be used for better performance (Wang et al. 2021). One proposed model for integrating both technologies is the use of peer-to-peer networks that enable effective communication between participants from the same team as well as from other teams inside or outside the company (Huang et al. 2020).
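The tamper evidence that makes a chained ledger attractive for storing digital twin data can be illustrated with a minimal hash chain. This is a simplified sketch under stated assumptions, not a blockchain implementation: there is no network, consensus, or access control, and the asset name and sensor readings are invented for the example. It uses only the SHA-256 function from Python's standard `hashlib` module.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record (e.g., a digital twin state update) to the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical digital twin readings for one asset.
chain = []
append_block(chain, {"asset": "pump-7", "temperature_c": 61.5})
append_block(chain, {"asset": "pump-7", "temperature_c": 63.0})
valid_before = is_valid(chain)

chain[0]["data"]["temperature_c"] = 20.0   # tamper with a stored reading
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Because every block's hash covers the previous block's hash, changing an old record invalidates the chain from that point on, which is the property the literature relies on when it describes the ledger as tamper-proof.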
2.3 Benefits
By strengthening DT with BC technology, the main concerns about the level of digitalization, data storage, and the transmission security of information are tackled, as mentioned previously. By keeping data secure, tampering with intellectual property rights (IPR) shared through the supply chain via a DT can be detected and leaks avoided (Nielsen et al. 2020). For example, when buying a pre-owned car, these technologies would let you track how many owners the car has had and which parts of the vehicle are still original, thanks to its DT with BC technology (Heber et al. 2017). Likewise, thanks to these traceability functions, any manufacturer would be able to detect a failure in a production batch and easily track and correct it, increasing operational and service levels (Hemdan and Mahmoud 2021). Implementing a DT enhanced with BC in each product would also create a digital certificate that counters fraud and detects fake products through this proof of authenticity (Raj 2021). Finally, by using and analyzing historical data, we can simulate when a potential breakdown may occur and prepare for that scenario. This ability is also known as a “predictive twin” (Raj 2021). DT is therefore not limited to product life cycle management; it is also a way to predict possible outcomes of the physical twin.
2.4 Limitations
The level of digitalization that businesses need in order to enable these new Industry 4.0 technologies is high. DT and BC rely heavily on IoT, sensors, machine learning, and 5G to capture, transfer, and analyze data. Without any one of those, the process would not be possible. Heavy investment in these systems would be required to obtain these benefits (Nielsen et al. 2020). Since a high level of connectivity is required between multiple sensors and systems to measure, analyze, and display data about the physical asset in real time, a question arises: how many sensors are required to get a complete evaluation of the object? There is no simple answer. The number of sensors will vary depending on the industry. A suitable approach would be to track only the key inputs needed to complete your objective (Aaron Parrott and Warshaw 2020). This leads to the requirements of data transmission. Linked with the level of digitalization, multiple sensors for every DT created demand a lot of processing, storage, and transmission capacity; until those tools achieve higher performance, there may be a bottleneck in the transmission process (Tao et al. 2020). In addition, distrust between different companies in the same chain is a barrier that needs to be addressed (Tao et al. 2020); BC technology in DT offers a solution to this behavior, since it provides an immutable and secure way of sharing data (Hasan et al. 2020). Finally, BC technology must be standardized across all industries in order to create a successful connection of all DTs and systems. Since BC is still being developed, a common standard is still pending (Tao et al. 2020).
3 Conclusion
Much of BC technology is yet to be investigated and properly discussed. Many applications are still in early stages of development and, in some cases, far from industrial operation (Sokolov and Kolosov 2021). For that reason, a fully proven system or process that involves a DT with BC technology may not be available soon, but there is a lot of potential behind these technologies. Imagine a whole supply chain connected, with providers, manufacturers, and customers, where each product has its own DT that provides immutable information, lets customers validate its genuine origin, and lets companies deliver a great service level. This could be possible with both technologies working together. On the way to that level, the aircraft industry, a high-tech sector, has long been a user of DT and is keen on tracking all components of an aircraft throughout its lifetime (Mandolla et al. 2019). For this reason, it may be a benchmark for every other industry that is looking for best practices in using these new tools.
References
Aaron Parrott, B.U., Warshaw, L.: Digital twins bridging the physical and digital. Deloitte (2020)
Borowski, P.F.: Digitization, digital twins, blockchain, and Industry 4.0 as elements of management process in enterprises in the energy sector. Energies 14(7), 1885 (2021)
Boschert, S., Rosen, R.: Digital twin—the simulation aspect. In: Hehenberger, P., Bradley, D. (eds.) Mechatronic Futures, pp. 59–74. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32156-1_5
Dietz, M., Pernul, G.: Digital twin: empowering enterprises towards a system-of-systems approach. Bus. Inf. Syst. Eng. 62(2), 179–184 (2020)
Felipe Bustamante, J.H., Dekhne, A., Singh, V.: Improving Warehouse Operations Digitally. McKinsey (2020)
Haddara, M., Norveel, J., Langseth, M.: Enterprise systems and blockchain technology: the dormant potentials. Procedia Comput. Sci. 181, 562–571 (2021)
Hasan, H.R., et al.: A blockchain-based approach for the creation of digital twins. IEEE Access 8, 34113–34126 (2020)
Heber, D., Groll, M., et al.: Towards a digital twin: how the blockchain can foster E/E-traceability in consideration of model-based systems engineering. In: DS 87-3 Proceedings of the 21st International Conference on Engineering Design (ICED 17), Product, Services and Systems Design, Vancouver, Canada, 21–25 August 2017, vol. 3, pp. 321–330 (2017)
Hemdan, E.E.-D., Mahmoud, A.S.A.: BlockTwins: a blockchain-based digital twins framework. In: Choudhury, T., Khanna, A., Toe, T.T., Khurana, M., Gia Nhu, N. (eds.) Blockchain Applications in IoT Ecosystem. EICC, pp. 177–186. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65691-1_12
Huang, S., Wang, G., Yan, Y., Fang, X.: Blockchain-based data management for digital twin of product. J. Manuf. Syst. 54, 361–371 (2020)
Mandolla, C., Petruzzelli, A.M., Percoco, G., Urbinati, A.: Building a digital twin for additive manufacturing through the exploitation of blockchain: a case analysis of the aircraft industry. Comput. Ind. 109, 134–152 (2019)
Nielsen, C.P., da Silva, E.R., Yu, F.: Digital twins and blockchain-proof of concept. Procedia CIRP 93, 251–255 (2020)
Parikh, T.: The ERP of the future: blockchain of things. Int. J. Sci. Res. Sci. Eng. Technol. 4(1), 1341–1348 (2018)
Putz, B., Dietz, M., Empl, P., Pernul, G.: EtherTwin: blockchain-based secure digital twin information management. Inf. Process. Manag. 58(1), 102425 (2021)
Raj, P.: Empowering digital twins with blockchain. Adv. Comput. 121, 267 (2021)
Sokolov, B., Kolosov, A.: Blockchain technology as a platform for integrating corporate systems. Autom. Control Comput. Sci. 55(3), 234–242 (2021)
Tao, F., et al.: Digital twin and blockchain enhanced smart manufacturing service collaboration and management. J. Manuf. Syst. (2020)
Wang, W., Wang, J., Tian, J., Lu, J., Xiong, R.: Application of digital twin in smart battery management systems. Chin. J. Mech. Eng. 34(1), 1–19 (2021)
Zhong, S., Huang, X.: Special Focus on Security and Privacy in Blockchain-Based Applications. Science China Press (2020)
Detection of Malaria Disease Using Image Processing and Machine Learning
Md. Maruf Hasan(B), Sabiha Islam, Ashim Dey(B), Annesha Das, and Sharmistha Chanda Tista
Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh
{u1604089,u1604070}@student.cuet.ac.bd, {ashim,annesha,tista chanda}@cuet.ac.bd
Abstract. Malaria is an infectious disease that claims hundreds of thousands of lives each year. A standard laboratory malaria diagnosis requires a careful study of both healthy and infected red blood cells. Malaria can be diagnosed by spreading a drop of the patient’s blood on a slide as a blood smear and examining it under a microscope. The quality of the blood smear influences the accuracy of malaria classification and detection, which results in a large number of errors that are not acceptable. The goal of this research is to create a computer-aided method for the automatic detection of malaria parasites using image processing and machine learning techniques. Uninfected or parasitized blood cells have been classified using handcrafted features extracted from red blood cell images. We have implemented Adaboost, K-Nearest Neighbor, Decision Tree, Random Forest, Support Vector Machine, and Multinomial Naive Bayes machine learning models on a dataset of 27,558 cell images. Among these algorithms, Adaboost, Random Forest, Support Vector Machine, and Multinomial Naive Bayes achieved an accuracy of about 91%. Furthermore, the ROC curve indicates that the Random Forest classification model performs best. We hope that, by decreasing the need for human intervention throughout the detection process, this approach can greatly improve the efficiency of malaria detection.
Keywords: Malaria disease · Blood smear images · Image processing · Machine learning · Computer-aided diagnosis
1 Introduction
Malaria has become one of the most severe infectious diseases for humankind. It is transmitted mainly through the bite of Anopheles mosquitoes. According to Wikipedia, out of 400 species, only 30 species of Anopheles mosquitoes are malaria vectors. Nowadays, it is a serious public health issue around the globe, particularly in third-world countries. As per the WHO (World Health Organization), 1.5 billion malaria cases have been averted since 2000, but 409,000 people died of malaria in 2019 [1]. The transmission of the malaria parasite depends on climate conditions. The disease spreads rapidly during the rainy season in particular, because this is the breeding season of Anopheles mosquitoes. Transmission grows more intense when the temperature rises to the point that a mosquito’s life span is extended. Regarding the temperature issue, in many tropical areas such as Latin America, Asia, and Africa, the malaria spreading rate is around 90%. According to the WHO, in 2019 about 50% of the world’s population was in danger of malaria. Malaria is the leading cause of death in Sub-Saharan Africa. The Western Pacific, the Eastern Mediterranean, South-East Asia, and the Americas have all been recognized as high-risk zones by the WHO [2]. Malaria is often predominant in remote areas where proper medical treatment is hard to find. It is critical to detect malaria early and administer appropriate treatment; otherwise, the disease can be fatal. One typical method of detecting malaria is for qualified microscopists to examine blood smears for infected erythrocytes. These are traditional diagnostic methods used in laboratories by microscopists, such as clinical diagnosis. Microscopic diagnoses are the most widely used malaria diagnosis procedures, taking only 15 min to complete. However, the efficiency and accuracy of these methods depend on the degree of human proficiency, which is often hard to find; without it, accuracy fluctuates. Polymerase Chain Reaction (PCR) is the most sensitive and specific approach for recognizing malaria parasites and is more typically used for species identification [3]. PCR is used as an alternative method for malaria diagnosis that allows sensitive and specific detection of Plasmodium species DNA from peripheral blood.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 457–466, 2022. https://doi.org/10.1007/978-3-030-93247-3_45
Rapid Diagnostic Tests (RDT) are another diagnosis method, intended to provide reliable detection of malaria infections in distant locations with limited access to high-quality microscopy services [4]. Such manual methods are unsuccessful in some cases because effective results depend on the experience and knowledge of the person performing the test, and human error is inevitable. If more efficient automated diagnostic methods were available for malaria detection, the disease could be controlled more easily. Recently, many automated machine learning and deep learning approaches have been proposed to detect this disease, and they are claimed to be more efficient than conventional approaches [5–9]. In this work, we have used machine learning algorithms with automatic image recognition technologies to detect parasite-infected red blood cells in standard microscope slide images. We have used image smoothing, grayscale conversion, and feature extraction. The main objectives of our work are:
– To locate the region of interest and extract key features from standard microscopic images of red blood cells using image processing techniques.
– To train various machine learning models using the extracted features for classifying healthy and parasitized red blood cells.
– To find the most suitable approach for detecting malaria disease based on different evaluation metrics.
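The preprocessing steps named above (grayscale conversion, smoothing, feature extraction) can be sketched in plain Python. This is only an illustration under stated assumptions: the luminance weights are the common ITU-R BT.601 coefficients, the smoothing is a simple mean filter, and the summary statistics stand in for the handcrafted features, none of which are specified in this part of the paper. All function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to luminance (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def mean_smooth(gray, radius=1):
    """Mean (box) filter; border pixels average over the neighbours that exist."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def simple_features(gray):
    """Toy feature vector: global intensity statistics of the cell image."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return {"mean": mean, "variance": variance, "min": min(pixels), "max": max(pixels)}


# A tiny synthetic 3x3 "cell" with one dark spot in the middle.
rgb = [[(100, 100, 100)] * 3,
       [(100, 100, 100), (10, 10, 10), (100, 100, 100)],
       [(100, 100, 100)] * 3]
gray = to_grayscale(rgb)
smoothed = mean_smooth(gray)
features = simple_features(smoothed)
print(features)
```

In practice a feature vector like this, computed for every cell image, would be the input to the classifiers listed in the abstract; the real pipeline presumably uses richer features and a library implementation of the filters.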
The rest of the paper is arranged as follows: Sect. 2 presents the related works we have investigated. Our methodology is described in Sect. 3. Section 4 presents the obtained results in detail. Finally, Sect. 5 concludes the paper.
2 Related Work
Malaria is a life-threatening disease and has attracted deep research interest among scientists all over the world. Different techniques, methods, and algorithms have been used to detect parasitized blood cells in recent times. In the domain of machine learning, handcrafted features are mostly used for decision making. Previously, feature extraction depended on morphological factors [10], and classification was performed with Support Vector Machine (SVM) and Principal Component Analysis (PCA). In disease recognition studies, Convolutional Neural Networks (CNN) have achieved encouraging results in terms of efficiency and accuracy [5]. In more advanced methods, CNN was found to be much more effective than the SVM classifier for extracting image features [6]. In [7], using features extracted from the optimal layer of a pretrained model, a 16-layer CNN model obtained a detection accuracy of 97.37%, which is claimed to be superior to another transfer learning model with an accuracy of 91.99%. The CNN model was also explored for extracting features from 96 × 96 resolution cell image data in [8]. Among the CNN architectures, the GoogleNet, ResNet, and VGGNet models showed accuracy rates in the range of 90% to 96%. The authors used Contrast Limited Adaptive Histogram Equalization (CLAHE) to pre-process the images and enhance their quality. In [9], the authors introduced the Multi-Magnification Deep Residual Network, an enhanced deep learning approach for the categorization of microscopic blood smear photos. They handled the problems of vanishing gradients, degradation, and low-quality images by combining batch normalization and individual residual units. Multiple image pre-processing techniques, for instance image enhancement and feature extraction, can be used. In [11], images were converted to grayscale, and then Gray Level Co-occurrence Matrix (GLCM), Histogram of Oriented Gradients (HOG), and Local Binary Pattern (LBP) were applied for feature extraction. With these pre-processing methods, the highest accuracy among the different machine learning algorithms was 97.93%, obtained with the Support Vector classification model. In [12], different machine learning algorithms such as Cubic SVM, Linear SVM, and Cosine KNN were used, and Cubic SVM obtained the highest accuracy of 86.1% among them. Only 110 thin films were tested for that system. To choose a suitable and highly precise model for detecting the malaria parasite from a microscopic blood smear, autoencoder training from deep learning showed an accuracy of 99.23% with nearly 4600 flops per image [2]. More precisely, this model gave an accuracy of 99.51% with 28 × 28 images, whereas 32 × 32 images gave an accuracy of 99.22%. The authors sacrificed very little accuracy, only 0.0029, to obtain a slightly higher image resolution for sensitive, specific, and precise performance on a smartphone, as well as a low-cost phone and web application for portable malaria diagnosis.
3 Methodology
First, we have identified a series of steps and designed a methodology to achieve our goal. Our overall methodology is represented in Fig. 1. A publicly available dataset was used in this work. The techniques for obtaining data, preprocessing and model training are covered in the following subsections.
Fig. 1. Block diagram of our methodology.
3.1 Dataset Description
The first step is to collect images of blood smears from malaria patients. We collected the dataset from Kaggle which is publicly available [13]. This dataset has 27,558 blood cell images which are divided into two classes: cells infected with
malaria, which comprise 13779 images, and cells not infected with malaria, which also comprise 13779 images. The original source of this dataset is Chittagong Medical College Hospital, Bangladesh [14]. Thin blood smear slides from 150 P. falciparum-infected patients (commonly referred to as malaria-infected) and 50 healthy patients were collected and photographed. Figure 2 shows some sample images from the dataset.
Fig. 2. Sample images (a) Uninfected and (b) Parasitized.
3.2 Data Preprocessing
Transforming raw data before applying a machine learning algorithm is called preprocessing. Preprocessing is an important phase in ML because it determines the quality of the data and the useful detail that can be retrieved from it, which greatly affects the performance and correctness of a model. Image preprocessing takes an image as input and performs operations on it, such as sharpening, filtering, and enhancement. We first took the original images as input, as shown in Fig. 3, and resized them to 120 × 120. Images can be smoothed with different blurring techniques provided by OpenCV, such as averaging, Gaussian blurring, median blurring, and bilateral filtering. Blurring is beneficial for removing noise from images. Here, smoothing is done with Gaussian blurring, which is very effective at removing Gaussian noise, as illustrated in Fig. 3. After smoothing, we used OpenCV and Python to convert the images to grayscale. We are more interested in the patterns in these images, because color as a whole carries little information.

3.3 Feature Extraction
Fig. 3. Data processing steps.

In this step, we identified the region of interest in the preprocessed images. To locate the infected areas, we attempted to detect all contours. Put simply, a contour is a curve that connects all continuous points along a boundary that have the same intensity or color. Contours are an effective tool for shape analysis as well as for object detection and recognition. In our work, features are extracted by taking the five largest contour areas (bounded regions). We obtained higher accuracy with the five largest contour areas; with fewer than five, accuracy decreased, and with more than five it remained the same. Among the 13779 uninfected images, 12544 have 1 contour area and only 273 have 5 contour areas. Among the 13779 parasitized images, only 1585 have 1 contour area and 1585 have 5 contour areas.
3.4 Model Training
To detect uninfected and parasitized blood smears, six classifiers were selected for training: AdaBoost (AD), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Multinomial Naive Bayes (MNB). We used 70% of the images for training and 30% for testing. To find out which model is better at detecting malaria, the performance of the suggested technique is evaluated using graphical and statistical indicators, including the confusion matrix, accuracy, F1-score, recall, precision, and ROC curve. The confusion matrix is an array containing the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

– Accuracy: The accuracy is the ratio of correctly predicted samples to all samples, regardless of whether the sample is positive or negative, as given in Eq. 1.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

– Precision: Precision is the ratio of correctly predicted positive samples to all samples predicted as positive, as given in Eq. 2.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

– Recall: Recall is the ratio of correctly predicted positive samples to all samples that are actually positive, as given in Eq. 3.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

– F1 score: The F1 metric describes the classification performance of the system. As given in Eq. 4, it is calculated from the recall and precision rates.

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{4}$$
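The second equality in the F1 formula (Eq. 4) is not obvious at first glance. As a short check, substituting the definitions of precision and recall from Eqs. 2 and 3 gives the following derivation, which uses only the symbols TP, FP, and FN already defined above:

```latex
\begin{aligned}
\text{Precision} \cdot \text{Recall}
  &= \frac{TP^{2}}{(TP + FP)(TP + FN)}, \\
\text{Precision} + \text{Recall}
  &= \frac{TP\,(TP + FN) + TP\,(TP + FP)}{(TP + FP)(TP + FN)}
   = \frac{TP\,(2\,TP + FP + FN)}{(TP + FP)(TP + FN)}, \\
F1
  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
   = \frac{2\,TP^{2}}{TP\,(2\,TP + FP + FN)}
   = \frac{2\,TP}{2\,TP + FP + FN}.
\end{aligned}
```

The true negatives do not appear in the result, which is why F1 is insensitive to how many negative samples are classified correctly.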
4 Result Analysis
After following the steps mentioned earlier in preprocessing and feature extraction, the classifiers are trained using the Scikit-learn library. The performance of these classifiers is compared as shown in Table 1. The overall classification performance varies between 84% and 91%. According to the classification report of Table 1, the performance of the SVM, AD, RF, and MNB is slightly better in terms of test accuracy and classification report. These classifiers achieved
Fig. 4. Confusion matrices.
an average accuracy of 90.63%. Figure 4 shows the confusion matrices of the implemented classifiers. SVM correctly predicts 3733 images as parasitized and 3703 as uninfected; AD correctly predicts 3700 as parasitized and 3734 as uninfected; RF correctly predicts 3735 as parasitized and 3694 as uninfected; and MNB correctly predicts 3692 as parasitized and 3713 as uninfected. We then explored the stacking ensemble technique by combining the best-performing models, but Table 1 shows that its test accuracy is 90.71%, which is lower than that of the RF classifier. To select the best among the four models with similar accuracy, we further investigated the ROC curve and the area under it (AUC), as shown in Fig. 5. The purpose of the AUC-ROC analysis is to summarize a model's overall detection ability. The horizontal axis of the diagram shows the model's false-positive rate, while the vertical axis shows its true-positive rate. We conclude that the Random Forest classifier is noticeably superior in terms of AUC as measured on the ROC curve.
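How a ROC curve and its AUC are obtained from classifier scores can be sketched as below. This is a minimal pure-Python illustration, not the evaluation code used in the study (which relies on Scikit-learn); the function names `roc_points` and `auc` and the six labels and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) points.

    The decision threshold is swept from the highest score to the lowest.
    Assumes y_true contains at least one 1 and at least one 0.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = parasitized) and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(y_true, scores))
print(auc(roc_points(y_true, scores)))  # 8 of the 9 positive/negative pairs are ranked correctly
```

An AUC of 1 corresponds to a model that ranks every positive sample above every negative one, which is why a larger area is read as better overall detection.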
Fig. 5. ROC curve.

Table 1. Classification report in weighted average (all values in %).

Model                       Accuracy  Precision  Recall  F1-Score
DT                          83.62     83.63      83.63   83.62
AD                          90.59     90.57      90.64   90.58
KNN                         88.05     88.04      88.08   88.04
RF                          90.76     90.75      90.77   90.76
MNB                         90.54     90.53      90.58   90.54
SVM                         90.64     90.63      90.65   90.64
Ensemble (AD+RF+SVM+MNB)    90.71     90.70      90.73   90.71

5 Conclusion
Malaria is an infectious mosquito-borne disease, and its diagnosis requires thorough and careful examination of red blood smears. This diagnostic procedure is not only time-consuming, but its accuracy also relies on the expertise of pathologists. Nowadays, machine learning has become a popular strategy for handling the most complicated real-world issues. In this work, we have utilized machine learning along with image processing for reliable diagnosis of malaria. First, handcrafted features were extracted by identifying the region of interest in a dataset of 27,558 microscopic images. For this purpose, the five largest contours were taken from the preprocessed images. Then, six machine learning models along with an ensemble model were trained using the extracted features. We distinguished parasitized from healthy, non-parasitized blood smear images with a highest accuracy of about 91%. In the future, we aim to incorporate deep learning approaches into this work for more accurate analysis and classification of red blood smear images.
References
1. WHO: Fact sheet: World malaria report 2020. World Health Organization (2020). https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2020. Accessed 23 Oct 2021
2. Fuhad, K.M.F., Tuba, J.F., Sarker, M.R.A., Momen, S., Mohammed, N., Rahman, T.: Deep learning based automatic malaria parasite detection from blood smear and its smartphone based application. Diagnostics 10(5) (2020). https://www.mdpi.com/2075-4418/10/5/329
3. Hänscheid, T., Grobusch, M.P.: How useful is PCR in the diagnosis of malaria? Trends Parasitol. 18(9), 395–398 (2002)
4. Wongsrichanalai, C., Barcus, M., Sinuon, M., Sutamihardja, A., Wernsdorfer, W.: A review of malaria diagnostic tools: microscopy and rapid diagnostic test (RDT). Am. J. Trop. Med. Hyg. 77, 119–27 (2008)
5. Khan, S., Islam, N., Jan, Z., Ud Din, I., Rodrigues, J.J.P.C.: A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recogn. Lett. 125, 1–6 (2019). https://www.sciencedirect.com/science/article/pii/S0167865519301059
6. Lecun, Y., Bengio, Y.: Convolutional networks for images, speech, and time-series (1995)
7. Liang, Z., et al.: CNN-based image analysis for malaria diagnosis. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 493–496 (2016)
8. Militante, S.V.: Malaria disease recognition through adaptive deep learning models of convolutional neural network. In: 2019 IEEE 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS), pp. 1–6 (2019)
9. Pattanaik, P., Mittal, M., Khan, M.Z., Panda, S.: Malaria detection using deep residual networks with mobile microscopy. J. King Saud Univ. Comput. Inf. Sci. (2020). https://www.sciencedirect.com/science/article/pii/S1319157820304171
10. Linder, N., et al.: A malaria diagnostic tool based on computer vision screening and visualization of plasmodium falciparum candidate areas in digitized blood smears. PLOS ONE 9(8), 1–12 (2014). https://doi.org/10.1371/journal.pone.0104855
11. Kumari, U., Memon, M., Narejo, S., Afzal, M.: Malaria disease detection using machine learning (2021). https://www.researchgate.net/publication/348408910_Malaria_Disease_Detection_Using_Machine_Learning
12. Kumari, U., Memon, M., Narejo, S., Afzal, M.: Malaria detection using image processing and machine learning. IJERT NTASU-2020, 09(03) (2021)
13. Arunava: Malaria cell images dataset. https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria. Accessed 23 Oct 2021
14. Rajaraman, S., et al.: Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ 6, e4568 (2018)
Fake News Detection of COVID-19 Using Machine Learning Techniques

Promila Ghosh1, M. Raihan1(✉), Md. Mehedi Hassan1, Laboni Akter2, Sadika Zaman1, and Md. Abdul Awal3

1 North Western University, 9100 Khulna, Bangladesh
2 Khulna University of Engineering and Technology, 9203 Khulna, Bangladesh
3 Khulna University, 9208 Khulna, Bangladesh

Abstract. COVID-19, or coronavirus, has become one of the most common terms in recent times. The SARS-CoV-2 virus caused a pandemic of respiratory illness named COVID-19. The coronavirus spreads through liquid droplets and virus particles that are released into the air when an infected person breathes, coughs, or sneezes. This pandemic has become a serious threat to life, even for children. Unfortunately, some unscrupulous individuals spread false or fake news to disrupt the social balance. Because of such misleading news, numerous people have been misled about taking proper care. To address this issue, we analyzed several machine learning techniques. Among them, the ensemble method Random Forest achieved the best accuracy, 90%. Naive Bayes reached 85%, and another ensemble method, built from Naive Bayes and Support Vector Machine (SVM), reached an accuracy of 88%.

Keywords: Coronavirus · Fake news detection · Ensemble learning · Random forest · Naive Bayes
A huge number of people have lost their lives, good health, and capital during the COVID-19 pandemic. The World Health Organization (WHO) has declared the current COVID-19 outbreak a worldwide public health emergency. The severity of this viral illness is reflected in the worldwide figures of 2268011 positive cases (through 18 April 2020) and 155185 reported deaths [1]. Risk communication was frequently inadequate during the COVID-19 pandemic. Against this background, “fake news” has spread and caused a lot of confusion and inconvenience. Fake news spread on social media can have a severe impact, particularly on the political, reputational, and financial sectors as well as on human society. A robust infrastructure for the automated identification of false news on social networks is therefore crucial [2]. News content alone is not enough to judge the authenticity of news; the social characteristics of the news must also be assessed.

In this article, we have selected Naive Bayes and voting-based ensemble machine learning methods. Ensemble learning is a powerful approach for improving model accuracy [3]. We implemented COVID-19 fake news classification with Naive Bayes, a Naive Bayes and Support Vector Machine ensemble, and Random Forest on a merged dataset with multiple preprocessing techniques. The rest of the paper is organised as follows: related works are presented in Sect. 1 and the methodology in Sect. 2, where the analysis is described with attention to the accuracy of each classifier algorithm. Section 3 discusses the experimental results. Finally, Sect. 4 concludes the work.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 467–476, 2022. https://doi.org/10.1007/978-3-030-93247-3_46
1 Related Works
Hlaing et al. [2] presented a multidimensional fake news dataset. In this study, 77.81% of the news was categorized as true, 12.87% as false, 3.99% as non-factual data, and the rest as primarily fake. Uppal et al. [4] applied two types of models, a news content model and a social context model, to a sample of 6586 items and obtained an F1 score of 0.76 and an accuracy of 74.62%. In [5], the Twitter PHEME dataset with two classes, rumours and non-rumours, was used with GRU, LSTM, Bi-RNN, and CNN methods. All neural network models were built with Keras (Conv1D, LSTM, Bi-LSTM, GRU, and Bi-GRU), and Bi-GRU reached a micro-averaged F1 value of 0.564. Benamira et al. [6] proposed a semi-supervised graph-based model for false news identification built on graph neural networks, focusing on content-based approaches and framing the problem as binary text classification. Kaliyar et al. [7] designed a comprehensive deep neural network that handles not just news item content but also social networking relationships through a tensor factorization method, reflecting the social context of news articles with a mixture of information from users, groups, and news, using 15257 users in the BuzzFeed news data. Dong et al. [8] proposed a novel dual-stream attentive random forest model. For the Text Social and AttForest-2 technique, the ablation study on the dataset yielded 84.4%. Ahmad et al. [9] presented many textual characteristics to distinguish between false and actual content; the Random Forest ensemble learner reached an accuracy of 99%. A survey [14] reported that Naive Bayes, Random Forest, Decision Tree, Bi-LSTM, and other methods performed well on different datasets. A dataset containing news from different financial websites was used for fake financial news detection in [15]. Tree LSTM, SVM, and CNN-LSTM were applied, and CNN-LSTM performed best with 92.1%.

There are several works on false news detection with good accuracy, but most of the datasets are public datasets covering different categories of news, not only COVID-19. Reviewing these papers gave us the concepts, and we chose our algorithms based on their performance in those earlier works. In this manuscript, we analyzed ensemble techniques. Our main focus was to classify COVID-19 fake and real news and to achieve the best performance.
2 Methodology
False news during this pandemic created social disturbances. To counter them, we collected fake news data from the datasets described below, preprocessed them, and finally applied Naive Bayes, the ensemble method Random Forest, and another ensemble method combining Naive Bayes with SVM. The whole analysis is outlined in Fig. 1.
Fig. 1. Work-flow of the study (Start → Dataset Collection → Merge Dataset → Dataset Preprocessing → Apply Classifiers [Naive Bayes; Naive Bayes + SVM; Random Forest] → Evaluate Results → Compare Results → End).
2.1 Dataset Selection
Patwa et al. introduced a COVID-19 fake and real news dataset in which 10,700 social media posts and articles were annotated manually [10]. As shown in Fig. 3, we selected 6420 items from it for our study, consisting of 3060 fake news items, with the rest being real news. A second dataset consists of 9727 fake news items and 7171 real news items from different web portals and CBC News, and is represented graphically in Fig. 2 [11]. We merged the two datasets, as shown in Fig. 4. Figures 2, 3, and 4 are bar charts of “Value” versus “Count”, where column “0” stands for “Fake” news and “1” for “Real” news.
Fig. 2. The visualization of fake and real news of a dataset.
Fig. 3. The visualization of fake and real news of another dataset.
Fig. 4. The visualization of fake and real news of the final dataset.
2.2 Dataset Preprocessing
At the beginning of data preprocessing, we inspected the dataset and took the following steps. The word cloud generated from the dataset before preprocessing is displayed in Fig. 5. Using different Python libraries, we converted the whole text column to lower case with the str.lower() function and removed punctuation with a user-defined function. Because the dataset contained different types of symbols, we removed them with a user-defined function as well. For stop words, the Natural Language Toolkit (NLTK) library was used. The base form of a word, taken as a single unit, is called its lemma. Stemming removes the suffix from a word to reduce it to its root word. For example, “Claiming” is a word with the suffix “ing”; removing “ing” gives the root word “Claim”. PorterStemmer() from NLTK was applied for stemming. A final inspection showed that some emojis and URLs remained; the presence of URLs is clearly visible in the word cloud in Fig. 5. We removed the URLs and emojis with a user-defined function.
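The cleaning steps above can be sketched with the standard library alone. Everything named here is illustrative rather than the authors' code: `STOPWORDS` is a tiny hand-picked set standing in for NLTK's stop-word list, `toy_stem` is a crude stand-in for NLTK's Porter stemmer (it only strips a trailing "ing"), and the sample sentence is made up. URLs are removed first in this sketch, because stripping punctuation beforehand would break them into fragments that are harder to recognise.

```python
import re
import string

# Tiny illustrative stop-word set (assumption); the study uses NLTK's list.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def toy_stem(word: str) -> str:
    """Crude stand-in for a stemmer: drop a trailing 'ing' (not Porter's algorithm)."""
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


def clean_text(text: str, stem=toy_stem, stopwords=STOPWORDS) -> list:
    text = text.lower()                                      # lower-case everything
    text = URL_PATTERN.sub(" ", text)                        # remove URLs while "://" is intact
    text = text.encode("ascii", errors="ignore").decode()    # drop emojis (and any other non-ASCII)
    text = text.translate(PUNCT_TO_SPACE)                    # punctuation and symbols -> spaces
    return [stem(tok) for tok in text.split() if tok not in stopwords]


# "\U0001F637" is the face-with-medical-mask emoji.
sample = "Claiming that COVID-19 is over!!! \U0001F637 https://example.com/news"
print(clean_text(sample))
```

To use a real stemmer instead, a callable such as the `stem` method of an NLTK `PorterStemmer` instance could be passed as the `stem` argument.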
Fig. 5. The text data generated by Word cloud.
2.3 Naive Bayes (NB)
Naive Bayes (NB) is one of the prominent classifiers in Machine Learning (ML). NB is based on Bayes’ theorem, which comes from conditional probability and statistics. For a hypothesis K and an event M, the conditional probability P(K | M) of the hypothesis after observing the evidence is given by Eq. 1.

$$P(K \mid M) = \frac{P(M \mid K)\,P(K)}{P(M)} \tag{1}$$
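To make Eq. 1 concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier written from scratch. It is not the Scikit-learn model used in the study: it works on raw token counts rather than TF-IDF values, uses Laplace smoothing with an assumed `alpha = 1.0`, and the function names and the four toy documents are hypothetical. Here the class plays the role of the hypothesis K and the document tokens play the role of the evidence M; P(M) is the same for every class, so it is dropped when classes are compared, and each token is treated as independent given the class.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: one class label per document."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # log P(K): prior of each class.
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    # log P(token | K) with Laplace smoothing.
    log_likelihood = {}
    for c in class_counts:
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {t: math.log((token_counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Return the class with the largest log P(K) + sum of log P(token | K)."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in doc if t in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already tokenised training documents.
docs = [
    ["vaccine", "causes", "harm"],
    ["miracle", "cure", "found"],
    ["who", "reports", "new", "cases"],
    ["health", "ministry", "reports", "cases"],
]
labels = ["fake", "fake", "real", "real"]

model = train_nb(docs, labels)
print(predict_nb(model, ["miracle", "cure"]))
print(predict_nb(model, ["reports", "cases"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; tokens never seen in training are simply skipped in this sketch.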
NB treats every feature as independent, even when the features are dependent. The probabilities of the individual features are combined into a likelihood, on which the classification is based. In this study, we split the dataset into training and testing data, with 25% used for testing and a random seed of 50. The target labels were encoded with the LabelEncoder() function. Scikit-learn and TensorFlow (TF) are open-source ML libraries [12,13]. With the maximum number of text features set to 500, the TfidfVectorizer() function, which belongs to Scikit-learn, converted all text data into numeric values [12]. Finally, we adopted the Multinomial Naive Bayes model from Scikit-learn, which is suited to text or discrete data [12].

2.4 Naive Bayes and Support Vector Machine (NB and SVM)
Ensemble methods combine multiple ML models and can classify better than a single ML model. Voting is one ensemble technique. In hard voting, the class that receives the most votes from the individual models is predicted. In soft voting, the class with the highest summed probability is predicted (Fig. 6).
Fig. 6. Voting ensemble method (Dataset → Naive Bayes and Support Vector Machine → Voting → Final Classification).
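The two voting rules can be illustrated with a few lines of plain Python. This is a sketch of the idea only, not Scikit-learn's implementation; the function names and the example predictions and probabilities are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """predictions: one predicted label per model -> the label with the most votes.

    With an even number of models a tie is possible; here it resolves to the
    label that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """probabilities: one {label: probability} dict per model -> label with the highest sum."""
    totals = Counter()
    for probs in probabilities:
        totals.update(probs)  # adds the probabilities label by label
    return max(totals, key=totals.get)


# Hard voting over three hypothetical model outputs.
print(hard_vote(["fake", "real", "fake"]))

# Soft voting over two hypothetical models (e.g. NB and SVM) that disagree:
# the more confident model decides the outcome.
nb_probs = {"fake": 0.55, "real": 0.45}
svm_probs = {"fake": 0.20, "real": 0.80}
print(soft_vote([nb_probs, svm_probs]))
```

With only two base models, hard voting cannot break a disagreement, which is one reason soft voting is often the more informative choice for a two-model ensemble.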
$$R(y, y') = \exp\left(-\gamma \lVert y - y' \rVert\right) \tag{2}$$

To control the complexity of the SVM model, degree = 3 was used. Finally, we passed the two models, NB and SVM, to the VotingClassifier() function from Scikit-learn [12].

2.6 Random Forest (RF)
The RF model is an ensemble algorithm. It classifies with an ensemble of decision trees combined through a voting system [2]. In the ML decision tree algorithm, features act as nodes that split the data according to the class labels. RF constructs a decision tree for each random sample and obtains a result from each tree. Finally, the result with the most votes is selected, as shown in Fig. 7. For the RF model in this work, we used CountVectorizer for vectorization. We then set n_estimators = 10, which determines the number of decision trees, and random_state = 0 in the RandomForestClassifier() model of the Scikit-learn library [12].
Fig. 7. Random forest classification (Random Sample Selection 1 … n → Decision Tree 1 … n → Voting → Final Classification).
3 Experimental Outcomes and Discussions
The complete results of our assessment using the previously described methods are elaborated below.

Table 1. The confusion matrix of NB
TN = 2766    FP = 424
FN = 461     TP = 2179
In the classification results, True Positive (TP) counts the correctly classified positive instances, and similarly True Negative (TN) counts the correctly classified negative instances. On the other hand, False Positive (FP) counts the instances incorrectly classified as positive, and False Negative (FN) counts the instances incorrectly classified as negative. A confusion matrix is a table with two dimensions, “Actual” and “Predicted”, whose cells hold the TP, TN, FP, and FN counts. Table 1, Table 2, and Table 3 give the confusion matrices of the models. The confusion matrix helps us visualize how many instances each model classifies correctly or incorrectly. Accuracy, precision, recall, and F1-score all depend on TP, TN, FP, and FN. Accuracy is the proportion of instances that a model classifies correctly, as in Eq. 3.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{3}$$
The accuracy is presented in Table 4: NB achieved 85%, NB+SVM 88%, and RF the best with 90%. For a better view of the models, however, more classification measures are needed, namely precision, recall, and F1-score. Precision, given in Eq. 4, is the ratio of correctly predicted positive instances to all predicted positive instances. For the fake class, the precision of NB, NB+SVM, and RF is 0.86, 0.85, and 0.88, respectively; for the real class it is 0.84, 0.92, and 0.93.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

Recall measures how well the ML model identifies the actual positives and is calculated as in Eq. 5. The recall values for the “fake” class are 0.87, 0.94, and 0.95, and those for the “real” class are 0.83, 0.80, and 0.85.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$

The F1 score is the harmonic mean of recall and precision, as in Eq. 6.

$$F1\ \text{Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

For the “fake” class the F1 scores are 0.86, 0.89, and 0.91, and for the “real” class they are 0.83, 0.86, and 0.89. The complete precision, recall, and F1-score values are presented in Table 5 and Table 6.

Table 2. The confusion matrix of NB and SVM
TN = 3023    FP = 169
FN = 405     TP = 2233

Table 3. The confusion matrix of RF
TN = 3005    FP = 185
FN = 522     TP = 2118

Table 4. The accuracy of the models
Models     Accuracy
NB         85%
NB+SVM     88%
RF         90%

Table 5. The Analyses of “Fake” value
Models        Precision  Recall  F1-score
NB            0.86       0.87    0.86
NB and SVM    0.85       0.94    0.89
RF            0.88       0.95    0.91
Table 6. The Analyses of “Real” value
Models        Precision  Recall  F1-score
NB            0.84       0.83    0.83
NB and SVM    0.92       0.80    0.86
RF            0.93       0.85    0.89
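As a worked check of the accuracy, precision, recall, and F1 formulas, the sketch below recomputes the NB figures from the confusion-matrix counts of Table 1. It assumes that “real” (label 1) is the positive class and “fake” (label 0) the negative class, which is an interpretation of the tables rather than something stated explicitly; the helper name `class_report` is hypothetical.

```python
def class_report(tp, fp, fn):
    """Return (precision, recall, F1) for one class, rounded to two decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)


# Confusion matrix of NB (Table 1), with "real" assumed to be the positive class.
tn, fp, fn, tp = 2766, 424, 461, 2179

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Metrics for the "real" class use the counts as given.
real = class_report(tp=tp, fp=fp, fn=fn)

# For the "fake" class the roles swap: TN are its correct hits,
# FN are items wrongly labelled fake, and FP are fake items that were missed.
fake = class_report(tp=tn, fp=fn, fn=fp)

print(round(accuracy, 2))  # 0.85
print(real)                # (0.84, 0.83, 0.83)
print(fake)                # (0.86, 0.87, 0.86)
```

Under this assumption the recomputed values agree with the NB rows of Table 4, Table 5, and Table 6.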
Because we merged two datasets, there is no former work on this exact dataset. A fair comparison with previous work is therefore not possible and is not included in our study.
4 Conclusion
The performance of the models described above is satisfactory, but the dataset is inadequate. More data are needed to deploy the model well in the future. Reviewed from different aspects in this analysis, Random Forest (RF) performs best compared with the other models. RF achieved the best accuracy, 90%, ahead of Naive Bayes (NB) and the ensemble of Naive Bayes and Support Vector Machine (NB and SVM). Besides the best accuracy, its F1-scores are also decent, at 0.89 and 0.91. This model is at the earliest stage of our research, and we are currently working to raise its accuracy.
References
1. Pradhan, D., Biswasroy, P., Kumar Naik, P., Ghosh, G., Rath, G.: A review of current interventions for COVID-19 prevention. Arch. Med. Res. 51(5), 363–374 (2020). https://doi.org/10.1016/j.arcmed.2020.04.020
2. Hlaing, M., Kham, N.: Defining news authenticity on social media using machine learning approach. In: 2020 IEEE Conference on Computer Applications (ICCA). IEEE (2021)
3. Islam, M., Raihan, M., Aktar, N., Alam, M., Ema, R., Islam, T.: Diabetes mellitus prediction using different ensemble machine learning approaches. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2020). https://doi.org/10.1109/icccnt49239.2020.9225551
4. Uppal, A., Sachdev, V., Sharma, S.: Fake news detection using discourse segment structure analysis. In: 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE (2020)
5. Kotteti, C., Dong, X., Qian, L.: Rumor detection on time-series of tweets via deep learning. In: MILCOM 2019–2019 IEEE Military Communications Conference (MILCOM). IEEE (2019)
6. Benamira, A., Devillers, B., Lesot, E., Ray, A.K., Saadi, M., Malliaros, F.D.: Semi-supervised learning and graph neural networks for fake news detection. In: 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 568–569. IEEE, August 2019
7. Kaliyar, R.K., Kumar, P., Kumar, M., Narkhede, M., Namboodiri, S., Mishra, S.: DeepNet: an efficient neural network for fake news detection using news-user engagements. In: 2020 5th International Conference on Computing, Communication and Security (ICCCS), pp. 1–6. IEEE, October 2020
8. Dong, M., Yao, L., Wang, X., Benatallah, B., Zhang, X., Sheng, Q.Z.: Dual-stream self-attentive random forest for false information detection. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, July 2019
9. Ahmad, I., Yousaf, M., Yousaf, S., Ahmad, M.: Fake news detection using machine learning ensemble methods. Complexity 2020, 1–11 (2020). https://doi.org/10.1155/2020/8885861
10. Patwa, P., et al.: Fighting an Infodemic: COVID-19 Fake News Dataset (2020)
11. Rahman, S., Raihan, M., Akter, L., Raihan, M.: Covid-19 news dataset both fake and real (1.0). Zenodo (2021). https://doi.org/10.5281/zenodo.4722484. Accessed 13 Sept
12. scikit-learn: machine learning in Python - scikit-learn 0.24.2 documentation (2021). Scikit-learn.org. https://scikit-learn.org/stable/. Accessed 12 Sept 2021
13. TensorFlow. TensorFlow (2021). https://www.tensorflow.org/. Accessed 12 Sept 2021
14. Kumar, S., Kumar, S., Yadav, P., Bagri, M.: A survey on analysis of fake news detection techniques. In: 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS). IEEE (2021)
15. Zhi, X., et al.: Financial fake news detection with multi fact CNN-LSTM model. In: 2021 IEEE 4th International Conference on Electronics Technology (ICET). IEEE (2021)
Sustainable Modelling, Computing and Optimization
1D HEC-RAS Modeling Using DEM Extracted River Geometry - A Case of Purna River; Navsari City; Gujarat, India
Azazkhan Ibrahimkhan Pathan1(✉), P. G. Agnihotri1, D. Kalyan1, Daryosh Frozan2, Muqadar Salihi1, Shabir Ahmad Zareer1, D. P. Patel3, M. Arshad1, and S. Joseph4
1 Sardar Vallabhbhai National Institute of Technology, Surat 395007, Gujarat, India
2 Dr. S. & S. S. Ghandhy College of Engineering and Technology, Gujarat Technological University, Surat, India
3 Pandit Deendayal Energy University, Gandhinagar 382007, India
4 S.P.B Patel Engineering College, Mehsana, India
Abstract. River cross-sections are the main input for building channel geometry in any hydraulic model. This research applies a 1D hydrodynamic flood modeling approach to the downstream reach of the Purna River using the RAS Mapper capabilities of HEC-RAS (Hydrologic Engineering Center's River Analysis System). Earlier, the geometry of the river was digitized with the HEC-GeoRAS extension in ArcGIS. The present research uses the newly released HEC-RAS version 5.0.4, with a 30 m resolution Cartosat-1 Digital Elevation Model (DEM), a projection file, and boundary conditions as the input dataset, and carries out a steady flow analysis for flood modeling. River geometry such as the river centerline, bank lines, flow path lines, and cross-sections was digitized directly in RAS Mapper, the GIS tool within HEC-RAS. The outcomes of the model are useful for flood disaster authorities to mitigate floods and to forecast future flooding scenarios.
Keywords: Flood modeling · RAS mapper · 1D steady flow
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 479–487, 2022. https://doi.org/10.1007/978-3-030-93247-3_47
1 Introduction Flooding is regarded as one of the world's most damaging natural disasters. During the rainy season (June–September), the Himalayan rivers cause flooding in 80% of the total flood-affected region in India. In many states of India, such as Gujarat, Maharashtra, Andhra Pradesh, West Bengal, and Orissa, extreme flooding is witnessed almost annually during the monsoon season, causing tremendous loss of property and lives. The primary causes of flooding in India are inadequate water systems, particularly in the lowland depositional areas of the basins, inadequate river carrying capacity due to sedimentation, and inadequate flood management techniques. To minimize flood losses, appropriate flood management practices are needed, which in turn require knowledge of the space-time variation of flow in 1-D as well as 2-D. Few researchers have conducted studies on hydrodynamic flood modeling integrated with GIS for river floods in Indian basins [1, 2]. An effective river flood model requires a proper representation of the river bed and floodplain geometry, along with a precise description of the model input parameters, to forecast the magnitude of flow and the flood water level along the river path precisely [3–5]. At present, software techniques have been created and are being refined to extract river geometry features from a GIS database for use in hydrodynamic modeling (Schwanghart and Kuhn [6]; Tesfa et al. 2009). Several studies have attempted to address bathymetric data shortages in river flood modeling; these depend heavily on GIS integration of Digital Elevation Models (DEM) obtained from remote sensing satellites or other globally available data sets [7–10]. Data assimilation techniques are also used to identify synthetic cross-sections that resemble the river geometry (Roux and Dartus et al. 2012). In the present study, a one-dimensional flood modeling approach has been applied with the new version of HEC-RAS, in which RAS Mapper has the GIS capabilities to extract the geometry of the river. Cartosat-1 DEM, which is freely available at the ISRO BHUVAN web site, is used for the modeling. This research demonstrates the utility of DEMs in flood modeling integrated with GIS. The approach advances one-dimensional hydrodynamic flood modeling in regions where scarcity of collected data is a major issue. The novelty of this research includes the use of freely available satellite images for flood assessment [18].
2 Study Area and Data Required Navsari city is situated on the coastal part of Gujarat near the Arabian Sea. The city lies between 20° 32′ and 21° 05′ north latitude and 72° 42′ and 73° 30′ east longitude. The topographical area of the city is about 2210.97 km². The study area map is shown in Fig. 1. Due to heavy precipitation, the water level rises in the study area, and the surrounding area is inundated annually during the monsoon. No government infrastructure exists in this region to reduce the impact of floods. The river flow data of the last 20 years were obtained from the Navsari irrigation department. The two major flooding events in the city of Navsari occurred in 1968 and 2004. Cartosat-1 DEM, which is openly available (www.bhuvan.nrsc.gov.in), is used for the extraction of river geometry. The spatial projection needs to be set in ArcGIS for the coordinate systems used in HEC-RAS [1, 12]. The 2004 flood data are required for validating the model.
3 Method and Material HEC-RAS Mapper (RAS Mapper) is a GIS tool that captures GIS information such as the river centerline, bank lines, flow path lines, and cross-section lines through digitization of the river. The input parameters for one-dimensional flood modeling in RAS Mapper are presented below. The methodology flowchart of the study is presented in Fig. 2.
Fig. 1. Location map of the study area
Fig. 2. Methodology flowchart
3.1 River Geometry Extraction
To define the river alignment within the reach, the river centerline is digitized from upstream to downstream and is shown as a light blue line in Fig. 3. The red lines show the river bank lines, which separate the main channel from the left and right floodplains. Flow path lines, which define the direction of flow, are digitized and also shown in red in Fig. 3. The green lines show the cross-section cut lines, drawn perpendicular to the river flow, along which elevation data are extracted from the DEM.
Fig. 3. River geometry extraction
3.2 One Dimensional Flood Modeling Using HEC-RAS (Hydrologic Engineering Center's River Analysis System)
HEC-RAS is software that models the hydraulics of water flowing through natural rivers and other channels. It is a computer-based modeling program for water moving through open channel systems and for calculating water surface profiles. HEC-RAS supports specific applications in floodplain mitigation measures [13, 14]. For steady flow, HEC-RAS computes water surface profiles by solving the one-dimensional energy equation between adjacent cross-sections [15, 16], expressed as

\[ Z_2 + Y_2 + \frac{a_2 V_2^2}{2g} = Z_1 + Y_1 + \frac{a_1 V_1^2}{2g} + h_e \tag{1} \]

where $Y_1, Y_2$ are the water depths at the cross-sections, $Z_1, Z_2$ are the elevations of the main river channel, $a_1, a_2$ are the velocity weighting coefficients, $V_1, V_2$ are the average velocities, $g$ is the acceleration due to gravity, and $h_e$ is the energy head loss.
3.3 Execution of 1D Model
Cartosat-1 DEM is downloaded from the ISRO BHUVAN portal for the digitization of river geometry such as the river centerline, river bank lines, flow path lines, and cross-section lines. The spatial coordinate system is set in HEC-RAS through the RAS Mapper window, with Google Maps as a base layer, as shown in Fig. 4. The extracted cross-sections provide station-elevation data through the geometric data window as illustrated in Fig. 5, and Fig. 6 represents the river geometry in HEC-RAS. The maximum river discharge of the 2004 flood was used as the upstream boundary condition, and the normal depth of the Purna River was used as the downstream boundary condition in HEC-RAS. The rugosity (roughness) coefficient is taken as 0.035 (Chow and Maidment 1985). Agriculture, barren land, built-up urban, forest, and water body classes are used as land use and land cover data, and a steady flow analysis is carried out for flood modeling.
Fig. 4. Extraction of River geometry in RAS mapper with google base map
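The boundary conditions described above can be illustrated with a small calculation. The following is a minimal sketch, not the HEC-RAS implementation: it assumes a rectangular channel and treats the roughness coefficient as Manning's n. Only the peak discharge (8836 m³/s, quoted in Sect. 4) and the coefficient 0.035 come from the paper; the channel width and bed slope are hypothetical values chosen for illustration.

```python
import math

def manning_discharge(depth, width, slope, n):
    """Discharge (m^3/s) in a rectangular channel from Manning's equation (SI units)."""
    area = width * depth
    hydraulic_radius = area / (width + 2.0 * depth)
    return area * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(slope) / n

def normal_depth(discharge, width, slope, n, upper=100.0, tol=1e-9):
    """Bisection for the depth at which Manning's discharge equals `discharge`.

    Assumes the answer lies between 0 and `upper` metres.
    """
    lower = 0.0
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if manning_discharge(mid, width, slope, n) < discharge:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

def total_head(bed_elevation, depth, velocity, a=1.0, g=9.81):
    """One side of Eq. (1) without the head loss term: Z + Y + a V^2 / (2 g)."""
    return bed_elevation + depth + a * velocity ** 2 / (2.0 * g)

Q_PEAK = 8836.0     # m^3/s, peak discharge quoted in the paper
ROUGHNESS = 0.035   # roughness coefficient quoted in the paper
WIDTH = 300.0       # m, hypothetical channel width
SLOPE = 0.0005      # hypothetical bed slope

depth = normal_depth(Q_PEAK, WIDTH, SLOPE, ROUGHNESS)
velocity = Q_PEAK / (WIDTH * depth)
print(f"normal depth: {depth:.2f} m, mean velocity: {velocity:.2f} m/s")
print(f"total head above the bed: {total_head(0.0, depth, velocity):.2f} m")
```

In a real model the cross-sections are irregular, so area and wetted perimeter would come from the station-elevation data rather than from a fixed width.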
4 Results and Discussion This study is performed in the lower part of the city. Discharge data for the 2004 flood event are used to run the steady flow analysis. Because data are scarce in the study area and only one gauging site is available, only the 2004 flood event is simulated to verify model accuracy. Cross-sections extracted from the DEM provide good results in the data-sparse region, and the flood modeling approach would be effective in such regions during peak flood events (Gichamo et al. 2012). The model results for the present study indicate the depth of water at each cross-section. The discharge was measured at the gauge station near Kurel village, about 1.5 km from the downstream side. The depth of water is computed for the discharge of 8836 m³/s in the 1D hydrodynamic flood model. The simulation results indicate that cross-sections 1 and 2 are quite affected during peak discharge, and that cross-sections 19 and 20, close to Navsari city, were more affected by the flood event. The simulated water levels at cross-section one and cross-section twenty are shown in Fig. 6. The water level at the downstream part of the study area shows that the people around cross-section twenty suffer more during peak discharge, and there were heavy losses of property and lives during the 2004 flood event. Figure 7 shows the simulated one-dimensional flood depth map for the 2004 flood event. 1D hydrodynamic flood modeling can be advantageous for regions where flash floods are a major annual phenomenon [10].
Fig. 5. Extracted river geometry in HEC-RAS
Fig. 6. (a) water level at CS-1; (b) water level at CS-2
Fig. 7. Predicted depth of water for the 2004 years flood event
5 Conclusion The present study shows the applicability of HEC-RAS for river geometry extraction using geospatial techniques (the RAS Mapper function). A 1D hydrodynamic flood modeling approach was presented using Cartosat-1 DEM on the Purna River, Navsari, Gujarat, India. The new HEC-RAS version 5 was used in the present study for GIS applications in flood modeling. River geometry, including the river centerline, bank lines, flow path lines, and cross-section cut lines, was digitized in RAS Mapper without the use of ArcGIS. The model is validated by comparing the observed water depth with the simulated water depth at the location of the gauging site. The output of the model is promising and demonstrates strong potential for the suggested method in data-scarce areas. The use of open-source datasets would be an effective approach to flood modelling worldwide (Table 1) [17].
Table 1. Differences between observed and simulated water depth at gauge station [17]

Satellite: Cartosat-1

Year   Observed depth of water (m)   Simulated depth of water (m)   Difference (m)
2001   10.9                          11.3                           0.01
2002   10.23                         12                             0.27
2003   10.32                         9.92                           −0.22
2004   11.64                         13.1                           1.6
2005   13.3                          12.32                          0.92
2006   10.5                          9.5                            0.37
2007   10.96                         11.69                          −1.1
2008   9.75                          7.89                           0.02
2009   5.61                          6.3                            0.13
2010   10.9                          11.3                           −0.06
2011   10.23                         12                             0.02
2012   13.8                          13.79                          0.01
References
1. Khan, A., Pathan, I., Agnihotri, P.G.: 2-D Unsteady Flow Modelling and Inundation Mapping for Lower Region of Purna Basin Using HEC-RAS (2020). Accessed 06 May 2020
2. Vijay, R., Sargoankar, A., Gupta, A.: Hydrodynamic simulation of river Yamuna for riverbed assessment: a case study of Delhi region. Environ. Monit. Assess. 130(1–3), 381–387 (2007). https://doi.org/10.1007/s10661-006-9405-4
3. Kale, V.S.: Flood studies in India: a brief review. J. Geol. Soc. India 49, 359–370 (1997)
4. Pathan, A.K.I., Agnihotri, P.G.: 2-D unsteady flow modelling and inundation mapping for lower region of Purna basin using HEC-RAS. Nat. Environ. Pollut. Technol. 19, 277–285 (2020)
5. Merwade, V., Cook, A., Coonrod, J.: GIS techniques for creating river terrain models for hydrodynamic modeling and flood inundation mapping. Environ. Model. Softw. 23(10–11), 1300–1311 (2008). https://doi.org/10.1016/j.envsoft.2008.03.005
6. Schwanghart, W., Kuhn, N.J.: TopoToolbox: a set of Matlab functions for topographic analysis. Environ. Model. 5, 770–781 (2010). Accessed 03 Sept 2020
7. Tesfa, T.K., Tarboton, D.G., Watson, D.W., Schreuders, K.A., Baker, M.E., Wallace, R.M.: Extraction of hydrological proximity measures from DEMs using parallel processing. Environ. Model. Softw. 26, 1696–1709 (2011)
8. Abdulkareem, J.H., Pradhan, B., Sulaiman, W.N.A., Jamil, N.R.: Review of studies on hydrological modelling in Malaysia. Model. Earth Syst. Environ. 4(4), 1577–1605 (2018). https://doi.org/10.1007/s40808-018-0509-y
9. Pathan, A.I., Agnihotri, P.G.: A combined approach for 1-D hydrodynamic flood modeling by using Arc-Gis, Hec-Georas, Hec-Ras Interface-a case study on Purna River of Navsari City Gujarat. IJRTE 8, 1410–1417 (2019)
10. Maharjan, L., Shakya, N.: Comparative study of one dimensional and two dimensional steady surface flow analysis. J. Adv. Coll. Eng. Manag. 2, 15 (2016). https://doi.org/10.3126/jacem.v2i0.16095
11. Roux, H., Dartus, D.: Sensitivity analysis and predictive uncertainty using inundation observations for parameter estimation in open-channel inverse problem. J. Hydraul. Eng. 134, 541–549 (2008). https://doi.org/10.1061/(ASCE)0733-9429(2008)134:5(541)
12. Gichamo, T.Z., Popescu, I., Jonoski, A., Solomatine, D.: River cross-section extraction from the ASTER global DEM for flood modeling. Environ. Model. Softw. 31, 37–46 (2012). https://doi.org/10.1016/j.envsoft.2011.12.003
13. Ouma, Y.O., Tateishi, R.: Urban flood vulnerability and risk mapping using integrated multiparametric AHP and GIS: Methodological overview and case study assessment. Water (Switzerland) 6(6), 1515–1545 (2014). https://doi.org/10.3390/w6061515
14. Ahmad, H., Akhtar Alam, M., Bhat, S., Ahmad, S.: One dimensional steady flow analysis using HECRAS – a case of River Jhelum, Jammu and Kashmir. Eur. Sci. J. ESJ 12(32), 340 (2016). https://doi.org/10.19044/esj.2016.v12n32p340
15. Brunner, G.: HEC-RAS River Analysis System. Hydraulic Reference Manual. Version 1.0 (1995). Accessed 07 May 2020
16. Chow, V.T., Maidment, D.R., Mays, L.W.: Applied Hydrology, International edn. McGraw-Hill, New York (1988)
17. Pathan, A.I., Agnihotri, P.G.: Application of new HEC-RAS version 5 for 1D hydrodynamic flood modeling with special reference through geospatial techniques: a case of River Purna at Navsari, Gujarat, India. Model. Earth Syst. Environ. 7(2), 1133–1144 (2021)
18. Pathan, A.I., Agnihotri, P.G., Patel, D., Prieto, C.: Identifying the efficacy of tidal waves on flood assessment study - a case of coastal urban flooding. Arab. J. Geosci. 14(20), 1–21 (2021)
A Scatter Search Algorithm for the Uncapacitated Facility Location Problem
Telmo Matos(✉)
CIICESI, Escola Superior de Tecnologia e Gestão, Politécnico do Porto, Porto, Portugal
Abstract. Facility Location Problems (FLP) are complex combinatorial optimization problems whose general goal is to locate a set of facilities that serve a particular set of customers at minimum cost. Because these problems are NP-hard, solving large instances with exact methods can be seriously compromised by the high computational times required to obtain the optimal solution. To overcome this difficulty, a significant number of heuristic algorithms of various types have been proposed with the aim of finding good quality solutions in reasonable computational times. We propose a Scatter Search approach to solve the Uncapacitated Facility Location Problem (UFLP) effectively. The algorithm was tested on the standard testbed for the UFLP and obtained state-of-the-art results. Comparisons with the current best-performing algorithms for the UFLP show that our algorithm exhibits excellent performance.
Keywords: UFLP · Scatter Search · FLP
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 488–494, 2022. https://doi.org/10.1007/978-3-030-93247-3_48
1 Introduction Facility Location Problems are widely studied in the literature and have several practical applications, in areas such as telecommunications, supply chain design and management, transport utilities, and water distribution networks. A well-known variant of this problem is the Uncapacitated Facility Location Problem (UFLP). This problem can be formulated as:

\[ \text{Minimize} \quad \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} x_{ij} + \sum_{i=1}^{m} F_i y_i \tag{1} \]

\[ \text{s.t.} \quad \sum_{i=1}^{m} x_{ij} = 1 \quad \forall j = 1, \ldots, n \tag{2} \]

\[ x_{ij} \le y_i \quad \forall j = 1, \ldots, n, \; i = 1, \ldots, m \tag{3} \]

\[ x_{ij} \ge 0 \quad \forall j = 1, \ldots, n, \; i = 1, \ldots, m \tag{4} \]

\[ y_i \in \{0, 1\} \quad \forall i = 1, \ldots, m \tag{5} \]
where $m$ represents the number of possible locations to open a facility and $n$ the number of customers to be served. $F_i$ indicates the fixed cost for opening a facility at location $i$. $C_{ij}$ represents the unit shipment cost between a facility $i$ and a customer $j$. The continuous variable $x_{ij}$ represents the amount sent from facility $i$ to customer $j$, and $y_i$ indicates whether facility $i$ is open. The objective is to locate a set of facilities in such a way that the total cost of opening those facilities plus the transportation cost of serving all customers is minimized. The UFLP has been widely studied for the past 50 years, with the development of both exact and heuristic methods. Examples include the well-known Tabu Search approaches [1–3], which are quite similar to one another but differ in the use of flexible memory [2] to record facility switching moves and intensify the search in more promising regions, in gain functions that measure the attractiveness of a move [1], and even in a procedure that creates a starting solution embedded in a traditional Tabu Search [4, 5]. Recent algorithms use information obtained by combining two or more heuristics, producing good quality results in low computational time. These are called hybrid algorithms, and examples in the literature include the PBS (Population-Based Search) algorithm [6] proposed by Wayne Pullan (using a Genetic Algorithm with a greedy algorithm) and the H-RW algorithm [7] proposed by Resende and Werneck (using Genetic Algorithm, Tabu Search and Scatter Search). A relatively new algorithm named the Monkey Algorithm was proposed by Atta et al. [8]. It is a swarm intelligence algorithm and consists of three main processes: an improvement process (climb), a method to accelerate the search for the optimum solution (watch-jump), and a perturbation process (somersault) to avoid falling back into previously known solutions. The authors compare the proposed algorithm with other recent heuristics (the Firefly Algorithm and the Artificial Bee Colony), achieving fair computational results on the ORLIB dataset. Traditional Genetic Algorithms [9], Artificial Neural Networks [10, 11], Lagrangean Relaxation algorithms [12] and the Dual Ascent procedure [13] are further examples of algorithms proposed to solve the UFLP, achieving good results on well-known instances in the literature. The main contribution of this work is to demonstrate that the proposed SS algorithm is a simple and efficient procedure for solving the UFLP and can be applied to other Facility Location Problems. The rest of the paper is organized as follows. The methodology, including the proposed algorithm for solving the UFLP, is described in Sect. 2. Experimental results are presented and discussed in Sect. 3. Finally, Sect. 4 concludes the paper and outlines future directions of research.
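The formulation (1)–(5) can be made concrete with a short sketch. Once the set of open facilities is fixed, an optimal assignment serves each customer entirely from its cheapest open facility, so the objective reduces to a function of the open set. The function names and the three-facility instance below are hypothetical and only illustrate the cost structure; the exhaustive search is feasible only for very small $m$.

```python
from itertools import combinations

def uflp_cost(open_facilities, fixed_cost, service_cost):
    """Objective (1) for a given set of open facilities.

    service_cost[i][j] is the cost C_ij of serving customer j from facility i.
    Each customer is assigned to its cheapest open facility.
    """
    if not open_facilities:
        raise ValueError("at least one facility must be open")
    opening = sum(fixed_cost[i] for i in open_facilities)
    n_customers = len(service_cost[0])
    serving = sum(min(service_cost[i][j] for i in open_facilities)
                  for j in range(n_customers))
    return opening + serving

def brute_force(fixed_cost, service_cost):
    """Enumerate every non-empty subset of facilities (tiny instances only)."""
    m = len(fixed_cost)
    subsets = (set(c) for size in range(1, m + 1)
               for c in combinations(range(m), size))
    return min(subsets, key=lambda s: uflp_cost(s, fixed_cost, service_cost))

# Hypothetical instance: 3 candidate facilities, 3 customers.
FIXED = [4, 3, 5]
SERVICE = [[1, 5, 6],
           [4, 1, 3],
           [6, 4, 1]]

print(uflp_cost({0, 1}, FIXED, SERVICE))   # 7 (opening) + 5 (serving) = 12
best = brute_force(FIXED, SERVICE)
print(best, uflp_cost(best, FIXED, SERVICE))
```

The number of subsets grows as $2^m$, which is why heuristic methods are used for instances of realistic size.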
2 Scatter Search for the UFLP Scatter Search (SS) is an evolutionary metaheuristic proposed by Glover [14]. The SS method combines solution vectors, aiming to capture information that cannot be obtained separately from the original vectors, and uses auxiliary heuristic approaches to evaluate the produced combinations and generate new solution vectors. As expected in an evolutionary method, the population of solutions evolves over successive generations. The population is represented by a set of reference solutions whose formation requires both good quality solutions and diversified solutions. The method combines solutions to generate new ones by applying weighted linear combinations to subsets of solutions, which are later treated by auxiliary heuristic procedures. SS therefore uses techniques of intensification, diversification, and combination of solutions. Combining these components makes the algorithm a very robust method with many applications, not only in the FLP but also in other areas of Operational Research. The proposed algorithm to solve the UFLP is given in Fig. 1 and makes use of the Scatter Search metaheuristic. The SS procedure starts with a set of initial solutions (seeds). The algorithm then tries to produce a large number of random solutions with characteristics different from the seeds (diversification generation method). A local search procedure (improvement method) is applied to each of the solutions (and seeds) to improve them. These improved solutions (and seeds) form the population. Then, a small population (the reference set) consisting of elite solutions and diversified solutions (to force diversity) is obtained (reference set update method).
A subset of solutions is defined (subset generation method) and its solutions are combined (usually in pairs) with solutions of that smaller population (combination method), obtaining new solutions that are improved again (improvement method). These solutions then populate the reference set (reference set update method). The process is repeated until no new solutions are found in the reference set. The algorithm can be divided into two phases: the initial phase and the scatter search phase. The overall procedure that encompasses the two phases is the following:
Initial Phase
1. Start by producing solutions with the solutions generation method.
2. Apply an improvement method to the solutions obtained in 1.
3. Call the reference set (refset) update method with the improved solutions obtained in 2.
4. If the desired quality level is not reached, go to 1.
Scatter Search Phase
5. Call the subset generation method upon the reference set.
6. Obtain new solutions through the combination method.
7. Apply an improvement method to the solutions obtained in 6.
8. Call the reference set update method.
9. If new solutions are obtained, go to 5.
10. Terminate.
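The ten steps above can be sketched in a few lines of code. This is an illustrative skeleton under several assumptions, not the authors' C implementation: a simple flip-based local search stands in for the Tabu Search improvement method, Hamming distance is used to pick the diversified solutions, all pairs of the reference set are combined, and an iteration cap (`max_iters`) is added as a safeguard. All names and the tiny instance are hypothetical.

```python
import itertools
import random

def cost(sol, fixed, serve):
    """UFLP objective of a 0/1 tuple; infinite when no facility is open."""
    opened = [i for i, bit in enumerate(sol) if bit]
    if not opened:
        return float("inf")
    return (sum(fixed[i] for i in opened)
            + sum(min(serve[i][j] for i in opened) for j in range(len(serve[0]))))

def improve(sol, fixed, serve):
    """Improvement method: flip local search (a stand-in for Tabu Search)."""
    best, best_cost, improved = tuple(sol), cost(sol, fixed, serve), True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best[:i] + (1 - best[i],) + best[i + 1:]
            cand_cost = cost(cand, fixed, serve)
            if cand_cost < best_cost:
                best, best_cost, improved = cand, cand_cost, True
    return best

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def update_refset(pool, fixed, serve, n_elite, n_diverse):
    """Reference set update: elite solutions plus the most distant ones."""
    pool = sorted(set(pool), key=lambda s: cost(s, fixed, serve))
    refset, rest = pool[:n_elite], pool[n_elite:]
    for _ in range(n_diverse):
        if not rest:
            break
        far = max(rest, key=lambda s: min(hamming(s, r) for r in refset))
        refset.append(far)
        rest.remove(far)
    return refset

def combine(a, b, fixed, serve, rng):
    """Combination method: where parents disagree, favour the cheaper parent."""
    cost_a, cost_b = cost(a, fixed, serve), cost(b, fixed, serve)
    weight_a = cost_b / (cost_a + cost_b)   # lower cost gives a higher weight
    return tuple(x if x == y or rng.random() < weight_a else y
                 for x, y in zip(a, b))

def scatter_search(fixed, serve, pop_size=20, n_elite=3, n_diverse=2,
                   seed=0, max_iters=50):
    rng = random.Random(seed)   # fixed seed: every run generates the same solutions
    m = len(fixed)
    # Initial phase: diversification generation, improvement, refset creation.
    pool = [improve(tuple(rng.randint(0, 1) for _ in range(m)), fixed, serve)
            for _ in range(pop_size)]
    refset = update_refset(pool, fixed, serve, n_elite, n_diverse)
    # Scatter search phase: subset generation, combination, improvement, update.
    for _ in range(max_iters):
        children = [improve(combine(a, b, fixed, serve, rng), fixed, serve)
                    for a, b in itertools.combinations(refset, 2)]
        new_refset = update_refset(refset + children, fixed, serve,
                                   n_elite, n_diverse)
        if set(new_refset) == set(refset):   # no new solutions: terminate
            break
        refset = new_refset
    best = min(refset, key=lambda s: cost(s, fixed, serve))
    return best, cost(best, fixed, serve)

# Hypothetical instance: 3 candidate facilities, 3 customers.
FIXED = [4, 3, 5]
SERVE = [[1, 5, 6], [4, 1, 3], [6, 4, 1]]
print(scatter_search(FIXED, SERVE))
```

On this instance the flip local search has two local optima, which is why keeping diversified solutions in the reference set matters even at a small scale.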
Fig. 1. Scatter Search framework.
The diversification generation method aims to generate diversified solutions so that the algorithm can explore several alternative solutions with the objective of finding a good one. A pseudo-random method is used to generate the random solutions, so that at each new execution of the algorithm the random solutions are always the same. The improvement method plays a crucial role in the algorithm, since the more robust and efficient the method, the better the results found later. The method is based on the Tabu Search approach proposed by L. Michel and P. Van Hentenryck [1] and is applied to the diversified solutions and to the solutions obtained from the solutions combination method. The method for creating and updating the reference set maintains two subsets: one made of improved solutions and another made of diversified solutions. Together, both subsets form the reference set (Refset). Next, the subset generation method is called. This method specifies how subsets are selected for the application of the combination method. Its main objective is to find a balance between diversification and intensification, that is, to choose how many diversified solutions and how many improved solutions take part in the combination. If we want to diversify the solutions, we have to include more solutions from the diversified part of the reference set. If we intend to intensify, we have to include more from the improved part of the reference set. The solutions combination method combines solutions to diversify and intensify the search. It has a great influence on finding the final solution: the more finely tuned the balance between diversification and intensification, the higher the probability of finding better solutions. The method combines the solutions based on ratios, so that each solution has a different weight (based on the objective value of the solution). This ratio also allows the application of diversification and intensification techniques.
3 Results The performance of the proposed algorithm was tested on a well-known benchmark, producing extremely competitive results. The SS algorithm was coded in C, and the experiments were conducted on an Intel® Pentium® CPU G645 @ 2.90 GHz with 8 GB RAM, using the gcc compiler (with optimization) on the Ubuntu operating system. Table 1 summarizes the instances considered in our computational experiments. The references for all instances considered in this paper can be found in the work of Resende [7].

Table 1. List of all instances used to compare the proposed algorithm

Instance name        #Instance
Orlib                15
Galvão Raggi (GR)    50
FPP11*               30
FPP17*               30
Bilde Krarup (BK)    220
PCodes               32
Total                377

*In comparison results with other algorithms, we group these instances as one.
The proposed algorithm was compared with Pullan's [6] population-based algorithm (AMD Opteron 252 2.6 GHz, 4 GB), the H-RW [7] hybrid algorithm (SGI Challenge with 28 196 MHz MIPS R10000 processors, of which only one was used) and Michel and Van Hentenryck's [1] tabu search algorithm (850 MHz Intel Pentium III running Linux; we refer to this method as TS). Some instances could not be compared due to a lack of comparison results. Table 2 shows the results obtained with SS, Pullan, H-RW and TS for common instances, together with the average computational time for each algorithm. As all authors use different machines to run their algorithms, it is not possible to make a direct comparison regarding this parameter. In the table, the column GAP presents the gap computed as $\frac{UB - Z}{Z} \times 100$, in which $Z$ denotes the optimal solution, and CPU presents the computational time (in seconds) needed to achieve the best upper bound (UB).

Table 2. SS results and comparison with other algorithms

Classes              Pullan DEV   Pullan CPU
Orlib                0.000        0.010
Galvão Raggi (GR)    0.000        0.010
FPP                  0.005        8.210
Bilde Krarup (BK)    0.000        …
PCodes               0.000        …

[…]

… 1 as using information beyond the previous value $x_t$ to determine $x_{t+1}$. In practice we usually work with optimizers that define dynamical systems in which $l(x(t))$ and $l(x_t)$ get closer to the minimum value of $l$ as $t$ increases. Given $l : \mathbb{R}^n \to \mathbb{R}$ we can construct the gradient descent optimizer $d(x) = -\nabla l(x)$ and the Newton's method optimizer $d(x) = -(\nabla^2 l(x))^{-1} \nabla l(x)$, both with dimension 1. We can also construct the momentum optimizer $d(x, y) = (y, -y - \nabla l(x))$ with dimension 2.
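To make the optimizers above concrete, the following minimal sketch iterates the discrete system $x_{t+1} = x_t + \alpha d(x_t)$ for the gradient descent optimizer (dimension 1) and the momentum optimizer (dimension 2). The objective $l(x) = (x - 3)^2$, the step size and the number of steps are assumptions chosen only for illustration.

```python
def grad(x):
    """Gradient of the example objective l(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def gradient_descent(state):
    """d(x) = -grad l(x); the state holds only x (dimension 1)."""
    (x,) = state
    return (-grad(x),)

def momentum(state):
    """d(x, y) = (y, -y - grad l(x)); the state holds x and y (dimension 2)."""
    x, y = state
    return (y, -y - grad(x))

def euler(optimizer, state, alpha=0.1, steps=500):
    """Iterate state <- state + alpha * d(state)."""
    for _ in range(steps):
        direction = optimizer(state)
        state = tuple(s + alpha * d for s, d in zip(state, direction))
    return state

x_gd = euler(gradient_descent, (0.0,))[0]
x_mom = euler(momentum, (0.0, 0.0))[0]
print(x_gd, x_mom)   # both approach the minimizer x = 3
```

The momentum state carries the extra variable $y$, which is the sense in which a dimension 2 optimizer uses information beyond the previous value of $x$.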
2.1 Optimization Schemes
Definition 2. An optimization scheme $u : (\mathbb{R}^n \to \mathbb{R}) \to (\mathbb{R}^{kn} \to \mathbb{R}^{kn})$ is an $n$-indexed family of maps from objectives $l : \mathbb{R}^n \to \mathbb{R}$ to optimizers $d : \mathbb{R}^{kn} \to \mathbb{R}^{kn}$.

For example, the gradient descent optimization scheme is $u(l)(x) = -\nabla l(x)$ and the momentum optimization scheme is $u(l)(x, y) = (y, -y - \nabla l(x))$. In some situations we may be able to improve the convergence rate of the dynamical systems defined by optimization schemes by precomposing an invertible function $f : \mathbb{R}^m \to \mathbb{R}^n$. That is, rather than optimize the function $l : \mathbb{R}^n \to \mathbb{R}$ we optimize $l \circ f : \mathbb{R}^m \to \mathbb{R}$. However, for many optimization schemes there are classes of transformations to which they are invariant: applying any such transformation to the data cannot change the trajectory.

Definition 3. Suppose $f : \mathbb{R}^m \to \mathbb{R}^n$ is an invertible transformation and write $f_k$ for the map $(f \times f \times \cdots) : \mathbb{R}^{km} \to \mathbb{R}^{kn}$. The optimization scheme $u$ is invariant to $f$ if $u(l \circ f) = f_k^{-1} \circ u(l) \circ f_k$.

Proposition 1. Recall that an invertible linear transformation is a function $f(x) = Ax$ where the matrix $A$ has an inverse $A^{-1}$, and an orthogonal linear transformation is an invertible linear transformation where $A^{-1} = A^T$. Newton's method is invariant to all invertible linear transformations, whereas both gradient descent and momentum are invariant to orthogonal linear transformations.

Proof. First, we will show that the Newton's method optimization scheme $NEW(l)(x) = -(\nabla^2 l(x))^{-1} \nabla l(x)$ is invariant to invertible linear transformations. Consider any function of the form $f(x) = Ax$ where $A$ is invertible. We have:

\[ \begin{aligned} NEW(l \circ f)(x) &= -(\nabla^2 (l \circ f)(x))^{-1} \nabla (l \circ f)(x) \\ &= -A^{-1} (\nabla^2 l(Ax))^{-1} A^{-T} A^{T} \nabla l(Ax) \\ &= -A^{-1} (\nabla^2 l(Ax))^{-1} \nabla l(Ax) \\ &= -f^{-1}\left( (\nabla^2 l(f(x)))^{-1} \nabla l(f(x)) \right) \\ &= f^{-1}(NEW(l)(f(x))) \end{aligned} \]

Next, we will show that the gradient descent optimization scheme $GRAD(l)(x) = -\nabla l(x)$ is invariant to orthogonal linear transformations, but not to linear transformations in general. Consider any function of the form $f(x) = Ax$ where $A$ is an orthogonal matrix. Then the following holds only when $A^T = A^{-1}$:

\[ GRAD(l \circ f)(x) = -\nabla (l \circ f)(x) = -A^{T} (\nabla l(Ax)) = -A^{-1} (\nabla l(Ax)) = f^{-1}(GRAD(l)(f(x))) \]

Next, we will show that the momentum optimization scheme $MOM(l)(x, y) = (y, -y - \nabla l(x))$ is also invariant to orthogonal linear transformations, but not to linear transformations in general. Consider any function of the form $f(x) = Ax$ where $A$ is an orthogonal matrix. Writing $(\cdot)_x$ and $(\cdot)_y$ for the two components of the output, the following holds only when $A^T = A^{-1}$:

\[ \begin{aligned} MOM(l \circ f)(x, y)_x &= y = A^{T} A y = f^{-1}(MOM(l)(f(x), f(y)))_x \\ MOM(l \circ f)(x, y)_y &= -y - \nabla (l \circ f)(x) = -A^{-1} A y - A^{T} \nabla l(Ax) = f^{-1}(MOM(l)(f(x), f(y)))_y \end{aligned} \]
In order to interpret these invariance properties it is helpful to consider how they affect the discrete dynamical system defined by an optimization scheme.

Proposition 2. Given an objective function $l : \mathbb{R}^n \to \mathbb{R}$ and an optimization scheme $u : (\mathbb{R}^n \to \mathbb{R}) \to (\mathbb{R}^{kn} \to \mathbb{R}^{kn})$ that is invariant to the invertible linear function $f : \mathbb{R}^m \to \mathbb{R}^n$, the system $y_{t+1} = y_t + \alpha u(l \circ f)(y_t)$ cannot converge faster than the system $x_{t+1} = x_t + \alpha u(l)(x_t)$.

Proof. Consider starting at some point $x_0 \in \mathbb{R}^{kn}$ and repeatedly taking Euler steps $x_{t+\alpha} = x_t + \alpha u(l)(x_t)$. Now suppose instead that we start at the point $y_0 = f_k^{-1}(x_0)$ and take Euler steps $y_{t+\alpha} = y_t + \alpha u(l \circ f)(y_t)$. We will prove by induction that $y_{t+\alpha} = f_k^{-1}(x_{t+\alpha})$, and therefore the two sequences converge at the same rate. The base case holds by definition, and by induction, using the invariance of $u$ and the linearity of $f_k^{-1}$, we can see that:

\[ y_{t+\alpha} = y_t + \alpha u(l \circ f)(y_t) = f_k^{-1}(x_t) + \alpha f_k^{-1}(u(l)(x_t)) = f_k^{-1}(x_{t+\alpha}). \]

Propositions 1 and 2 together give some insight into why Newton's method can perform so much better than gradient descent for applications where both methods are computationally feasible [2]. Whereas gradient descent can be led astray by bad data scaling, Newton's method steps are always scaled optimally and therefore cannot be improved by data rescaling. It is important to note that Proposition 2 only applies to linear transformation functions $f$. Since Euler's method is itself a linear method, it does not necessarily preserve non-linear invariance properties.
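Propositions 1 and 2 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the original paper: it uses a hypothetical two-dimensional quadratic objective $l(x) = \frac{1}{2} x^T Q x - b^T x$, runs Euler steps on $l$ and on $l \circ f$ for $f(x) = Ax$, and reports the largest deviation of $y_t$ from $A^{-1} x_t$. The matrix helpers, step size and starting point are all chosen for illustration.

```python
import math

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

Q = [[3.0, 1.0], [1.0, 2.0]]   # Hessian of l(x) = 0.5 x^T Q x - b^T x
B = [1.0, 0.5]

def grad_l(x):
    qx = matvec(Q, x)
    return [qx[0] - B[0], qx[1] - B[1]]

def gradient_descent(grad, hess):
    return [-g for g in grad]

def newton(grad, hess):
    return [-v for v in matvec(inverse(hess), grad)]

def max_mismatch(scheme, a, alpha=0.05, steps=10, x0=(1.0, -2.0)):
    """Largest component of y_t - A^{-1} x_t over the first `steps` Euler steps."""
    a_inv, a_t = inverse(a), transpose(a)
    hess_lf = matmul(a_t, matmul(Q, a))            # Hessian of l o f
    x, y = list(x0), matvec(a_inv, x0)
    worst = 0.0
    for _ in range(steps):
        dx = scheme(grad_l(x), Q)
        # Chain rule: grad (l o f)(y) = A^T grad l(A y).
        dy = scheme(matvec(a_t, grad_l(matvec(a, y))), hess_lf)
        x = [x[i] + alpha * dx[i] for i in range(2)]
        y = [y[i] + alpha * dy[i] for i in range(2)]
        expected = matvec(a_inv, x)
        worst = max(worst, abs(y[0] - expected[0]), abs(y[1] - expected[1]))
    return worst

ROTATION = [[math.cos(0.7), -math.sin(0.7)], [math.sin(0.7), math.cos(0.7)]]
SCALING = [[1.0, 0.0], [0.0, 3.0]]

print("gradient descent, rotation:", max_mismatch(gradient_descent, ROTATION))
print("gradient descent, scaling: ", max_mismatch(gradient_descent, SCALING))
print("Newton, scaling:           ", max_mismatch(newton, SCALING))
```

The trajectories agree up to floating-point error for gradient descent under the rotation and for Newton's method under the scaling, while gradient descent under the non-orthogonal scaling departs from $A^{-1} x_t$ after the first step.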
3 Generalized Optimization
In this section we will use Cartesian differential categories [8] and Cartesian reverse derivative categories [3] to generalize standard results on the behavior of gradient descent as well as the results in Sect. 2.

Definition 4. An optimization domain is a tuple $(\mathbf{Base}, X)$ such that each morphism $f : A \to B$ in the Cartesian reverse derivative category $\mathbf{Base}$ has an additive inverse $-f$ and each homset $\mathbf{Base}[*, A]$ out of the terminal object $*$ is further equipped with a multiplication operation $fg$ and a multiplicative identity map $1_A : * \to A$ to form a commutative ring with the left additive structure $+$. $X$ is an object in $\mathbf{Base}$ such that the homset $\mathbf{Base}[*, X]$ is further equipped with a total order $f \le g$ to form an ordered commutative ring.

Given an optimization domain $(\mathbf{Base}, X)$ the object $X$ represents the space of objective values to optimize and we refer to morphisms into $X$ as objectives. We abbreviate the map $1_B \circ !_A : A \to B$ as $1_{AB}$, where $!_A : A \to *$ is the unique map into the terminal object $*$. Note that any map $f : A \to B$ in $\mathbf{Base}$ has the additive inverse $-f = (-1_{AB})f$.

For example, the objectives in the standard domain $(\mathbf{Euc}, \mathbb{R})$ are functions $l : \mathbb{R}^n \to \mathbb{R}$. Given an ordered commutative ring $r$ we can form the $r$-polynomial domain $(\mathbf{Poly}_r, 1)$ in which objectives are $r$-polynomials $l_P : n \to 1$.
Definition 5. An objective $l : A \to X$ is bounded below in $(\mathbf{Base}, X)$ if there exists some $x : * \to X$ such that for any $a : * \to A$ we have $x \le l \circ a$.

In both the standard and $r$-polynomial domains an objective is bounded below if its image has an infimum.

3.1 Generalized Gradient and Generalized n-Derivative
Definition 6. The generalized gradient of the objective $l : A \to X$ in $(\mathbf{Base}, X)$ is $R[l]_1 : A \to A$ where $R[l]_1 = R[l] \circ \langle \mathrm{id}_A, 1_{AX} \rangle$.

In the standard domain the generalized gradient of $l : \mathbb{R}^n \to \mathbb{R}$ is just the gradient $R[l]_1(x) = \nabla l(x)$, and in the $r$-polynomial domain the generalized gradient of $l_P : n \to 1$ is $R[l_P]_1(x) = \left(\frac{\partial l_P}{\partial x_1}(x), \cdots, \frac{\partial l_P}{\partial x_n}(x)\right)$ where $\frac{\partial l_P}{\partial x_i}$ is the formal derivative of the polynomial $l_P$ in $x_i$.

Definition 7. The generalized n-derivative of the morphism $f : X \to A$ in $(\mathbf{Base}, X)$ is $D_n[f] : X \to A$ where $D_1[f] = D[f] \circ \langle \mathrm{id}_X, 1_{XX} \rangle$ and $D_n[f] = D[D_{n-1}[f]] \circ \langle \mathrm{id}_X, 1_{XX} \rangle$.

In the standard domain the generalized n-derivative of $f : \mathbb{R} \to \mathbb{R}$ is the n-th derivative $f^{(n)} = \frac{\partial^n f}{\partial x^n}$, and in the $r$-polynomial domain the generalized n-derivative of $l_P : 1 \to 1$ is the formal n-th derivative $\frac{\partial^n l_P}{\partial x^n}$. The derivative over the reals has a natural interpretation as a rate of change. We can generalize this as follows:

Definition 8. We say that a morphism $f : X \to X$ in $\mathbf{Base}$ is n-smooth in $(\mathbf{Base}, X)$ if whenever $D_k[f] \circ t \ge 0_X : * \to X$ for all $t_1 \le t \le t_2 : * \to X$ and $k \le n$ we have that $f \circ t_1 \le f \circ t_2 : * \to X$.

In other words, $f$ is n-smooth if it cannot decrease on any interval over which its generalized derivatives of order $n$ and below are non-negative. Some examples include:
• Any $f : \mathbb{R} \to \mathbb{R}$ is trivially 1-smooth in the standard domain.
• When $r$ is a dense subring of a real-closed field, any polynomial $l_P : 1 \to 1$ is 1-smooth in the $r$-polynomial domain [7].
• For any $r$, the polynomial $l_P = \sum_{k=0}^{n} c_k t^k : 1 \to 1$ of degree $n$ is n-smooth in the $r$-polynomial domain, since for any $t_1$ we can use the binomial theorem to write $l_P(t) = \sum_{k=0}^{n} c_k t^k = \sum_{k=0}^{n} c_k (t_1 + (t - t_1))^k = l_P(t_1) + \sum_{k=1}^{n} c'_k (t - t_1)^k$ where $c'_k$ is a constant such that $(c'_k)(k!) = D_k[l_P](t_1)$. Note that $c'_k$ must exist by the definition of the formal derivative of $l_P$, and must be non-negative if $D_k[l_P](t_1)$ is non-negative.

3.2 Optimization Functors
In this section we generalize optimization schemes (Sect. 2.1) to arbitrary optimization domains. This will enable us to characterize the invariance properties of our generalized optimization schemes in terms of the categories out of which they are functorial. Given an optimization domain (Base, X) we can define the following categories:
Definition 9. The objects in the category $\mathbf{Objective}$ over the optimization domain $(\mathbf{Base}, X)$ are objectives $l : A \to X$ such that there exists an inverse function $R[l]_1^{-1} : A \to A$ where $R[l]_1^{-1} \circ R[l]_1 = R[l]_1 \circ R[l]_1^{-1} = \mathrm{id}_A : A \to A$, and the morphisms between $l : A \to X$ and $l' : B \to X$ are morphisms $f : A \to B$ where $l' \circ f = l$. Note that $\mathbf{Objective}$ is a subcategory of the slice category $\mathbf{Base}/X$.

In the standard domain the objects in $\mathbf{Objective}$ are objectives $l : \mathbb{R}^n \to \mathbb{R}$ such that the function $\nabla l : \mathbb{R}^n \to \mathbb{R}^n$ is invertible. In the $r$-polynomial domain, the objects in $\mathbf{Objective}$ are $r$-polynomials $l_P : n \to 1$ such that the function $\left(\frac{\partial l_P}{\partial x_1}, \cdots, \frac{\partial l_P}{\partial x_n}\right) : n \to n$ is invertible.

Definition 10. A generalized optimizer over the optimization domain $(\mathbf{Base}, X)$ with state space $A \in \mathbf{Base}$ and dimension $k \in \mathbb{N}$ is an endomorphism $d : A^k \to A^k$ in $\mathbf{Base}$. The objects in the category $\mathbf{Optimizer}$ over $(\mathbf{Base}, X)$ are generalized optimizers, and the morphisms between the generalized optimizers $d : A^k \to A^k$ and $d' : B^k \to B^k$ are $\mathbf{Base}$-morphisms $f : A \to B$ such that $f^k \circ d = d' \circ f^k : A^k \to B^k$. Note that morphisms only exist between generalized optimizers with the same dimension. The composition of morphisms in $\mathbf{Optimizer}$ is the same as in $\mathbf{Base}$.

Recall that $A^k$ and $f^k$ are respectively $A$ and $f$ tensored with themselves $k$ times. In the standard domain a generalized optimizer with dimension $k$ is a tuple $(\mathbb{R}^n, d)$ where $d : \mathbb{R}^{kn} \to \mathbb{R}^{kn}$ is an optimizer (Definition 1).

Definition 11. Given a subcategory $\mathbf{D}$ of $\mathbf{Objective}$, an optimization functor over $\mathbf{D}$ is a functor $\mathbf{D} \to \mathbf{Optimizer}$ that maps the objective $l : A \to X$ to a generalized optimizer over $(\mathbf{Base}, X)$ with state space $A$.

Optimization functors are generalizations of optimization schemes (Definition 2) that map objectives to generalized optimizers. Explicitly, an optimization scheme $u$ that maps $l : \mathbb{R}^n \to \mathbb{R}$ to $u(l) : \mathbb{R}^{kn} \to \mathbb{R}^{kn}$ defines an optimization functor in the standard domain.
The invariance properties of optimization functors are represented by the subcategory $\mathbf{D} \subseteq \mathbf{Objective}$ out of which they are functorial. Concretely, consider the subcategory $\mathbf{Objective}_I$ of $\mathbf{Objective}$ in which morphisms are limited to invertible linear morphisms $f$ in $\mathbf{Base}$, and the subcategory $\mathbf{Objective}_\perp$ of $\mathbf{Objective}_I$ in which the inverse of $f$ is $f^\dagger$. In both the standard domain and the $r$-polynomial domain, the morphisms in $\mathbf{Objective}_I$ are linear maps defined by an invertible matrix and the morphisms in $\mathbf{Objective}_\perp$ are linear maps defined by an orthogonal matrix (the matrix inverse is equal to the matrix transpose). We will now generalize Proposition 1 by defining generalized gradient descent and momentum functors that are functorial out of $\mathbf{Objective}_\perp$ and a generalized Newton's method functor that is functorial out of $\mathbf{Objective}_I$.

Definition 12. Generalized gradient descent sends the objective $l : A \to X$ to the generalized optimizer $-R[l]_1 : A \to A$ with dimension 1, and generalized momentum sends the objective $l : A \to X$ to the generalized optimizer $\langle \pi_1, -\pi_1 - (R[l]_1 \circ \pi_0) \rangle : A^2 \to A^2$ with dimension 2.
Generalized momentum and generalized gradient descent have a very similar structure, with the major difference between the two being that generalized momentum uses a placeholder variable and generalized gradient descent does not. In the standard domain we have that $-R[l]_1(x) = -\nabla l(x)$ and $\langle \pi_1, -\pi_1 - (R[l]_1 \circ \pi_0) \rangle(x, y) = (y, -y - \nabla l(x))$, so generalized gradient descent and generalized momentum are equivalent to the gradient descent and momentum optimization schemes that we defined in Sect. 2.1. Similarly, in the $r$-polynomial domain generalized gradient descent maps $l_P : n \to 1$ to $-R[l_P]_1 : n \to n$.

Since Newton's method involves the computation of an inverse Hessian it is not immediately obvious how we can express it in terms of Cartesian reverse derivatives. However, by the inverse function theorem we can rewrite the inverse Hessian as the Jacobian of the inverse gradient function, which makes this easier. That is:

$$(\nabla^2 l)(x)^{-1} = J_{\nabla l}(x)^{-1} = J_{(\nabla l)^{-1}}(\nabla l(x))$$

where $J_{\nabla l}(x) = (\nabla^2 l)(x)$ is the Hessian of $l$ at $x$, $J_{(\nabla l)^{-1}}(\nabla l(x))$ is the Jacobian of the inverse gradient function evaluated at $\nabla l(x)$, and the second equality holds by the inverse function theorem. We can therefore generalize the Newton's method term $-(\nabla^2 l)^{-1} \nabla l$ as $-R[R[l]_1^{-1}] \circ \langle R[l]_1, R[l]_1 \rangle : A \to A$ and generalize Newton's method as follows:

Definition 13. Generalized Newton's method sends $l : A \to X$ to the generalized optimizer $-R[R[l]_1^{-1}] \circ \langle R[l]_1, R[l]_1 \rangle : A \to A$ with dimension 1.

In the $r$-polynomial domain generalized Newton's method maps the polynomial $l_P : n \to 1$ to $-R[R[l_P]_1^{-1}] \circ \langle R[l_P]_1, R[l_P]_1 \rangle : n \to n$. We can now present the main result in this section, which is a generalization of Proposition 1:

Proposition 3. Generalized Newton's method is a functor from $\mathbf{Objective}_I$ to $\mathbf{Optimizer}$, whereas both generalized gradient descent and generalized momentum are functors from $\mathbf{Objective}_\perp$ to $\mathbf{Optimizer}$.

Proof.
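Since generalized gradient descent, generalized momentum and generalized Newton's method all act as the identity on morphisms, we simply need to show that each functor maps a morphism in its source category to a morphism in its target category.

First we show that generalized Newton's method $\mathrm{NEW}(l) = -R[R[l]_1^{-1}] \circ \langle R[l]_1, R[l]_1 \rangle$ is a functor out of $\mathbf{Objective}_I$. Given an objective $l : A \to X$ and an invertible linear map $f : B \to A$ we have:

$$\begin{aligned}
f \circ \mathrm{NEW}(l \circ f) &= -f \circ R[R[l \circ f]_1^{-1}] \circ \langle R[l \circ f]_1, R[l \circ f]_1 \rangle \\
&\overset{*}{=} -f \circ f^{-1} \circ R[R[l]_1^{-1}] \circ (f^{-\dagger} \times f^{-\dagger}) \circ \langle f^{\dagger} \circ R[l]_1 \circ f, f^{\dagger} \circ R[l]_1 \circ f \rangle \\
&= -R[R[l]_1^{-1}] \circ \langle f^{-\dagger} \circ f^{\dagger} \circ R[l]_1 \circ f, f^{-\dagger} \circ f^{\dagger} \circ R[l]_1 \circ f \rangle \\
&= -R[R[l]_1^{-1}] \circ \langle R[l]_1, R[l]_1 \rangle \circ f = \mathrm{NEW}(l) \circ f
\end{aligned}$$

where $*$ holds by:

$$\begin{aligned}
R[R[l \circ f]_1^{-1}] &\overset{**}{=} R[f^{-1} \circ R[l]_1^{-1} \circ f^{-\dagger}] \\
&= R[f^{-\dagger}] \circ (\mathrm{id}_B \times R[f^{-1} \circ R[l]_1^{-1}]) \circ \langle \pi_0, \langle f^{-\dagger} \circ \pi_0, \pi_1 \rangle \rangle \\
&= f^{-1} \circ R[f^{-1} \circ R[l]_1^{-1}] \circ \langle f^{-\dagger} \circ \pi_0, \pi_1 \rangle \\
&= f^{-1} \circ R[R[l]_1^{-1}] \circ (\mathrm{id}_A \times R[f^{-1}]) \circ \langle \pi_0, \langle R[l]_1^{-1} \circ \pi_0, \pi_1 \rangle \rangle \circ \langle f^{-\dagger} \circ \pi_0, \pi_1 \rangle \\
&= f^{-1} \circ R[R[l]_1^{-1}] \circ (\mathrm{id}_A \times f^{-\dagger}) \circ \langle f^{-\dagger} \circ \pi_0, \pi_1 \rangle \\
&= f^{-1} \circ R[R[l]_1^{-1}] \circ (f^{-\dagger} \times f^{-\dagger})
\end{aligned}$$

and where $**$ holds by:

$$R[l \circ f]_1^{-1} = f^{-1} \circ R[l]_1^{-1} \circ R[f^{-1}] \circ (1_A \times \mathrm{id}_B) = f^{-1} \circ R[l]_1^{-1} \circ f^{-\dagger}$$

Next we show that generalized gradient descent $\mathrm{GRAD}(l) = -R[l]_1$ is a functor out of $\mathbf{Objective}_\perp$. Given an objective $l : A \to X$ and an invertible linear map $f : B \to A$ where $f \circ f^{\dagger} = \mathrm{id}_A$ and $f^{\dagger} \circ f = \mathrm{id}_B$ we have:

$$\begin{aligned}
f \circ \mathrm{GRAD}(l \circ f) &= -f \circ R[l \circ f]_1 = -f \circ R[l \circ f] \circ \langle \mathrm{id}_B, 1_{BX} \rangle \\
&= -f \circ R[f] \circ (\mathrm{id}_B \times R[l]_1) \circ \langle \mathrm{id}_B, f \rangle \\
&= -f \circ f^{\dagger} \circ \pi_1 \circ (\mathrm{id}_B \times R[l]_1) \circ \langle \mathrm{id}_B, f \rangle \\
&= -\pi_1 \circ (\mathrm{id}_B \times R[l]_1) \circ \langle \mathrm{id}_B, f \rangle \\
&= -R[l]_1 \circ f = \mathrm{GRAD}(l) \circ f
\end{aligned}$$

Next we show that generalized momentum $\mathrm{MOM}(l) = \langle \pi_1, -\pi_1 - (R[l]_1 \circ \pi_0) \rangle$ is a functor out of $\mathbf{Objective}_\perp$. Given an objective $l : A \to X$ and an invertible linear map $f : B \to A$ where $f \circ f^{\dagger} = \mathrm{id}_A$ and $f^{\dagger} \circ f = \mathrm{id}_B$, the identity $f \circ R[l \circ f]_1 = R[l]_1 \circ f$ from the gradient descent case gives:

$$\begin{aligned}
f^2 \circ \mathrm{MOM}(l \circ f) &= (f \times f) \circ \mathrm{MOM}(l \circ f) = (f \times f) \circ \langle \pi_1, -\pi_1 - (R[l \circ f]_1 \circ \pi_0) \rangle \\
&= \langle f \circ \pi_1, f \circ (-\pi_1 - (R[l \circ f]_1 \circ \pi_0)) \rangle \\
&= \langle f \circ \pi_1, -f \circ \pi_1 - (f \circ R[l \circ f]_1 \circ \pi_0) \rangle \\
&= \langle f \circ \pi_1, -f \circ \pi_1 - (R[l]_1 \circ f \circ \pi_0) \rangle \\
&= \langle \pi_1, -\pi_1 - (R[l]_1 \circ \pi_0) \rangle \circ (f \times f) \\
&= \mathrm{MOM}(l) \circ (f \times f) = \mathrm{MOM}(l) \circ f^2
\end{aligned}$$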
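In the standard domain the commuting condition $f^k \circ u(l \circ f) = u(l) \circ f^k$ of Proposition 3 can be tested directly. The sketch below is illustrative only: it assumes the one-dimensional objective $l(x) = x^4/4 + x^2/2$ (whose derivative is invertible), the linear map $f(y) = ay$, and hypothetical helper names. In one dimension the orthogonal maps are exactly $a = \pm 1$.

```python
def d1(x):
    """First derivative of the assumed objective l(x) = x**4 / 4 + x**2 / 2."""
    return x ** 3 + x

def d2(x):
    """Second derivative of the same objective (strictly positive)."""
    return 3.0 * x ** 2 + 1.0

def grad_opt(dl):
    return lambda x: -dl(x)

def newton_opt(dl, ddl):
    return lambda x: -dl(x) / ddl(x)

def mom_opt(dl):
    return lambda x, y: (y, -y - dl(x))

def pulled_back(a):
    """Derivatives of l o f for f(y) = a * y, obtained from the chain rule."""
    return (lambda y: a * d1(a * y)), (lambda y: a * a * d2(a * y))

def commutation_gaps(a, y=0.8, v=-0.3):
    """Size of f(u(l o f)(y)) - u(l)(f(y)) for u in (GRAD, NEW, MOM)."""
    dlf, ddlf = pulled_back(a)
    grad_gap = abs(a * grad_opt(dlf)(y) - grad_opt(d1)(a * y))
    newton_gap = abs(a * newton_opt(dlf, ddlf)(y) - newton_opt(d1, d2)(a * y))
    m_lf = mom_opt(dlf)(y, v)
    m_l = mom_opt(d1)(a * y, a * v)
    mom_gap = max(abs(a * m_lf[0] - m_l[0]), abs(a * m_lf[1] - m_l[1]))
    return grad_gap, newton_gap, mom_gap

orthogonal = commutation_gaps(-1.0)   # a is orthogonal: all three commute
stretched = commutation_gaps(2.5)     # invertible only: Newton still commutes
print(orthogonal, stretched)
```

For the stretched map only the Newton gap vanishes (up to rounding), which mirrors the split between $\mathbf{Objective}_I$ and $\mathbf{Objective}_\perp$ in the proposition.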
Proposition 3 implies that the invariance properties of our optimization functors mirror the invariance properties of their optimization scheme counterparts. Not only does Proposition 3 directly imply Proposition 1, but it also implies that the invariance properties that gradient descent, momentum, and Newton's method enjoy are not dependent on the underlying category over which they are defined.

3.3 Generalized Optimization Flows
In Sect. 2 we demonstrated how we can derive continuous and discrete dynamical systems from an optimizer $d : \mathbb{R}^{kn} \to \mathbb{R}^{kn}$. In this section we extend this insight to generalized optimizers. To do this, we define a morphism $s : X \to A^k$ whose Cartesian derivative is defined by a generalized optimizer $d : A^k \to A^k$. Since we can interpret morphisms in $\mathbf{Base}[*, X]$ as either times $t$ or objective values $x$, the morphism $s : X \to A^k$ describes how the state of our dynamical system evolves in time. Formally we can put this together in the following structure:

Definition 14. A generalized optimization flow over the optimization domain $(\mathbf{Base}, X)$ with state space $A \in \mathbf{Base}$ and dimension $k \in \mathbb{N}$ is a tuple $(l, d, s, \tau)$ where $l : A \to X$ is an objective, $d : A^k \to A^k$ is a generalized optimizer, $s : X \to A^k$ is a morphism in $\mathbf{Base}$ and $\tau$ is an interval in $\mathbf{Base}[*, X]$ such that for $t \in \tau$ we have $d \circ s \circ t = D_1[s] \circ t : * \to A^k$.
Intuitively, $l$ is an objective, $d$ is a generalized optimizer, and $s$ is the state map that maps times in $\tau$ to the system state such that $d \circ s : X \to A^k$ describes the Cartesian derivative of the state map $D_1[s]$. In the standard domain we can define a generalized optimization flow $(l, d, s, \mathbb{R})$ from an optimizer $d : \mathbb{R}^{kn} \to \mathbb{R}^{kn}$ and an initial state $s_0 \in \mathbb{R}^{kn}$ by defining a state map $s : \mathbb{R} \to \mathbb{R}^{kn}$ where $s(t) = s_0 + \int_0^t d(s(t'))\,dt'$. We can think of a state map in the standard domain as a simulation of Euler's method with infinitesimal $\alpha$.

Definition 15. A generalized optimization flow $(l, d, s, \tau)$ over the optimization domain $(\mathbf{Base}, X)$ is an n-descending flow if for any $t \in \tau$ and $k \le n$ we have $D_k[l \circ \pi_0 \circ s] \circ t \le 0_X : * \to X$.

Note that if $(l, d, s, \tau)$ is an n-descending flow and $l \circ \pi_0 \circ s : X \to X$ is n-smooth (Definition 8), then $l \circ \pi_0 \circ s$ must be monotonically decreasing in $t$ on $\tau$.

Definition 16. The generalized optimization flow $(l, d, s, \tau)$ over the optimization domain $(\mathbf{Base}, X)$ converges if for any $\delta > 0_X : * \to X$ there exists some $t \in \tau$ such that for any $t' \in \tau$ with $t \le t'$ we have $-\delta \le (l \circ \pi_0 \circ s \circ t') - (l \circ \pi_0 \circ s \circ t) \le \delta$.

In the standard domain this reduces to a familiar definition of convergence [1]: a flow converges if there exists a time $t$ after which the value of the objective $l$ does not change by more than an arbitrarily small amount.

Now suppose $(l, d, s, \tau)$ is an n-descending flow, $l \circ \pi_0 \circ s : X \to X$ is n-smooth and $l$ is bounded below (Definition 5). Since $l \circ \pi_0 \circ s$ must decrease monotonically in $t$ and is bounded below, it must be that $(l, d, s, \tau)$ converges. In the next section we give examples of optimization flows defined by the generalized gradient that satisfy these conditions.

3.3.1 Generalized Gradient Flows
Definition 17. A generalized gradient flow is a generalized optimization flow of the form $(l, -R[l]_1, s, \tau)$.

Given a smooth objective $l : \mathbb{R}^n \to \mathbb{R}$, an example generalized gradient flow in the standard domain is $(l, -\nabla l, s, \mathbb{R})$ where $s(t) = s_0 + \int_0^t -\nabla l(s(t'))\,dt'$ for some $s_0 \in \mathbb{R}^n$. One of the most useful properties of a generalized gradient flow is that we can write its Cartesian derivative with an inner product-like structure:

Proposition 4. Given a choice of time $t \in \tau$ and a generalized gradient flow $(l, -R[l]_1, s, \tau)$ we have $D_1[l \circ \pi_0 \circ s] \circ t = -R[l]_{st}^{\dagger} \circ R[l]_{st} \circ 1_X : * \to X$ where $R[l]_{st} = R[l] \circ \langle s \circ t \circ !_X, \mathrm{id}_X \rangle : X \to A$.
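Proof.

$$\begin{aligned}
D_1[l \circ s] \circ t &= D[l \circ s] \circ \langle t, 1_X \rangle \\
&= D[l] \circ \langle s \circ \pi_0, D[s] \rangle \circ \langle t, 1_X \rangle \\
&= D[l] \circ \langle s, D[s] \circ \langle \mathrm{id}_X, 1_X \rangle \rangle \circ t \\
&= D[l] \circ \langle s, d \circ s \rangle \circ t \\
&= D[l] \circ \langle s, -R[l] \circ \langle \mathrm{id}_A, 1_{AX} \rangle \circ s \rangle \circ t \\
&= -D[l] \circ \langle s, R[l] \circ \langle s, 1_X \rangle \rangle \circ t \\
&= -\pi_1 \circ R[R[l]] \circ (\langle \mathrm{id}_A, 1_{AX} \rangle \times \mathrm{id}_A) \circ \langle s, R[l] \circ \langle s, 1_X \rangle \rangle \circ t \\
&= -\pi_1 \circ R[R[l]] \circ \langle \langle s, 1_X \rangle, R[l] \circ \langle s, 1_X \rangle \rangle \circ t \\
&= -\pi_1 \circ R[R[l]] \circ \langle \langle s \circ t, 1_X \rangle, R[l] \circ \langle s \circ t, 1_X \rangle \rangle \\
&= -\pi_1 \circ R[R[l]] \circ (\langle s \circ t, 1_X \rangle \times \mathrm{id}_A) \circ R[l] \circ \langle s \circ t, 1_X \rangle \\
&= -\pi_1 \circ R[R[l]] \circ (\langle s \circ t, 1_X \rangle \times \mathrm{id}_A) \circ R[l]_{st} \circ 1_X \\
&= -R[R[l] \circ \langle s \circ t \circ !_X, \mathrm{id}_X \rangle] \circ \langle 1_X, R[l]_{st} \circ 1_X \rangle \\
&= -(R[l] \circ \langle s \circ t \circ !_X, \mathrm{id}_X \rangle)^{\dagger} \circ \pi_1 \circ \langle 1_X, R[l]_{st} \circ 1_X \rangle \\
&= -(R[l] \circ \langle s \circ t \circ !_X, \mathrm{id}_X \rangle)^{\dagger} \circ R[l]_{st} \circ 1_X \\
&= -R[l]_{st}^{\dagger} \circ R[l]_{st} \circ 1_X
\end{aligned}$$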
Intuitively, $s \circ t : * \to A$ is the state at time $t$ and $R[l]_{st} \circ 1_X : * \to A$ is the value of the generalized gradient of $l$ at time $t$. To understand the importance of this result consider the following definition:

Definition 18. $(\mathbf{Base}, X)$ supports generalized gradient-based optimization when any generalized gradient flow over $(\mathbf{Base}, X)$ is a 1-descending flow.

Intuitively, an optimization domain supports generalized gradient-based optimization if the loss decreases in the direction of the negative gradient. Proposition 4 is important because it helps us identify the optimization domains for which this holds. For example, Proposition 4 implies that both the standard domain and any $r$-polynomial domain support generalized gradient-based optimization:
• In the standard domain $-R[l]_{st}^{\dagger} \circ R[l]_{st} \circ 1_{\mathbb{R}} = -\|\nabla l(s(t))\|^2$, which must be non-positive by the definition of a norm. As a result, any generalized gradient flow $(l, -R[l]_1, s, \tau)$ in the standard domain converges if $l$ is bounded below.
• In the $r$-polynomial domain $-R[l_P]_{st}^{\dagger} \circ R[l_P]_{st} \circ 1_1 = -\sum_{i=1}^{n} \frac{\partial l_P}{\partial x_i}(s_t) \frac{\partial l_P}{\partial x_i}(s_t)$, which must be non-positive since in an ordered ring no negative element is a square. If $r$ is a dense subring of a real-closed field then any generalized gradient flow $(l, -R[l]_1, s, \tau)$ in the $r$-polynomial domain converges if $l$ is bounded below.
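The standard-domain bullet above can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything taken from the paper: the quadratic objective, the start point, the step size and the function names are chosen for illustration, and the flow is approximated by explicit Euler steps instead of the exact integral.

```python
def loss(p):
    """Assumed objective, bounded below by 0 with minimizer (1, -0.5)."""
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2

def grad(p):
    return (2.0 * (p[0] - 1.0), 4.0 * (p[1] + 0.5))

def euler_flow(s0, dt, steps):
    """Explicit Euler approximation of the state map with s'(t) = -grad(s(t))."""
    states = [s0]
    for _ in range(steps):
        s = states[-1]
        g = grad(s)
        states.append((s[0] - dt * g[0], s[1] - dt * g[1]))
    return states

DT = 1e-3
states = euler_flow((3.0, 2.0), DT, 5000)
losses = [loss(s) for s in states]

# Proposition 4 in the standard domain: d/dt l(s(t)) = -||grad l(s(t))||^2.
g0 = grad(states[0])
predicted_rate = -(g0[0] ** 2 + g0[1] ** 2)
observed_rate = (losses[1] - losses[0]) / DT   # finite-difference estimate at t = 0

print(predicted_rate, observed_rate, losses[-1])
```

The finite-difference rate agrees with $-\|\nabla l(s_0)\|^2$ up to the discretization error, and the recorded losses never increase, which is the 1-descending behavior that Definition 18 asks for.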
References
1. Ang, A.: Convergence of gradient flow. In: Course Notes at UMONS (2020)
2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004). ISBN: 0521833787, http://www.amazon.com/exec/obidos/redirect?tag=citeulike-20%5C&path=ASIN/0521833787
3. Cockett, R., et al.: Reverse derivative categories. arXiv e-prints arXiv:1910.07065 (2019)
4. Cruttwell, G.S.H., et al.: Categorical foundations of gradient-based learning. arXiv e-prints arXiv:2103.01931 (2021). [cs.LG]
5. Elliott, C.: The simple essence of automatic differentiation. In: Proceedings of the ACM on Programming Languages 2.ICFP, pp. 1–29 (2018)
6. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
7. Nombre: Does the derivative of a polynomial over an ordered ring behave like a rate of change? (2021). https://math.stackexchange.com/q/4170920
8. Blute, R.F., Cockett, J.R.B., Seely, R.A.G.: Cartesian differential categories. Theory Appl. Categories 22(23), 622–672 (2009)
9. Wilson, P., Zanasi, F.: Reverse derivative ascent: a categorical approach to learning boolean circuits. In: Electronic Proceedings in Theoretical Computer Science, vol. 333, pp. 247–260 (2021). ISSN: 2075-2180, https://doi.org/10.4204/eptcs.333.17
Analysis of Non-linear Structural Systems via Hybrid Algorithms

Sinan Melih Nigdeli^1 (✉), Gebrail Bekdaş^1, Melda Yücel^1, Aylin Ece Kayabekir^2, and Yusuf Cengiz Toklu^3

^1 Department of Civil Engineering, Istanbul University-Cerrahpaşa, 34320 Avcılar, Istanbul, Turkey, {melihnig,bekdas}@iuc.edu.tr, [email protected]
^2 Department of Civil Engineering, Istanbul Gelisim University, 34310 Avcılar, Istanbul, Turkey, [email protected]
^3 Department of Civil Engineering, Istanbul Beykent University, 34398 Sarıyer, Istanbul, Turkey, [email protected]
Abstract. Metaheuristic methods are commonly used in problems that treat the optimization of structural systems with respect to topology, shape and size. It has recently been shown that metaheuristic methods can also be used in the analysis of structural systems through an application of the well-known mechanical principle of minimum energy. This method is called total potential optimization using metaheuristic algorithms (TPO/MA), and it has been shown to have certain advantages in dealing with nonlinear problems and with problems where classical methods, including the Finite Element Method, have some difficulties. In this paper, a retaining wall example generated via plane-strain members is presented and hybrid algorithms using the Jaya algorithm are investigated. The hybrid algorithms may have advantages in terms of the number of iterations needed to reach the final value.

Keywords: TPO/MA · Metaheuristics · Hybrid algorithms · Structural analysis · Optimization
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 536–545, 2022. https://doi.org/10.1007/978-3-030-93247-3_53

1 Introduction

A basic principle of mechanics defines the equilibrium position of a structural system as the state in which its total potential energy is minimum. The total potential energy of the system is the sum of the energy created by the external effects and the deformation energy formed in the system due to these effects. In other words, under external effects the system reaches a deformation state (the equilibrium position) that minimizes its total potential energy. Analysis of structural systems can be defined as the process of finding the structural deformations and internal effects according to this basic principle of mechanics. For this purpose, design engineers have developed and used various numerical methods and approaches for the analysis of structural systems. The general approach in these methods is to determine the matrices that define the structural system and the loads, to determine the displacements of the structural system, and to obtain the cross-section effects from these displacements. Although this approach gives sufficiently accurate results in linear analysis, it may not be very effective in nonlinear analysis. For this reason, a method based on iterative analysis may be necessary for nonlinear systems.

In recent years, Toklu [1] proposed a method that, like the current analysis methods, is based on energy theorems but is completely different in its analysis approach. The method can be defined as a minimization process that uses metaheuristic algorithms to find the deformation state in which the total potential energy of the system is minimum. In other words, the method is an optimization process in which the design variables are the displacements and the objective is to minimize the system energy. Existing numerical methods solve for the unknown displacements through mathematical approaches, whereas TPO/MA iteratively determines the minimum-energy state among various randomly generated displacement states. That is, the displacements, which are the unknowns of the existing methods, are defined in the first step of TPO/MA through an optimization process. With this approach, TPO/MA is convenient compared to existing methods and accounts for the nonlinear behavior of the system without requiring any additional processing. Despite this advantage, because TPO/MA is an iterative minimization process based on a metaheuristic algorithm, the analysis of some systems can take somewhat long in terms of computation time. However, especially in recent years, developing processor technology and effective algorithms have enabled the method to perform close to existing methods in terms of computation time.

Scientific studies on TPO/MA have shown that it is a high-performance, easy-to-apply and effective method, whether the system is linear or nonlinear. These studies cover various structural systems such as trusses, cables, tensegrity structures and plates, and various situations related to these systems [1–11].

In this study, the analysis of systems consisting of plane-strain elements with TPO/MA is presented. Linear and non-linear analyses of a retaining wall example defined by plate elements were performed. The flower pollination and Jaya algorithms, which have been proven effective by scientific studies, were used as metaheuristic algorithms. Some modifications and hybrid methods were also developed to improve the algorithms in terms of computation time. The tests showed that the proposed modifications and hybrid algorithms give very effective results in terms of the computation time of the minimization process.
2 The Plane-Strain Members

Plate problems can be treated as a structural system generated from the triangular elements given in Fig. 1. In the figure, $u(x, y)$ and $v(x, y)$ are the displacement fields in the x and y directions, respectively. Assuming a linear variation of the displacements, these displacement fields in the x-y plane can be defined as

$$u(x, y) = u_i + C_1 x + C_2 y \tag{1}$$

$$v(x, y) = v_i + C_3 x + C_4 y \tag{2}$$

where $u_i$ and $v_i$ are the displacements at node $i$. The partial derivatives of Eqs. (1) and (2) with respect to x and y relate the constants $C_1$, $C_2$, $C_3$ and $C_4$ to the strains as follows:

$$\varepsilon_x = \frac{\partial u}{\partial x} = C_1 \tag{3}$$

$$\varepsilon_y = \frac{\partial v}{\partial y} = C_4 \tag{4}$$

$$\gamma_{xy} = \frac{\partial u}{\partial y} + \frac{\partial v}{\partial x} = C_2 + C_3 \tag{5}$$

in which $\varepsilon_x$, $\varepsilon_y$ and $\gamma_{xy}$ are the normal strain in the x-direction, the normal strain in the y-direction and the shear strain, respectively. The nodal displacements at nodes $i$, $j$ and $k$ are given by Eqs. (6)–(8).

$$u(0, 0) = u_i, \quad v(0, 0) = v_i \tag{6}$$

$$u(a_j, b_j) = u_j, \quad v(a_j, b_j) = v_j \tag{7}$$

$$u(a_k, b_k) = u_k, \quad v(a_k, b_k) = v_k \tag{8}$$
[Figure: triangular element in the x-y plane with node i at the origin, node j at (a_j, b_j) and node k at (a_k, b_k)]

Fig. 1. Triangular elements used in the generation of plate systems.
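Considering Eqs. (1) and (2), the relation between these nodal displacements can be written as

$$\begin{bmatrix} u_j \\ v_j \\ u_k \\ v_k \end{bmatrix} = \begin{bmatrix} a_j & b_j & 0 & 0 \\ 0 & 0 & a_j & b_j \\ a_k & b_k & 0 & 0 \\ 0 & 0 & a_k & b_k \end{bmatrix} \begin{bmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \end{bmatrix} + \begin{bmatrix} u_i \\ v_i \\ u_i \\ v_i \end{bmatrix} \tag{9}$$

and by solving this equation, $C_1$, $C_2$, $C_3$ and $C_4$ can be formulated as in Eqs. (10)–(13).

$$C_1 = \frac{b_k (u_j - u_i)}{a_j b_k - a_k b_j} + \frac{b_j (u_k - u_i)}{a_k b_j - a_j b_k} \tag{10}$$

$$C_2 = \frac{a_k (u_j - u_i)}{a_k b_j - a_j b_k} + \frac{a_j (u_k - u_i)}{a_j b_k - a_k b_j} \tag{11}$$

$$C_3 = \frac{b_k (v_j - v_i)}{a_j b_k - a_k b_j} + \frac{b_j (v_k - v_i)}{a_k b_j - a_j b_k} \tag{12}$$

$$C_4 = \frac{a_k (v_j - v_i)}{a_k b_j - a_j b_k} + \frac{a_j (v_k - v_i)}{a_j b_k - a_k b_j} \tag{13}$$

The strain energy density ($e$) for an elastic body in two dimensions can be determined with Eq. (14). Substituting Eqs. (10)–(13) into Eqs. (3)–(5) gives the strains, and for linear cases the stresses are obtained as in Eqs. (15)–(17), where $E$ and $\nu$ are the elasticity modulus and Poisson's ratio, respectively.

$$e = \int_0^{\varepsilon} \sigma \, d\varepsilon = \frac{1}{2}\left(\sigma_x \varepsilon_x + \sigma_y \varepsilon_y + \tau_{xy} \gamma_{xy}\right) \tag{14}$$

$$\sigma_x = \frac{E}{(1 + \nu)(1 - 2\nu)} \left((1 - \nu)\varepsilon_x + \nu \varepsilon_y\right) \tag{15}$$

$$\sigma_y = \frac{E}{(1 + \nu)(1 - 2\nu)} \left(\nu \varepsilon_x + (1 - \nu)\varepsilon_y\right) \tag{16}$$

$$\tau_{xy} = \frac{E}{(1 + \nu)(1 - 2\nu)} \left(\frac{1 - 2\nu}{2} \gamma_{xy}\right) \tag{17}$$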
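Considering nonlinear stress–strain relations, the normal stress in the x-direction ($\sigma_x$), the normal stress in the y-direction ($\sigma_y$) and the shear stress ($\tau_{xy}$) can be written as

$$\sigma_x = \frac{E}{(1 + \nu)(1 - 2\nu)} \left((1 - \nu)\varepsilon_x + \nu \varepsilon_y\right)^3 \tag{18}$$

$$\sigma_y = \frac{E}{(1 + \nu)(1 - 2\nu)} \left(\nu \varepsilon_x + (1 - \nu)\varepsilon_y\right)^3 \tag{19}$$

$$\tau_{xy} = \frac{E}{(1 + \nu)(1 - 2\nu)} \left(\frac{1 - 2\nu}{2} \gamma_{xy}\right)^3 \tag{20}$$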
For the m-th triangular element, the strain energy ($U_m$) is calculated by multiplying the strain energy density ($e_m$) by the volume ($V_m$) of the element (Eq. (21)). The volume ($V_m$) is given in Eq. (22), where $t$ is the thickness of the element. The strain energy of a system with $n$ elements is written as in Eq. (23). The total potential energy ($\Pi_p$) is found by subtracting the work done by the external forces from the total strain energy. For a system with $p$ nodes and point loads $P_{xi}$ in the x direction and $P_{yi}$ in the y direction, $\Pi_p$ is formulated via Eq. (24).

$$U_m = e_m V_m \tag{21}$$

$$V_m = \frac{(a_j b_k - a_k b_j)\, t}{2} \tag{22}$$

$$U = \sum_{m=1}^{n} U_m \tag{23}$$

$$\Pi_p = U - \sum_{i=1}^{p} \left(P_{xi} u_i + P_{yi} v_i\right) \tag{24}$$
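The chain from nodal displacements to total potential energy, Eqs. (3)–(5), (10)–(17) and (21)–(24), can be sketched for the linear case as follows. This is an illustrative sketch only: the function names, the data layout (lists of coordinate, displacement and load pairs) and the single-element example are assumptions, and the element nodes are assumed to be ordered so that $a_j b_k - a_k b_j$ is positive.

```python
def shape_constants(node_i, node_j, node_k, disp_i, disp_j, disp_k):
    """C1..C4 of Eqs. (10)-(13); node i acts as the local origin."""
    aj, bj = node_j[0] - node_i[0], node_j[1] - node_i[1]
    ak, bk = node_k[0] - node_i[0], node_k[1] - node_i[1]
    det = aj * bk - ak * bj
    duj, dvj = disp_j[0] - disp_i[0], disp_j[1] - disp_i[1]
    duk, dvk = disp_k[0] - disp_i[0], disp_k[1] - disp_i[1]
    c1 = (bk * duj - bj * duk) / det
    c2 = (aj * duk - ak * duj) / det
    c3 = (bk * dvj - bj * dvk) / det
    c4 = (aj * dvk - ak * dvj) / det
    return c1, c2, c3, c4, det

def element_strain_energy(nodes, disps, young, poisson, thickness):
    """Linear-elastic strain energy U_m of one triangular element."""
    c1, c2, c3, c4, det = shape_constants(*nodes, *disps)
    eps_x, eps_y, gamma_xy = c1, c4, c2 + c3                      # Eqs. (3)-(5)
    k = young / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    sig_x = k * ((1.0 - poisson) * eps_x + poisson * eps_y)       # Eq. (15)
    sig_y = k * (poisson * eps_x + (1.0 - poisson) * eps_y)       # Eq. (16)
    tau_xy = k * (1.0 - 2.0 * poisson) / 2.0 * gamma_xy           # Eq. (17)
    density = 0.5 * (sig_x * eps_x + sig_y * eps_y + tau_xy * gamma_xy)  # Eq. (14)
    volume = det * thickness / 2.0                                # Eq. (22)
    return density * volume                                       # Eq. (21)

def total_potential(coords, elements, disps, loads, young, poisson, thickness):
    """Total potential energy of Eq. (24) for point loads at the nodes."""
    strain = sum(
        element_strain_energy([coords[n] for n in el], [disps[n] for n in el],
                              young, poisson, thickness)
        for el in elements)                                       # Eq. (23)
    work = sum(px * u + py * v for (px, py), (u, v) in zip(loads, disps))
    return strain - work

# One right-triangle element stretched in x, with a point load at node j.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
disps = [(0.0, 0.0), (0.001, 0.0), (0.0, 0.0)]
loads = [(0.0, 0.0), (10.0, 0.0), (0.0, 0.0)]
u_m = element_strain_energy(coords, disps, 32e6, 0.2, 0.001)
pi_p = total_potential(coords, [(0, 1, 2)], disps, loads, 32e6, 0.2, 0.001)
print(u_m, pi_p)
```

In a TPO/MA-style analysis a function such as `total_potential` would play the role of the objective, with the free nodal displacements as design variables; the nonlinear case would replace the stress expressions with Eqs. (18)–(20) and evaluate the energy density from its integral definition.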
3 The Optimization Algorithm

The first algorithm is the flower pollination algorithm (FPA), an optimization method developed by Yang that is based on the pollination process of flowering plants [12]. The process is divided into two stages, and the stage to apply is determined by a parameter based on the pollination style. This parameter, called the switch probability (sp), selects either global pollination (Eq. (25)) or local pollination (Eq. (26)).

$$X_{new,i} = X_{old,i} + L\,(X_{old,i} - g^*) \tag{25}$$

$$X_{new,i} = X_{old,i} + \epsilon\,(X_j - X_k) \tag{26}$$

To reach the minimization target, new solutions replace old ones when they are better. rand(0, 1) is a function that generates random values between the numbers in brackets. $g^*$ is the best candidate solution, i.e. the one with the minimum objective function value among all flowers. $L$ is a random flight step drawn from a Lévy distribution, $\epsilon$ is a random value between 0 and 1, and $X_j$ and $X_k$ are different, randomly determined solutions.

The second method is the Jaya algorithm (JA) proposed by Rao [13]. The main target of JA is to approach the best solution ($g^*$) while moving away from the worst solution ($g_w$), so the optimum solution is reached with a victory-oriented approach. The word Jaya means victory in Sanskrit, which suits this principle. The process is carried out via Eq. (27), which includes only one phase and uses no algorithm-specific parameter.

$$X_{new,i} = X_{old,i} + \mathrm{rand}(0, 1)\,(g^* - |X_{old,i}|) - \mathrm{rand}(0, 1)\,(g_w - |X_{old,i}|) \tag{27}$$
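The Jaya update of Eq. (27) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the population size, iteration count, greedy replacement rule, clipping to box bounds, fixed random seed and the two-variable test function are all assumptions made for the example.

```python
import random

def jaya_minimize(objective, lower, upper, pop_size=20, iterations=500, seed=0):
    """Minimize `objective` inside box bounds with the update of Eq. (27)."""
    rng = random.Random(seed)
    dim = len(lower)
    population = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
                  for _ in range(pop_size)]
    scores = [objective(x) for x in population]
    for _ in range(iterations):
        best = population[scores.index(min(scores))]     # g*
        worst = population[scores.index(max(scores))]    # g_w
        for i in range(pop_size):
            candidate = []
            for d in range(dim):
                x = population[i][d]
                step = (rng.random() * (best[d] - abs(x))
                        - rng.random() * (worst[d] - abs(x)))
                # Keep the candidate inside the search bounds (assumption).
                candidate.append(min(max(x + step, lower[d]), upper[d]))
            score = objective(candidate)
            if score < scores[i]:        # a better new solution replaces the old one
                population[i], scores[i] = candidate, score
    winner = scores.index(min(scores))
    return population[winner], scores[winner]

def bowl(x):
    """Illustrative objective with its minimum at (1, 2)."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

solution, value = jaya_minimize(bowl, [-5.0, -5.0], [5.0, 5.0])
print(solution, value)
```

In the structural setting, `objective` would be the total potential energy and each candidate vector would hold the unknown nodal displacements.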
JA was chosen as the basis for improving the structural analysis process because it has only one phase and is open to modification: different phases or parameters can be added to it. Three novel hybrid algorithms were developed by modifying the JA process.

The first one combines JA with the Lévy distribution, which provides the randomization of the pollinator flight in FPA. To realize this, the random values rand(0, 1) in the JA expression (Eq. (27)) are replaced with values from the Lévy distribution ($L$). In addition, a second phase is added from the student phase of teaching-learning-based optimization (TLBO), and the phase to apply is chosen with a switch probability. This first hybrid algorithm is named JALS.

TLBO was developed by Rao et al. [14], inspired by the teaching-learning process between a teacher and students. TLBO comprises two separate stages, called the teacher and student phases. The second stage is the student phase, in which students improve their knowledge and grades by interacting with each other, making investigations, etc. This phase is formalized in Eq. (28), where $X_i$ and $X_j$ are different, randomly determined candidate solutions.

$$X_{new,i} = \begin{cases} X_{old,i} + \mathrm{rand}(0, 1)\,(X_i - X_j), & f(X_i) > f(X_j) \\ X_{old,i} + \mathrm{rand}(0, 1)\,(X_j - X_i), & f(X_i) < f(X_j) \end{cases} \tag{28}$$

In the second algorithm, JA is used together with the student phase of TLBO so that candidate solutions other than the best and the worst are also evaluated. Compared to the single-phase JA, this hybrid adds randomization to the solutions. The two phases are performed successively, and the algorithm is denoted JA2SP.

The last algorithm also combines JA with the student phase of TLBO, but the phase to apply is selected through the switch probability (sp) of FPA. The reason for this modification is to reduce the number of phases per iteration, since using multiple phases increases the analysis time needed to find the optimal results. This algorithm is denoted JA1SP. JA1SP has no algorithm-specific parameter, because the sp parameter is replaced by a random value in the range [0, 1].
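The single-phase hybrid described last (JA1SP) can be sketched as below. This is a hedged reading of the description, not the authors' code. Several details are assumptions of the sketch: the switch probability is redrawn from [0, 1] once per iteration, the student move goes from the worse toward the better of two randomly chosen peers (the usual convention for minimization), candidates are clipped to the bounds, and the helper names, seed and test function are hypothetical.

```python
import random

def jaya_move(x, best, worst, rng):
    """Jaya phase, Eq. (27), applied coordinate by coordinate."""
    return [xi + rng.random() * (b - abs(xi)) - rng.random() * (w - abs(xi))
            for xi, b, w in zip(x, best, worst)]

def student_move(x, peer_a, score_a, peer_b, score_b, rng):
    """Student-phase move along (better peer - worse peer); sign is an assumption."""
    better, other = (peer_a, peer_b) if score_a < score_b else (peer_b, peer_a)
    return [xi + rng.random() * (p - q) for xi, p, q in zip(x, better, other)]

def ja1sp_minimize(objective, lower, upper, pop_size=20, iterations=500, seed=1):
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(iterations):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        sp = rng.random()    # random switch probability instead of a tuned one
        for i in range(pop_size):
            if rng.random() < sp:
                trial = jaya_move(pop[i], best, worst, rng)
            else:
                a, b = rng.sample(range(pop_size), 2)
                trial = student_move(pop[i], pop[a], scores[a],
                                     pop[b], scores[b], rng)
            trial = [min(max(v, lo), hi) for v, lo, hi in zip(trial, lower, upper)]
            score = objective(trial)
            if score < scores[i]:        # keep only improving candidates
                pop[i], scores[i] = trial, score
    k = scores.index(min(scores))
    return pop[k], scores[k]

def bowl(x):
    """Illustrative objective with its minimum at (1, 2)."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

hybrid_solution, hybrid_value = ja1sp_minimize(bowl, [-5.0, -5.0], [5.0, 5.0])
print(hybrid_solution, hybrid_value)
```

Running both moves for every candidate in each iteration, instead of choosing one of them, would give a JA2SP-style variant at roughly twice the number of objective evaluations per iteration.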
4 Numerical Example

The structural model and loading conditions of the cantilever retaining wall can be seen in Fig. 2. The structure has a thickness of 1 mm and is meshed into 14 members and 16 nodes. The material properties of the wall are an elasticity modulus (E) of 32 × 10^6 kN/m² and a Poisson's ratio of 0.2. The optimization outcomes are given in Tables 1 and 2 for the linear and non-linear cases.
Table 1. Optimum results for linear solution case of retaining wall (Dx and Dy in mm).
Node  FPA                 JA                  JA1SP               JA2SP               JALS
      Dx        Dy        Dx        Dy        Dx        Dy        Dx        Dy        Dx        Dy
1      0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000
2      0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000  −0.00037
3      0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00128   0.00000
4     −0.00001  −0.00001  −0.00001  −0.00001  −0.00001  −0.00001  −0.00001  −0.00001   0.00172  −0.00022
5     −0.00001   0.00000  −0.00001   0.00000  −0.00001   0.00000  −0.00001   0.00000   0.00255   0.00000
6     −0.00001   0.00001  −0.00001   0.00001  −0.00001   0.00001  −0.00001   0.00001   0.00236  −0.00071
7     −0.00002   0.00000  −0.00002   0.00000  −0.00002   0.00000  −0.00002   0.00000   0.00500   0.00000
8     −0.00002   0.00000  −0.00002   0.00000  −0.00002   0.00000  −0.00002   0.00000   0.00500  −0.00031
9     −0.00014  −0.00003  −0.00014  −0.00003  −0.00014  −0.00003  −0.00014  −0.00003   0.00480  −0.00154
10    −0.00014   0.00004  −0.00014   0.00004  −0.00014   0.00004  −0.00014   0.00004   0.00500  −0.00219
11    −0.00034  −0.00003  −0.00034  −0.00003  −0.00034  −0.00003  −0.00034  −0.00003   0.00487  −0.00327
12    −0.00034   0.00005  −0.00034   0.00005  −0.00034   0.00005  −0.00034   0.00005   0.00500  −0.00336
13    −0.00057  −0.00002  −0.00057  −0.00002  −0.00057  −0.00002  −0.00057  −0.00002   0.00500  −0.00500
14    −0.00057   0.00005  −0.00057   0.00005  −0.00057   0.00005  −0.00057   0.00005   0.00500  −0.00486
15    −0.00081  −0.00001  −0.00081  −0.00001  −0.00081  −0.00001  −0.00081  −0.00001   0.00500  −0.00500
16    −0.00081   0.00005  −0.00081   0.00005  −0.00081   0.00005  −0.00081   0.00005   0.00500  −0.00500
Energy (kNm): FPA −0.03156753578; JA −0.03156734444; JA1SP −0.03156753578; JA2SP −0.03156753578; JALS −0.03156753578
Iteration number: JA 761048; JA1SP 1196979; FPA, JA2SP and JALS: 1462853, 1665053, 1326900 (the assignment of these three values to the three columns is not recoverable from the source layout)
Table 2. Optimum results for non-linear solution case of retaining wall (Dx and Dy in mm).
Node  FPA               JA                JA1SP             JA2SP             JALS
      Dx       Dy       Dx       Dy       Dx       Dy       Dx       Dy       Dx       Dy
1      0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
2      0.0000  −0.0024   0.0000  −0.0024   0.0000  −0.0024   0.0000  −0.0024   0.0000   0.0000
3     −0.0020   0.0000  −0.0020   0.0000  −0.0020   0.0000  −0.0020   0.0000  −0.0020  −0.0020
4     −0.0131  −0.0123  −0.0131  −0.0123  −0.0131  −0.0123  −0.0131  −0.0123  −0.0131  −0.0131
5     −0.0173   0.0000  −0.0173   0.0000  −0.0173   0.0000  −0.0173   0.0000  −0.0173  −0.0173
6     −0.0228   0.0141  −0.0228   0.0141  −0.0228   0.0141  −0.0228   0.0141  −0.0228  −0.0228
7     −0.0360   0.0000  −0.0360   0.0000  −0.0360   0.0000  −0.0360   0.0000  −0.0360  −0.0360
8     −0.0422   0.0056   0.0056  −0.0422  −0.0422   0.0056  −0.0422   0.0056  −0.0422  −0.0422
9     −0.1189  −0.0246  −0.1189  −0.0246  −0.1189  −0.0246  −0.1189  −0.0246  −0.1189  −0.1189
10    −0.1123   0.0352  −0.1123   0.0352  −0.1123   0.0352  −0.1123   0.0352  −0.1123  −0.1123
11    −0.2833  −0.0270  −0.2833  −0.0270  −0.2833  −0.0270  −0.2833  −0.0270  −0.2833  −0.2833
12    −0.2788   0.0425  −0.2788   0.0425  −0.2788   0.0425  −0.2788   0.0425  −0.2788  −0.2788
13    −0.4930  −0.0240  −0.4930  −0.0240  −0.4930  −0.0240  −0.4930  −0.0240  −0.4930  −0.4930
14    −0.4903   0.0456  −0.4903   0.0456  −0.4903   0.0456  −0.4903   0.0456  −0.4903  −0.4903
15    −0.7307  −0.0156  −0.7307  −0.0156  −0.7307  −0.0156  −0.7307  −0.0156  −0.7307  −0.7307
16    −0.7298   0.0428  −0.7298   0.0428  −0.7298   0.0428  −0.7298   0.0428  −0.7298  −0.7298
Energy (kNm): FPA −40.719901868; JA −40.719901868; JA1SP −40.719901868; JA2SP −40.719901868; JALS −40.719901868
Iteration number: FPA 675955; JA 300302; JA1SP 529275; JA2SP 109687; JALS 176397
Fig. 2. Structural model of cantilever retaining wall [15]
5 Conclusion
According to the results, all classical and hybrid algorithms are effective in finding the same energy value as the final result, in both the linear and the non-linear case. In the linear case, the classical JA needs the fewest iterations, but this does not hold for the non-linear case. There, the best algorithms are JA2SP (although its computing time per iteration is doubled, because two phases are applied in each iteration) and JALS. Because of these differing findings, different modifications of the algorithms may lead to better solutions for various problems.
In conclusion, with both classical and hybrid algorithms, TPO/MA is an alternative structural analysis tool for nonlinear problems. The effectiveness of the methods can be increased further by solving new types of problems.
Acknowledgments. This study was funded by the Scientific Research Projects Coordination Unit of Istanbul University-Cerrahpasa. Project number: FYO-2019-32735.
References
1. Toklu, Y.C.: Nonlinear analysis of trusses through energy minimization. Comput. Struct. 82(20–21), 1581–1589 (2004)
2. Toklu, Y.C., Temür, R., Bekdaş, G.: Computation of nonunique solutions for trusses undergoing large deflections. Int. J. Comput. Meth. 12(03), 1550022 (2015)
3. Nigdeli, S.M., Bekdaş, G., Toklu, Y.C.: Total potential energy minimization using metaheuristic algorithms for spatial cable systems with increasing second order effects. In: 12th International Congress on Mechanics (HSTAM2019), pp. 22–25, September 2019
4. Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Toklu, Y.C.: Advanced energy-based analyses of trusses employing hybrid metaheuristics. Struct. Des. Tall Spec. Build. 28(9), e1609 (2019)
5. Toklu, Y.C., Uzun, F.: Analysis of tensegric structures by total potential optimization using metaheuristic algorithms. J. Aerosp. Eng. 29(5), 04016023 (2016)
6. Toklu, Y.C., et al.: Total potential optimization using metaheuristic algorithms for solving nonlinear plane strain systems. Appl. Sci. 11(7), 3220 (2021)
7. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential optimization using hybrid metaheuristics: a tunnel problem solved via plane stress members. In: Nigdeli, S.M., Bekdaş, G., Kayabekir, A.E., Yucel, M. (eds.) Advances in Structural Engineering—Optimization. SSDC, vol. 326, pp. 221–236. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-61848-3_8
8. Toklu, Y.C., Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M., Yücel, M.: Analysis of plane-stress systems via total potential optimization method considering nonlinear behavior. J. Struct. Eng. 146(11), 04020249 (2020)
9. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential optimization using metaheuristics: analysis of cantilever beam via plane-stress members. In: Nigdeli, S.M., Kim, J.H., Bekdaş, G., Yadav, A. (eds.) ICHSA 2020. AISC, vol. 1275, pp. 127–138. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-8603-3_12
10. Kayabekir, A.E., Toklu, Y.C., Bekdaş, G., Nigdeli, S.M., Yücel, M., Geem, Z.W.: A novel hybrid harmony search approach for the analysis of plane stress systems via total potential optimization. Appl. Sci. 10(7), 2301 (2020)
11. Toklu, Y.C., Bekdas, G., Nigdeli, S.M.: Metaheuristics for Structural Design and Analysis. Wiley, Hoboken (2021)
12. Yang, X.-S.: Flower pollination algorithm for global optimization. In: Durand-Lose, J., Jonoska, N. (eds.) UCNC 2012. LNCS, vol. 7445, pp. 240–249. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32894-7_27
13. Rao, R.: Jaya: a simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int. J. Ind. Eng. Comput. 7(1), 19–34 (2016)
14. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput. Aided Des. 43(3), 303–315 (2011)
15. Topçu, A.: Sonlu Elemanlar Metodu, Eskişehir Osmangazi Üniversitesi, April 2019. http://mmf2.ogu.edu.tr/atopcu/
Ising Model Formulation for Job-Shop Scheduling Problems Based on Colored Timed Petri Nets
Kohei Kaneshima1 and Morikazu Nakamura2(B)
1 Graduate School of Engineering and Science, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
2 Computer Science and Intelligent Systems, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
Abstract. This paper presents a colored timed Petri net-based Ising model formulation for job-shop scheduling problems. By extracting fundamental properties of Petri nets, such as the structural precedence relation and the firing conflicts, we can incrementally construct the corresponding Ising model for a given job-shop scheduling problem. Our approach can overcome the difficulty of Ising model formulation for quantum annealing. This paper presents the formal composition method, an illustrated example, and some results of the computational evaluation for our binary search-based quantum annealing process.
Keywords: Ising model · Quantum annealing · Petri net · Scheduling problem · Optimization
1 Introduction
Combinatorial optimization has been considered a fundamental research field in computer science and operations research. We reduce various real-life problems to combinatorial optimization problems for minimizing costs or maximizing throughput. In addition, recent hot areas such as machine learning and IoT data-intensive applications require combinatorial optimization as a core processing step. Many practical combinatorial optimization problems are known to be NP-hard; no polynomial-time deterministic algorithms have been developed for them thus far [1]. The mathematical programming approach reduces the search space drastically by using mathematical techniques and obtains the exact solution for small but practical problems. Heuristic algorithm approaches find feasible solutions of reasonable quality, where reasonable means practical even though the solution is not exact [2]. Meta-heuristic algorithms are not problem-specific but applicable to a wide variety of combinatorial optimization problems [3]. Genetic algorithms, simulated annealing, and tabu search are well-known meta-heuristics.
Quantum annealing is a new meta-heuristic algorithm for combinatorial optimization inspired by quantum mechanics [4,5]. Once we formulate a target combinatorial optimization problem as an Ising model, the annealing process finds the lowest energy state of the model. The state gives us a solution to the target problem. D-Wave is the first commercial quantum annealing machine [6], and there are several digital machines for performing simulated quantum annealing. To date, many application examples of quantum annealing have been shown in the literature [7], although several difficulties remain in using the new platform efficiently.
The Ising model formulation is an obstacle to improving the usability of quantum annealing. The formulation requires deep knowledge of mathematics and sufficient skills obtained from experience. To overcome this difficulty, we proposed a model-based approach for Ising model formulation [8], in which our method systematically generates the Ising model from the Petri net model of a target problem. Our previous work presented the formal framework of the Petri net modeling-based Ising model formulation. We introduced binary quadratic nets, a class of colored Petri nets, to represent Ising models in Petri net form. We use higher classes of Petri nets, such as colored timed Petri nets, to model target problems. We call this class of Petri nets the problem-domain Petri net; we systematically convert problem-domain Petri net models into binary quadratic nets. Thus, we can easily obtain the corresponding Ising model if we model the target problem with high-level Petri nets. That is, our method drastically reduces the difficulty of the Ising model formulation.
In our previous paper [8], we showed a QUBO model formulation for a job-shop scheduling problem as an example without a detailed explanation. The formulation was for the single available resource case, where each resource type contains only a single available resource. This paper presents the detailed QUBO model formulation of job-shop scheduling problems and shows some evaluation results.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 546–554, 2022. https://doi.org/10.1007/978-3-030-93247-3_54
2 Quantum Annealing
Quantum annealing is a new optimization algorithm inspired by quantum mechanics for combinatorial optimization. The annealing process searches for optimal solutions composed of values of the Ising variables, $s = (s_1, s_2, \ldots, s_N)$, $s_i \in \{-1, +1\}$, that minimize the energy represented by the Hamiltonian:
$$H_P(s) = \sum_{i=1}^{N} h_i s_i + \sum_{i<j} J_{i,j} s_i s_j \tag{1}$$
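To make the Hamiltonian concrete, the following minimal sketch evaluates the energy of a spin configuration and finds the lowest energy state of a tiny model by exhaustive enumeration. The coefficients `h` and `J` are arbitrary illustrative values, not taken from the paper, and brute force is only a stand-in for the annealing process on such small instances.

```python
from itertools import product

def ising_energy(s, h, J):
    """H_P(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j, with s_i in {-1, +1}."""
    linear = sum(h[i] * s[i] for i in range(len(s)))
    quadratic = sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return linear + quadratic

def brute_force_ground_state(h, J):
    """Enumerate all 2^N spin configurations; feasible only for small N."""
    n = len(h)
    return min(product((-1, +1), repeat=n),
               key=lambda s: ising_energy(s, h, J))

# Hypothetical 3-spin instance; J is stored sparsely with keys (i, j), i < j.
h = [0.5, -1.0, 0.0]
J = {(0, 1): 1.0, (1, 2): -2.0}

ground = brute_force_ground_state(h, J)
print(ground, ising_energy(ground, h, J))  # (-1, 1, 1) -4.5
```

A QUBO model over binary variables $x_i \in \{0, 1\}$ is related to this form by the substitution $s_i = 2x_i - 1$.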
5 Evaluation
We implemented our method in Python, using CPN Tools [10] as GUI software to create Petri net models and SNAKES [12] to represent Petri net objects in Python programs. PyQUBO is a software tool that converts Python objects into the specific formats of annealing machines [13].
We evaluated the number of iterations in the binary search in Algorithm 1. Figure 2 depicts the result. The horizontal axis indicates the number of jobs in the job-shop scheduling problem instance, where jssx denotes that the number of jobs is x. The vertical axis shows the average number of iterations over 100 runs. The result shows that only a limited number of iterations is needed to obtain the best schedule found, even though we cannot confirm its optimality.
Owing to the characteristics of quantum annealing, we sometimes fail to obtain feasible solutions. Careful tuning of the parameters A, B, and C in (26) and of the other annealing parameters is required in practical use. Figure 3 shows the ratio of infeasible solutions among 100 runs. The ratio becomes larger for larger instances. In the experiment, we used the same parameters for all instance sizes. The ratio can be reduced by tuning the parameters according to the problem size.
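The iteration count reported above can be pictured with a generic binary search over a bound. The sketch below is not a reproduction of Algorithm 1; it assumes that the search narrows an integer bound (for example a makespan limit) and that a hypothetical callable `solve_with_bound(T)` stands for one annealing run that returns a feasible schedule respecting the bound `T`, or `None` when no feasible solution is obtained. It also assumes feasibility is monotone in the bound, which a stochastic annealer does not guarantee.

```python
def binary_search_bound(solve_with_bound, low, high):
    """Find the smallest bound in [low, high] for which a schedule is returned.

    Returns ((bound, schedule), iterations), or (None, iterations) if no
    bound in the range yields a feasible schedule.
    """
    iterations = 0
    best = None
    while low < high:
        mid = (low + high) // 2
        iterations += 1
        schedule = solve_with_bound(mid)
        if schedule is not None:
            best = (mid, schedule)   # feasible: try a tighter bound
            high = mid
        else:
            low = mid + 1            # infeasible: relax the bound
    if best is None:
        # The upper end of the range has not been tested yet.
        iterations += 1
        schedule = solve_with_bound(low)
        if schedule is not None:
            best = (low, schedule)
    return best, iterations

# Toy stand-in for the annealer: any bound of at least 7 is feasible.
def toy_solver(bound):
    return "schedule within %d" % bound if bound >= 7 else None

print(binary_search_bound(toy_solver, 0, 20))
```

Because a failed annealing run is indistinguishable from a truly infeasible bound in this sketch, repeating `solve_with_bound` a few times before declaring a bound infeasible would be a natural refinement.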
Fig. 2. Iterations in binary search
Fig. 3. Ratio of infeasible solutions
6 Conclusion
We propose a colored timed Petri net-based Ising model formulation for job-shop scheduling problems. Our approach can overcome the difficulty of Ising model formulation for quantum annealing. This paper presents the formal composition method, an illustrated example, and some results of the computational evaluation for our binary search-based quantum annealing process. In the future, we will extend the method to the multiple resource requirement problems. They are more practical formulations than the conventional ones, but we need to reduce the complexity of the generated model for efficient quantum annealing.
References
1. Garey, M.R., Johnson, D.S.: Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York (1990)
2. Hoos, H.H., Stützle, T.: 1 - introduction. In: Hoos, H.H., Stützle, T. (eds.) Stochastic Local Search. The Morgan Kaufmann Series in Artificial Intelligence, pp. 13–59. Morgan Kaufmann, San Francisco (2005). https://www.sciencedirect.com/science/article/pii/B9781558608726500184
3. Gendreau, M., Potvin, J.Y.: Metaheuristics in combinatorial optimization. Ann. Oper. Res. 140(1), 189–213 (2005). https://doi.org/10.1007/s10479-005-3971-7
4. Kadowaki, T., Nishimori, H.: Quantum annealing in the transverse Ising model. Phys. Rev. E 58, 5355–5363 (1998). https://doi.org/10.1103/PhysRevE.58.5355
5. Farhi, E., Goldstone, J., Gutmann, S., Lapan, J., Lundgren, A., Preda, D.: A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem. Science 292(5516), 472–475 (2001). https://science.sciencemag.org/content/292/5516/472
6. Johnson, M.W., et al.: Quantum annealing with manufactured spins. Nature 473(7346), 194–198 (2011). https://doi.org/10.1038/nature10012
7. Lucas, A.: Ising formulations of many NP problems. Front. Phys. 2, 5 (2014)
8. Nakamura, M., Kaneshima, K., Yoshida, T.: Petri net modeling for Ising model formulation in quantum annealing. Appl. Sci. 11(16), 7574 (2021). https://www.mdpi.com/2076-3417/11/16/7574
9. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989)
10. Jensen, K., Kristensen, L.M., Wells, L.: Coloured Petri nets and CPN Tools for modelling and validation of concurrent systems. Int. J. Softw. Tools Technol. Transf. 9(3), 213–254 (2007). https://doi.org/10.1007/s10009-007-0038-x
11. Venturelli, D., Marchand, D.J.J., Rojo, G.: Job shop scheduling solver based on quantum annealing. arXiv:1506.08479v2 [quant-ph] (2016)
12. Pommereau, F.: SNAKES: a flexible high-level Petri nets library (tool paper). In: Devillers, R., Valmari, A. (eds.) PETRI NETS 2015. LNCS, vol. 9115, pp. 254–265. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19488-2_13
13. Tanahashi, K., Takayanagi, S., Motohashi, T., Tanaka, S.: Application of Ising machines and a software development for Ising machines. J. Phys. Soc. Jpn 88(6), 061010 (2019). https://doi.org/10.7566/JPSJ.88.061010
Imbalanced Sample Generation and Evaluation for Power System Transient Stability Using CTGAN
Gengshi Han, Shunyu Liu, Kaixuan Chen, Na Yu, Zunlei Feng, and Mingli Song(B)
Zhejiang University, Hangzhou 310007, Zhejiang, China
{hangengshi,liushunyu,chenkx,na yu,zunleifeng,brooksong}@zju.edu.cn
Abstract. Although deep learning has achieved impressive advances in transient stability assessment of power systems, insufficient and imbalanced samples still limit the training of data-driven methods. This paper proposes a controllable sample generation framework based on the Conditional Tabular Generative Adversarial Network (CTGAN) to generate specified transient stability samples. To fit the complex feature distribution of the transient stability samples, the proposed framework first models the samples as tabular data and uses Gaussian mixture models to normalize the tabular data. Then we transform multiple conditions into a single conditional vector to enable multi-conditional generation. Furthermore, this paper introduces three evaluation metrics to verify the quality of the samples generated by the proposed framework. Experimental results on the IEEE 39-bus system show that the proposed framework effectively balances the transient stability samples and significantly improves the performance of transient stability assessment models.
Keywords: Power system · Transient stability · Sample generation · Conditional generative adversarial network
1 Introduction
Power system transient stability assessment is one of the most significant ways to ensure the security and stability of power systems. It assesses the ability of a power system to recover to the original secure state or to transition to a new secure state after withstanding a specific disturbance [1]. Therefore, fast and accurate transient stability assessment is needed to deal with emergencies in time and to effectively ensure the secure operation of power systems. However, Time Domain Simulation (TDS), a traditional method of transient stability assessment, is extremely time-consuming due to the nonlinear complexity of power systems [2]. In recent years, to improve the computational speed of assessment models, several transient stability assessment methods based on deep learning have been proposed [3–6]. These assessment methods are usually data-driven and need large-scale valid samples [7–10]. However, two problems need to be addressed when training the assessment model in practice. Firstly, insufficient samples cannot effectively represent the distribution of features, resulting in the risk of model overfitting. Moreover, since the category distribution of samples is highly imbalanced, the learning of the unstable samples is usually inhibited, leading to poor performance of trained models on unstable samples.
To address the insufficient and imbalanced samples and improve the performance of transient stability assessment models, we use a sample generation model to supplement the transient stability samples, especially the unstable samples. The Generative Adversarial Network (GAN) [11], which trains a generator and a discriminator in an adversarial process, is widely used in sample generation tasks. However, the generation process of GAN is uncontrollable, resulting in a large number of unnecessary samples. In contrast, Conditional GAN (CGAN) realizes a conditional generation mechanism based on the architecture of GAN to generate the required samples [12]. Furthermore, since transient stability samples are usually recorded as tabular data, we focus on the dedicated CTGAN method, which implements mode-specific normalization and conditional generation for tabular data [13].
Therefore, this paper proposes an imbalanced sample generation framework based on CTGAN for power system transient stability. Considering the structural characteristics of transient stability samples, the generation framework first models the samples as tabular data and uses the Gaussian Mixture Model (GMM) to normalize the tabular data [14,15]. Multiple conditions, including the transient stability and the load level, are converted into a single condition vector to enable multi-conditional generation. Besides, we design a multi-metric evaluation to effectively evaluate the obtained sample generation framework. The evaluation includes the effect of conditional generation, distance calculation, and the performance of transient stability assessment models trained with generated samples. Case studies on the IEEE 39-bus system show that the proposed framework can effectively balance the transient stability samples and significantly improve the performance of transient stability assessment models.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 555–565, 2022. https://doi.org/10.1007/978-3-030-93247-3_55
2 Sample Generation Framework for Transient Stability
In this section, we detail the proposed sample generation framework based on CTGAN for power system transient stability. As shown in Fig. 1, the proposed framework first models transient stability samples as tabular data, then transforms the data using one-hot codes and GMM normalization, and finally trains the CTGAN model.
2.1 Transient Stability Sample Representation
To construct appropriate input characteristics, we should consider not only the correlation between the characteristics and transient stability, but also whether the characteristics can be obtained in real time or quickly calculated in an actual power system. Assuming that no further fault occurs during the transient process, the transient stability of the power system is already determined at the moment of fault removal. Therefore, we take the values at the moment of fault clearing as the representation of transient stability samples. A transient stability sample is represented by the voltage magnitude and voltage angle of the bus nodes, the active and reactive power of the load nodes, and the active and reactive power of the generator nodes at the moment of fault clearing.
Fig. 1. Illustration of the proposed imbalanced sample generation framework based on CTGAN for power system transient stability. (Panel labels: Power system; Tabular data of transient stability samples; Normalization with GMM; Train data; One-hot code of condition; Condition, z; Generator; Discriminator; CTGAN model training; Generation.)
2.2 Transformation of Multi-condition Vector
Following the basic idea of conditional generation, the transient stability and the load level of the transient stability samples are used as generation conditions to realize multi-conditional generation. However, in common CGANs, the conditional vector is a one-hot code, which can only represent a single condition. Therefore, the proposed generation framework uses a simple transformation that converts multiple condition vectors into a single condition vector:
$$\mathrm{cond}_1 \oplus \mathrm{cond}_2 \oplus \cdots \oplus \mathrm{cond}_n \tag{1}$$
where $\mathrm{cond}_i$ denotes the $i$-th condition vector, $n$ is the number of conditions, and $\oplus$ denotes serial concatenation. That is, the $n$ condition vectors are serially concatenated into one condition vector, which serves as the condition input of the CTGAN model.
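As a small illustration of Eq. (1), the sketch below concatenates a one-hot transient stability condition with a one-hot load level condition. The index conventions (0 for stable, 1 for unstable; 18 load levels from 60% to 145% in 5% steps, as used later in the experiments) are assumptions chosen for the example.

```python
def one_hot(index, size):
    """Return a one-hot list of the given size with a 1 at position index."""
    vec = [0] * size
    vec[index] = 1
    return vec

def concat_conditions(*conds):
    """Serially concatenate several condition vectors into a single one."""
    out = []
    for cond in conds:
        out.extend(cond)
    return out

# Hypothetical encoding: stability index 0 = stable, 1 = unstable.
stability = one_hot(1, 2)
# Hypothetical encoding: load level index 0 = 60%, 1 = 65%, ..., 17 = 145%.
load_level = one_hot(4, 18)   # index 4 corresponds to 80%

cond = concat_conditions(stability, load_level)
print(len(cond), cond)
```

The resulting vector has one active entry per condition, so it is no longer one-hot as a whole, but each segment can still be decoded independently.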
2.3 Normalization with GMM
To eliminate the dimensional influence between different characteristics, it is important to transform the samples through appropriate methods before feeding them into the model for training. Transient stability samples are composed of the feature values of bus, load, and generator nodes. These continuous values cannot be normalized by one-hot coding, and, given the complex distribution of transient stability samples, general min-max normalization is unable to fit it. Therefore, when processing transient stability samples, the variational GMM is used to process continuous values to fit the complex distribution of each feature. The basic steps of the normalization are as follows:
Learning GMM. For each continuous column $C_i$, we use a variational Gaussian mixture model to learn a GMM distribution:
$$P_{C_i}(c_{i,j}) = \sum_{k=1}^{m_i} \mu_k \, \mathcal{N}(c_{i,j}; \eta_k, \phi_k) \tag{2}$$
where $m_i$ is the number of modes, and $\mu_k$, $\eta_k$ and $\phi_k$ are the weight, mean value and standard deviation of the $k$-th mode, respectively.
Calculating Probability Density. For each value $c_{i,j}$ in column $C_i$, we calculate the probability density of each mode:
$$\rho_k = \mu_k \, \mathcal{N}(c_{i,j}; \eta_k, \phi_k) \tag{3}$$
Normalization. We find the mode with the highest $\rho_k$ among the $m_i$ modes and normalize the value with respect to it. For instance, if $\rho_2$ is the highest among the three probability densities $\rho_1, \rho_2, \rho_3$, the value $c_{i,j}$ is transformed into a one-hot code $[0, 1, 0]$ and a scalar $\beta_{i,j} = (c_{i,j} - \eta_2)/(4\phi_2)$ normalized to $[-1, 1]$.
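The second and third steps can be sketched as follows for a single value, assuming the GMM parameters of the column have already been fitted (for instance with a variational Bayesian mixture implementation; the fitting step is not shown). The parameter values are invented for illustration, and the explicit clipping of the scalar to [-1, 1] is an added assumption.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(x; mean, std)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mode_specific_normalize(value, weights, means, stds):
    """Return (one-hot mode indicator, normalized scalar) for one value."""
    # Eq. (3): weighted density of the value under each mode.
    rho = [w * gaussian_pdf(value, m, s)
           for w, m, s in zip(weights, means, stds)]
    k = rho.index(max(rho))                       # mode with the highest density
    one_hot = [1 if i == k else 0 for i in range(len(rho))]
    beta = (value - means[k]) / (4.0 * stds[k])   # scalar within the chosen mode
    beta = max(-1.0, min(1.0, beta))              # clipping is an assumption
    return one_hot, beta

def denormalize(one_hot, beta, means, stds):
    """Invert the transformation (exact unless beta was clipped)."""
    k = one_hot.index(1)
    return means[k] + 4.0 * stds[k] * beta

# Hypothetical fitted parameters of one column with three modes.
weights = [0.3, 0.5, 0.2]
means = [0.90, 1.00, 1.10]
stds = [0.02, 0.02, 0.02]

code, beta = mode_specific_normalize(1.01, weights, means, stds)
print(code, beta, denormalize(code, beta, means, stds))
```

Each continuous column is thus replaced by a mode indicator plus one scalar, which is the representation that the generator and discriminator operate on.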
2.4 CTGAN-Based Network
We adopt the CTGAN model as the basic sample generation model, which includes a generator and a discriminator, each constructed with fully connected layers. The processed transient stability samples are used as the training input of the constructed CTGAN-based network. In the training process, the discriminator and the generator are trained in turn to obtain the model for the sample generation framework. To test the model, we apply it to the generation of labeled transient stability samples. We can also control the generation conditions to generate samples with specific labels on purpose, for example transient unstable samples.
3 Multi-metric Evaluation
After realizing the generation framework of power system transient stability samples, it is necessary to evaluate the generation framework. This paper designs a multi-metric evaluation for the transient stability samples generation framework. As shown in Fig. 2, the evaluation is composed of the following three metrics: the effect of conditional generation, the distance between real samples and generated synthetic samples, and the performance of assessment models trained with generated samples.
Fig. 2. Illustration of three evaluation metrics for the sample generation framework based on CTGAN. (Panel labels: Evaluation; control; generate; Conditional generation; dimension reduction and binning; calculate; Discrete probability distributions; Distance calculation; train; test; Performance of assessment models.)
3.1 Conditional Generation
The sample generation framework should be able to control the transient stability and the load level of the power system samples during generation. By comparing the proportions of transient stability samples generated under different settings (no condition, condition set to transient stable, and condition set to transient unstable), the conditional generation ability with respect to transient stability can be evaluated. The load level condition is evaluated in the same way.
3.2 Distance Calculation
Without setting the generation conditions, the generated samples should be as similar to the real samples as possible. Therefore, the similarity or distance between the two distributions is an efficient metric for evaluating the generation framework. First, a dimensionality reduction method, such as Principal Component Analysis (PCA) [16], is used to reduce the dimension of the samples to an appropriate degree. Second, we convert the dimensionality-reduced samples into discrete probability distributions through a binning operation. Finally, we calculate the distance between the probability distributions of the synthetic and the real samples. Common measures of the similarity between two distributions are adopted, such as KL divergence, JS divergence, and Wasserstein distance [17].
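The binning and distance steps can be sketched in a few lines. The example assumes the samples have already been reduced to a single dimension (the PCA step is omitted), uses equal-width bins over a fixed range, and computes the JS divergence and the one-dimensional Wasserstein distance between the two histograms; the data values, range and bin count are illustrative only.

```python
import math

def histogram(values, lo, hi, bins):
    """Bin 1-D values into a discrete probability distribution on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        counts[min(max(k, 0), bins - 1)] += 1   # clamp out-of-range values
    total = float(len(values))
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Symmetric, bounded by ln 2 when natural logarithms are used."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def wasserstein_1d(p, q, bin_width=1.0):
    """W1 between two histograms on the same equal-width bins."""
    cdf_p = cdf_q = dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist * bin_width

real = [0.1, 0.2, 0.3, 0.9]        # illustrative 1-D projections
synthetic = [0.6, 0.7, 0.8, 0.2]
p = histogram(real, 0.0, 1.0, bins=2)
q = histogram(synthetic, 0.0, 1.0, bins=2)
print(js_divergence(p, q), wasserstein_1d(p, q, bin_width=0.5))
```

JS divergence is used in the sketch rather than plain KL divergence because it stays finite when a bin is empty in one of the two histograms.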
3.3 Performance of Assessment Models
To evaluate the generated samples more practically, the performance of the transient stability assessment model trained with generated samples is a proper metric. Some classical classification networks are selected as the transient stability assessment models; the generated samples are used to train them, and their performance is obtained by testing. More specifically, the real dataset of transient stability samples is randomly divided into $S_{train}$ and $S_{test}$. We randomly generate $S_{gen}$ and form the united set $S_{union} = S_{train} + S_{gen}$. $S_{train}$, $S_{gen}$, and $S_{union}$ are each used to train the assessment models, and the resulting models are tested on the real test set $S_{test}$ to obtain the accuracy, the recall rate of transient stable samples, and the recall rate of transient unstable samples. These test scores serve as the evaluation metric for the quality of the generation framework.
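The three test scores can be computed from the true and predicted labels on the real test set as in the sketch below. The label convention (1 for stable, 0 for unstable) and the helper names are assumptions for illustration; training of the assessment models themselves is not shown.

```python
def build_training_sets(s_train, s_gen):
    """Return the three training sets compared in the evaluation."""
    return {"train": s_train, "gen": s_gen, "union": s_train + s_gen}

def assessment_scores(y_true, y_pred, stable=1, unstable=0):
    """Accuracy and per-class recall of a transient stability classifier."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))

    def recall(label):
        # Predictions for the samples whose true class is `label`.
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        return sum(p == label for p in preds) / len(preds) if preds else float("nan")

    return {
        "accuracy": correct / len(y_true),
        "recall_stable": recall(stable),
        "recall_unstable": recall(unstable),
    }

# Illustrative labels only (1 = stable, 0 = unstable).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(assessment_scores(y_true, y_pred))
```

Reporting the recall of unstable samples separately matters here because, with an imbalanced dataset, a model can reach high accuracy while missing most unstable cases.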
4 Experiment
In this section, we study the proposed framework on the classical IEEE 39-bus power system [18] and show its performance by evaluating the effect of conditional control, the distance between distributions, and the scores of assessment models trained with generated samples.
4.1 Experimental Setup
Time Domain Simulation Samples. Matpower [19] and the Power System Analysis Toolbox (PSAT) [20] are applied to obtain the original dataset of real samples, taking the IEEE 39-bus system as the basic system. The power system contains 39 buses, 10 generators, 19 loads and 46 transmission lines. To simulate the transient stability samples, we adopt the following principles:
1. Randomly change both the active and reactive power of all loads from 60% to 145% of the basic load level.
2. Use Matpower to compute the optimal power flow for the subsequent TDS.
3. Randomly select a fault line, set a three-phase grounding fault at a position from 20% to 80% of the line, and clear it after a time from 1/60 to 1/3 s.
4. Use PSAT to perform the time domain simulation for 10 s.
5. Label the stability of the generated sample according to the values of the generators after the TDS.
With the simulation operations above, we generate a total of 14,221 transient stability samples, comprising 11,510 stable samples and 2,711 unstable samples, as the original dataset.
Generation Model Training. CTGAN is used as the primary sample generation model, which includes a generator and a discriminator. The generator uses two fully connected layers, each equipped with a batch normalization layer and a ReLU activation layer. The tanh and softmax activation functions are used for the output layer. The discriminator uses two fully connected layers, together with a dropout layer that drops nodes to reduce overfitting.
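The random choices in simulation principles 1 and 3 of the time domain simulation setup can be sketched as a scenario sampler. The uniform distributions, the single load scale shared by all loads, and the field names are assumptions for illustration; the power flow and time domain simulation themselves (Matpower and PSAT) are outside the scope of this sketch.

```python
import random

LOAD_MIN, LOAD_MAX = 0.60, 1.45   # 60% to 145% of the basic load level
NUM_LINES = 46                    # transmission lines of the IEEE 39-bus system

def sample_scenario(rng):
    """Draw the random parameters of one simulated fault scenario."""
    return {
        "load_scale": rng.uniform(LOAD_MIN, LOAD_MAX),
        "fault_line": rng.randrange(NUM_LINES),
        "fault_position": rng.uniform(0.20, 0.80),       # fraction of the line
        "clearing_time_s": rng.uniform(1.0 / 60.0, 1.0 / 3.0),
    }

def load_level_index(load_scale, step=0.05):
    """Index of the nearest 5% load level: 0 for 60%, 17 for 145%."""
    return int(round((load_scale - LOAD_MIN) / step))

rng = random.Random(0)
scenarios = [sample_scenario(rng) for _ in range(1000)]
print(scenarios[0], load_level_index(scenarios[0]["load_scale"]))
```

The helper `load_level_index` maps a sampled load scale to one of the 18 discrete load levels, which is one possible way to derive the load level condition used for conditional generation.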
Evaluation Metrics
The CTGAN-based generation framework of power system transient stability samples is trained with the simulated samples as the training set. After that, it is necessary to evaluate the quality of the generation framework. This paper designs a multi-metric evaluation for the generation framework of transient stability samples, composed of three evaluation metrics.
The Effect of Conditional Generation. We evaluate the ability to control the transient stability and the load level of the generated power system samples. Table 1 shows the result of conditional generation with different transient stability condition settings. We set the conditions as follows: no condition, stable, and unstable. When the condition is set to stable, the proportion of stable samples generated increases by 18.7% relative to the samples generated without condition (from 59.92% to 71.10%). When the condition is set to unstable, the proportion of unstable samples increases by 48.8% (from 40.08% to 59.62%). The result shows that the transient stability ratio of the generated samples can be effectively controlled, and that the framework can balance the transient stability samples by generating more unstable samples.
Table 1. The result of generation with different transient stability conditions.
Condition                    Stable proportion (%)    Unstable proportion (%)
Without condition            59.92                    40.08
With condition (stable)      71.10                    28.90
With condition (unstable)    40.38                    59.62
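The two increments quoted above are relative changes, which can be checked directly from the proportions in Table 1. The helper name `relative_increase` is introduced here for illustration.

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Stable share: without condition -> with condition (stable)
stable_gain = relative_increase(59.92, 71.10)
# Unstable share: without condition -> with condition (unstable)
unstable_gain = relative_increase(40.08, 59.62)

print(f"{stable_gain:.1f}% {unstable_gain:.1f}%")  # prints "18.7% 48.8%"
```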
Moreover, the result of conditional generation with different load level condition settings is shown in Table 2. We set the conditions as no condition and as 18 load levels (60% to 145%, with a step of 5%). Under each generation condition, we count the number of generated samples at the corresponding load level and calculate their proportion for comparison. When the condition is set to a specific load level, the proportion of generated samples at that load level is higher than in the samples generated without condition. The results show that the generation framework can effectively control the load level proportion of the generated samples.
Table 2. The result of generation with different load level conditions.
Condition (load level)    Generated without condition (%)    Generated with load level condition (%)    Rate of improvement (%)
70%                       2.49                               3.35                                       34.54
80%                       3.89                               4.64                                       19.32
90%                       2.70                               4.92                                       81.90
100%                      3.38                               4.59                                       36.09
110%                      5.81                               8.86                                       52.60
120%                      2.51                               2.58                                       2.67
130%                      2.50                               4.60                                       84.53
140%                      0.54                               1.81                                       233.76
The Distance Between Real and Generated Sample Distribution. The generated samples should be as similar to the real samples as possible. Therefore, the distance between the two distributions is a useful metric for evaluating the generation framework. Table 3 shows the JS divergence and the Wasserstein distance calculated between distributions. We randomly select 2,000 samples from the real samples as set $S_{real}^{A}$, repeat the operation to get $S_{real}^{B}$, and generate 2,000 samples as set $S_{gen}$. From Table 3, we can see that the distances between the real sets and $S_{gen}$ are small: for the Wasserstein distance they are in the same order of magnitude as the distance between $S_{real}^{A}$ and $S_{real}^{B}$, while the JS divergence is about one order of magnitude larger. This indicates that the samples generated by the generation framework are close to the real samples under these distance measurements.
Table 3. The distance between distributions.
Distributions                     JS divergence    Wasserstein distance
$S_{real}^{A}$, $S_{real}^{B}$    0.002826         0.001429
$S_{real}^{A}$, $S_{gen}$         0.063141         0.006388
$S_{real}^{B}$, $S_{gen}$         0.063084         0.005939
The Performance of Assessment Models Trained with Generated Samples. The performance of assessment models trained with generated samples is a valuable metric. In this paper, we select the Multilayer Perceptron (MLP) and the Decision Tree (DT) as the power system transient stability assessment models for training and testing, since they are classical models for classification
problems. The hidden layer size of the MLP is 200, and the maximum number of iterations is 500. The maximum depth of the DT is 100. Table 4 shows the test results of assessment models trained with different datasets. We randomly divide the real dataset into $S_{train}$ with 8,533 samples and $S_{test}$ with 5,688 samples. We randomly generate $S_{gen}$ with 8,533 samples and form the union set $S_{union} = S_{train} \cup S_{gen}$. The sets $S_{train}$, $S_{gen}$, and $S_{union}$ are each used to train the transient stability assessment models, and the resulting models are tested on $S_{test}$. The scores of the models trained with $S_{gen}$ are lower than the scores of the models trained with $S_{train}$. However, the models trained with $S_{union}$ score higher than those trained with $S_{train}$ on every metric except RecallP of the MLP (0.9832 versus 0.9883). The recall rate of unstable samples increases by 1.48% (relative) for DT and by 2.74% (relative) for MLP. The results show that adding the generated samples to the training set improves the performance of transient stability assessment models, especially for the unstable label, which is the scarce class in the training set.
Table 4. The test results of assessment models trained with different datasets. RecallP is the recall rate of stable samples, and RecallN is the recall rate of unstable samples.
Assessment model    Train dataset    RecallP    RecallN    F1 score    Accuracy
DT                  $S_{train}$      0.9770     0.9348     0.9813      0.9694
DT                  $S_{gen}$        0.9430     0.6856     0.9376      0.8969
DT                  $S_{union}$      0.9788     0.9486     0.9837      0.9734
MLP                 $S_{train}$      0.9883     0.9261     0.9861      0.9771
MLP                 $S_{gen}$        0.7719     0.8061     0.8509      0.7780
MLP                 $S_{union}$      0.9832     0.9515     0.9863      0.9775
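The four scores reported in Table 4 can be computed from the confusion counts of a classifier, as in the minimal sketch below. Two points are assumptions: that the stable class is treated as the positive class, and that the F1 score refers to that class (the text does not state which class the F1 score is computed for). The confusion counts are hypothetical.

```python
def scores(tp, fn, tn, fp):
    """Metrics with the stable class as positive (P) and the unstable class as negative (N)."""
    recall_p = tp / (tp + fn)       # share of stable samples recognized as stable
    recall_n = tn / (tn + fp)       # share of unstable samples recognized as unstable
    precision = tp / (tp + fp)      # share of predicted-stable samples that are stable
    f1 = 2 * precision * recall_p / (precision + recall_p)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"recall_p": recall_p, "recall_n": recall_n, "f1": f1, "accuracy": accuracy}

# Hypothetical confusion counts, not taken from the paper.
example = scores(tp=90, fn=10, tn=40, fp=10)
```

Because unstable samples are scarce, RecallN is the score most sensitive to the class imbalance, which is why the text singles it out.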
5 Conclusion
In this paper, we address the imbalanced distribution and insufficient number of samples in research on power system transient stability assessment. We propose a CTGAN-based controllable sample generation framework for transient stability. In the generation framework, the transient stability samples are first processed into tabular data. Then the transient stability and load level are converted into the conditional vector, and the variational Gaussian mixture model is used to fit and normalize the tabular data. Finally, the CTGAN model is trained with the processed samples. Moreover, we design a multi-metric evaluation that assesses the generation framework from three aspects: the effect of conditional generation, the distance between real and generated sample distribution, and the performance of the assessment model trained with generated samples. Experiments demonstrate that samples generated through the proposed generation framework are valid and effective in multiple metrics.
Acknowledgement. This work is funded by the National Key Research and Development Project (Grant No: 2018AAA0101503) and the State Grid Corporation of China Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.
Efficient DC Algorithm for the Index-Tracking Problem
F. Hooshmand (✉) and S. A. MirHassani
Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
{f.hooshmand.khaligh,a_mirhassani}@aut.ac.ir
Abstract. Index tracking is one of the successful strategies in portfolio management. This paper reviews three well-known models of the index tracking problem, namely the return-based, value-based, and beta-based models, and compares their performance in terms of tracking accuracy on in-sample and out-of-sample data over real instances. Due to the low tracking error of the portfolio obtained by the value-based model, and the NP-hardness of this problem, an efficient iterative method based on the difference of convex functions is proposed to find high-quality feasible solutions within a short amount of time. Computational results over real-world instances confirm the effectiveness of the proposed method.
Keywords: Index tracking problem · Value-based model · Difference of convex functions · Iterative DC algorithm
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 566–576, 2022. https://doi.org/10.1007/978-3-030-93247-3_56
1 Introduction
Optimization techniques are successfully applied in finance; see [1–3]. In particular, portfolio management is a popular research field in optimization and includes active and passive strategies. In the active strategy, the fund manager frequently (e.g., daily or weekly) checks the status of the portfolio and rebalances it based on technical and fundamental analyses. In the passive strategy, by contrast, a suitable portfolio is constructed and kept unchanged for a long time. One of the well-known methods of passive management is the index tracking (IT) approach, which constructs a portfolio mirroring the market index as closely as possible while containing a limited number of assets. Because the return of the market index grows over the long term, an IT portfolio is expected to yield an appropriate return over a long horizon.
The IT problem has received great attention from researchers and, depending on the function used to calculate the tracking error, different optimization models have been presented in the literature. Concerning the tracking-error function, existing formulations can be classified into return-based, value-based, and beta-based models. The aim of return-based models is to construct a portfolio whose return over historical data has minimum deviation from the return of the index. For some related works, see Gaivoronski et al. [4], Mezali and Beasley [5], Sant’Anna et al. [3], and Moeini [6]. Value-based models create a portfolio whose value over historical data has minimum deviation from the value of the index scaled by a constant factor. For example, see Gaivoronski et al. [4] and Guastaroba and Speranza [7]. Beta-based models aim at minimizing the distance between the beta coefficients associated with the portfolio and the market index. For example, see Canakgoz and Beasley [8] and Chen and Kwon [9]. The enhanced IT (EIT) problem is an extension of IT that aims at outperforming the market index. For example, Guastaroba et al. [10] proposed a model based on the omega ratio for the EIT problem. Filippi et al. [11] used two objectives: the maximization of the expected return and the minimization of the tracking error. Due to the NP-hardness of the IT and EIT problems, they are mostly solved via heuristics and metaheuristics such as genetic algorithms [3] and kernel search [7, 11].
The main contributions of this paper are as follows. First, the IT models are reviewed, and the good performance of the value-based model is justified by evaluating its tracking error over in-sample and out-of-sample data. Then, to overcome the difficulty of solving the value-based model, an efficient method based on the difference of convex (DC) functions is proposed. The paper most relevant to our work is the study of Moeini [6], who proposed a DC algorithm for the return-based model. We improve the method of Moeini [6] by introducing a combinatorial cut and a quality-regulator constraint, and by embedding the basic DC algorithm into an iterative method. Computational results over real-world instances confirm the superiority of our method over that of Moeini [6] in terms of solution quality. The results indicate that our method can achieve high-quality solutions in a short amount of time.
The rest of this paper is organized as follows: Sect. 2 provides an overview of different formulations of the IT problem. Section 3 reviews the theory of DC programming. Section 4 reformulates the value-based model as a DC program, for which an efficient iterative DC-based algorithm is presented in Sect. 5. The performance of our algorithm is investigated in Sect. 6. Finally, Sect. 7 concludes and offers directions for future research.
2 Comparative Consideration of IT Models
2.1 IT Models
Let $I$ (indexed by $i$) be the set of assets in the market and consider $T$ (indexed by $t, t'$) as the set of previous time periods for which the historical returns are available. An investor intends to construct a portfolio containing $C$ assets such that the fraction of capital invested in each selected asset $i$ lies in the range $[l_i, u_i]$ and the index is mirrored as closely as possible. Short-selling is not allowed and, because the portfolio is assumed to be kept unchanged for a long time, transaction costs are neglected. Let $r_{i,t}$ and $r_t^0$ denote, respectively, the return of asset $i$ and the index return in time period $t$, and let $\beta_i$ be the beta coefficient of asset $i$. In what follows, we present the return-based, value-based, and beta-based models and compare their efficiency in terms of the tracking error. We refer to these models as RM, VM, and BM, respectively, for short.
Model RM
The model RM minimizes the sum of squared differences between the portfolio and index returns over the historical periods $t \in T$. Decision variables are defined as follows:
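$d_i$: binary variable that is 1 if asset $i$ is selected, 0 otherwise.
$x_i$: the fraction of the capital invested in the selected asset $i$.
RM is formulated as the following mixed integer nonlinear program (MINLP):
\[ (\mathrm{RM}) \quad \min \sum_{t \in T} \Big( \sum_{i \in I} r_{i,t}\, x_i - r_t^0 \Big)^2 \tag{1} \]
s.t.
\[ \sum_{i \in I} d_i = C \tag{2} \]
\[ \sum_{i \in I} x_i = 1 \tag{3} \]
\[ l_i d_i \le x_i \le u_i d_i \quad \forall i \in I \tag{4} \]
\[ x_i \ge 0 \quad \forall i \in I \tag{5} \]
\[ d_i \in \{0, 1\} \quad \forall i \in I \tag{6} \]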
Model VM
The model VM minimizes the sum of squared differences between the portfolio and index values over the historical periods $t \in T$, assuming that the index value equals 1 (i.e., the amount of the investor's capital) at the beginning of the first period. VM is formulated as the following MINLP:
\[ (\mathrm{VM}) \quad \min \sum_{t \in T} \Big( \sum_{i \in I} \Big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \Big) x_i - \prod_{t'=1}^{t} (1 + r_{t'}^0) \Big)^2 \tag{7} \]
s.t. (2)–(6)
Model BM
BM is formulated as the following MINLP, which minimizes the difference between the beta coefficients of the portfolio and the market index (the beta of the index with respect to itself equals 1):
\[ (\mathrm{BM}) \quad \min \Big( \sum_{i \in I} \beta_i x_i - 1 \Big)^2 \tag{8} \]
s.t. (2)–(6)
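To make the three objectives concrete, the sketch below evaluates (1), (7), and (8) for a fixed portfolio. The function names, the nested-list data layout (`R[i][t]` for asset returns, `r0[t]` for index returns), and the toy numbers are assumptions for illustration; they are not taken from the paper.

```python
from math import prod

def rm_objective(R, r0, x):
    """Objective (1): squared gaps between portfolio and index returns."""
    return sum((sum(R[i][t] * x[i] for i in range(len(x))) - r0[t]) ** 2
               for t in range(len(r0)))

def vm_objective(R, r0, x):
    """Objective (7): squared gaps between portfolio and index values."""
    total = 0.0
    for t in range(len(r0)):
        index_value = prod(1 + r0[s] for s in range(t + 1))
        port_value = sum(prod(1 + R[i][s] for s in range(t + 1)) * x[i]
                         for i in range(len(x)))
        total += (port_value - index_value) ** 2
    return total

def bm_objective(beta, x):
    """Objective (8): squared gap between the portfolio beta and 1."""
    return (sum(b * xi for b, xi in zip(beta, x)) - 1) ** 2

# Toy data: two assets, two periods, equal weights.
R = [[0.10, 0.00], [0.00, 0.10]]
r0 = [0.05, 0.05]
x = [0.5, 0.5]
rm = rm_objective(R, r0, x)
vm = vm_objective(R, r0, x)
bm = bm_objective([1.2, 0.8], x)
```

In this toy instance the portfolio matches the index return in every period, so the RM objective is zero, yet the VM objective is positive because compounding makes the index value (1.1025) exceed the portfolio value (1.10) after two periods. This shows that the return-based and value-based tracking errors measure different things.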
2.2 Evaluation of Models
Here, the performance of the models RM, VM, and BM is evaluated on four datasets (available at https://or-brescia.unibs.it/instances) associated with the weekly returns of the index FTSE100 (composed of 100 assets). Each dataset consists of 104 weeks of in-sample observations and 52 weeks of out-of-sample ones. The in-sample information is fed into the model, and then the portfolio obtained by the model is evaluated over the out-of-sample realizations. The datasets are named according to the market trends (increasing or decreasing) over the in-sample and out-of-sample periods as down-down (DD), down-up (DU), up-down (UD), and up-up (UU). Two values, C = 10 and C = 15, are examined for the cardinality. All models are solved via the solver BARON, and three criteria, namely the in-sample absolute deviation (ISAD), the out-of-sample absolute deviation (OSAD), and the out-of-sample lower deviation (OSLD), are used to evaluate the tracking errors of the portfolios obtained by each model. Considering $x^*$ as the optimal portfolio obtained by a given model, the aforementioned criteria are calculated as (9)–(11), where $(u)^- = \max(-u, 0)$.
\[ \mathrm{ISAD} = \frac{1}{104} \sum_{t=1}^{104} \left| \frac{\sum_{i \in I} \big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \big) x_i^* - \prod_{t'=1}^{t} (1 + r_{t'}^0)}{\prod_{t'=1}^{t} (1 + r_{t'}^0)} \right| \times 100 \tag{9} \]
\[ \mathrm{OSAD} = \frac{1}{52} \sum_{t=105}^{156} \left| \frac{\sum_{i \in I} \big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \big) x_i^* - \prod_{t'=1}^{t} (1 + r_{t'}^0)}{\prod_{t'=1}^{t} (1 + r_{t'}^0)} \right| \times 100 \tag{10} \]
\[ \mathrm{OSLD} = \frac{1}{52} \sum_{t=105}^{156} \left( \frac{\sum_{i \in I} \big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \big) x_i^* - \prod_{t'=1}^{t} (1 + r_{t'}^0)}{\prod_{t'=1}^{t} (1 + r_{t'}^0)} \right)^{-} \times 100 \tag{11} \]
The results are provided in Table 1.
Table 1. Comparison of models RM, VM and BM
              ISAD (%)              OSAD (%)              OSLD (%)
ID      C     RM     VM     BM      RM     VM     BM      RM     VM     BM
DD      10    10.2   0.6    8.7     1.7    2.8    2.6     0.6    0.2    1.9
DU      10    4.2    0.6    3.5     3.5    1.3    2.8     3.5    0.9    1.5
UD      10    4.2    0.7    12.2    5.5    9.3    11.2    5.3    0.0    1.7
UU      10    3.9    0.4    9.4     1.4    1.6    3.9     1.3    1.5    0.6
DD      15    1.8    0.5    3.2     2.3    2.4    1.6     0.0    0.0    0.6
DU      15    4.0    0.4    1.8     1.8    1.1    3.5     0.7    0.5    3.4
UD      15    2.9    0.5    6.9     8.1    5.9    6.5     0.0    0.0    3.2
UU      15    2.3    0.3    6.5     3.2    3.1    2.3     3.2    0.6    0.6
Ave.          4.2    0.5    6.5     3.4    3.4    4.3     1.8    0.5    1.7
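The deviation criteria can be sketched in a few lines of code. The sketch assumes that values are compounded from the first period onward and that the lower deviation counts only periods in which the portfolio is worth less than the index. The function names and the toy data (one asset, two in-sample and two out-of-sample periods instead of 104 and 52) are hypothetical.

```python
from math import prod

def relative_gaps(R, r0, x, periods):
    """Gap (portfolio value - index value) / index value, in percent,
    for each 1-based period t in `periods`, compounding from period 1."""
    gaps = []
    for t in periods:
        index_value = prod(1 + r0[s] for s in range(t))
        port_value = sum(prod(1 + R[i][s] for s in range(t)) * x[i]
                         for i in range(len(x)))
        gaps.append(100 * (port_value - index_value) / index_value)
    return gaps

def absolute_deviation(gaps):
    """Mean absolute gap, as in ISAD and OSAD."""
    return sum(abs(g) for g in gaps) / len(gaps)

def lower_deviation(gaps):
    """Mean shortfall, as in OSLD: only negative gaps contribute."""
    return sum(max(-g, 0.0) for g in gaps) / len(gaps)

# Toy data: periods 1-2 are in-sample, periods 3-4 are out-of-sample.
R = [[0.10, 0.00, 0.10, -0.20]]
r0 = [0.10, 0.00, 0.00, 0.00]
x = [1.0]
isad = absolute_deviation(relative_gaps(R, r0, x, range(1, 3)))
out_gaps = relative_gaps(R, r0, x, range(3, 5))
osad = absolute_deviation(out_gaps)
osld = lower_deviation(out_gaps)
```

Here the out-of-sample gaps are +10% and -12%, so the absolute deviation is 11% while the lower deviation is 6%. A lower deviation of zero therefore means the portfolio never fell below the index in the out-of-sample window.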
As can be seen, the model VM performs best on average: its ISAD and OSLD are the lowest (0.5% each), and its OSAD (3.4%) matches that of RM and is below that of BM (4.3%). It is worth mentioning that the optimal solution to BM is obtained in about 2 s; however, solving the models RM and VM via the solver BARON is time-consuming, so the solver is stopped at a time limit of 1000 s and the best solutions found are used. The good performance of the model VM, on the one hand, and the difficulty of solving it via optimization solvers, on the other hand, motivate us to propose an efficient DC-based algorithm to solve it.
3 A Review on DC Programming
This section provides some basic concepts of DC programming and presents a brief review of the classic DC algorithm (DCA). For a comprehensive description, see Dinh and Le Thi [12]. Let $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a convex function with the domain $\operatorname{dom} h = \{x \in \mathbb{R}^n : h(x) < +\infty\}$. The sub-differential of $h$ at $x_0 \in \operatorname{dom} h$, denoted by $\partial h(x_0)$, is defined as below, where $\langle a, b \rangle$ refers to the inner product of $a$ and $b$.
\[ \partial h(x_0) = \{ y \in \mathbb{R}^n : h(x) \ge h(x_0) + \langle x - x_0, y \rangle \quad \forall x \in \mathbb{R}^n \} \]
Additionally, the conjugate of $h(x)$, denoted by $h^*(y)$, is defined as follows:
\[ h^*(y) = \sup_{x} \{ x^T y - h(x) \} \quad \forall y \in \mathbb{R}^n \]
Considering $g$ and $h$ as lower semi-continuous convex functions, the standard form of a DC program, denoted by $P_{dc}$, is defined as follows, and the functions $g$ and $h$ are called DC components:
\[ P_{dc} : \inf \{ f(x) = g(x) - h(x) : x \in \mathbb{R}^n \} \]
The convention $+\infty - (+\infty) = +\infty$ is used and hence $\operatorname{dom} f = \operatorname{dom} g$. It is worth mentioning that if the set $X$ is convex, the problem $\inf \{ f(x) = g(x) - h(x) : x \in X \}$ can be restated as a standard DC program, as demonstrated below:
\[ \inf \{ g(x) + \chi_X(x) - h(x) : x \in \mathbb{R}^n \} \]
where $\chi_X(x)$ is an indicator function that is 0 if $x \in X$ and $+\infty$ otherwise. The dual program associated with $P_{dc}$ is formulated as below:
\[ D_{dc} : \inf \{ h^*(y) - g^*(y) : y \in \mathbb{R}^n \} \]
The point $x^* \in \operatorname{dom} f$ is said to be a critical point of $P_{dc}$ if
\[ \partial g(x^*) \cap \partial h(x^*) \neq \emptyset \]
With respect to the above definitions, the following theorem states the necessary condition for local optimality.
Theorem 1. If $x^*$ is a local optimal solution to $P_{dc}$, then $\partial h(x^*) \subseteq \partial g(x^*)$.
Proof. See Dinh and Le Thi [12].
The general framework of the DC algorithm (DCA) for the standard DC program $P_{dc}$ is provided in Algorithm 1.
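As a minimal sketch of the generic DCA iteration (not a reproduction of the authors' Algorithm 1), each step linearizes $h$ at the current point and minimizes the resulting convex function: choose $y^{(k)} \in \partial h(x^{(k)})$, then take $x^{(k+1)} \in \arg\min_x \{ g(x) - \langle x, y^{(k)} \rangle \}$. The one-dimensional toy function, the stopping rule, and all names below are assumptions chosen so that the convex subproblem has a closed-form solution.

```python
import math

def dca(x0, grad_h, argmin_g_linear, tol=1e-12, max_iter=100):
    """Generic DCA loop for f = g - h with differentiable h (scalar case)."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        y = grad_h(x)                 # y_k: gradient of h at x_k
        x_new = argmin_g_linear(y)    # x_{k+1} minimizes g(x) - x * y_k
        history.append(x_new)
        if abs(x_new - x) <= tol:
            break
        x = x_new
    return x_new, history

# Toy DC function f(x) = x**4 - x**2 with g(x) = x**4 and h(x) = x**2.
def f(x):
    return x ** 4 - x ** 2

def grad_h(x):
    return 2 * x

def argmin_g_linear(y):
    # Minimizing x**4 - y*x gives 4*x**3 = y, i.e. x is the cube root of y/4.
    return math.copysign(abs(y / 4) ** (1 / 3), y)

x_star, history = dca(1.0, grad_h, argmin_g_linear)
x_stuck, _ = dca(0.0, grad_h, argmin_g_linear)
```

Starting from 1.0, the iterates approach $\sqrt{1/2}$, a local minimizer, and the objective values along `history` never increase. Starting from 0.0, the method stays at 0, which is a critical point but not a minimizer. This illustrates why the starting point matters for DCA.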
Theorem 2. The sequence $\{x^{(k)}\}$, obtained by DCA, converges to a critical point for any arbitrary starting point $x^{(0)}$, and the sequence $\{g(x^{(k)}) - h(x^{(k)})\}$ is decreasing.
Proof. See Dinh and Le Thi [12].
Corollary 1. If $h$ (see $P_{dc}$) is differentiable, the sequence $\{x^{(k)}\}$, obtained by DCA, converges to a critical point of $P_{dc}$ satisfying the necessary local optimality condition.
Proof. Theorem 2 indicates that the sequence $\{x^{(k)}\}$, obtained by DCA, converges to a critical point $x^*$. Thus, we have $\partial g(x^*) \cap \partial h(x^*) \neq \emptyset$. However, since $h$ is a differentiable function, $\partial h(x^*)$ is a singleton set, and accordingly $\partial h(x^*) \subseteq \partial g(x^*)$. Thus, $x^*$ satisfies the necessary condition for local optimality stated in Theorem 1.
DCA has been successfully applied to combinatorial optimization and different classes of hard non-convex problems. For a comprehensive overview, see [13].
4 Adopting DCA to Solve VM
In this section, first the MINLP model VM is equivalently reformulated as a DC program. Then, DCA is adopted to solve it.
4.1 Reformulation of VM as a DC Program
By relaxing the binary restriction of variable $d_i$ and adding the constraints $\sum_{i \in I} d_i (1 - d_i) \le 0$ and $0 \le d_i \le 1$, VM is equivalently reformulated as VM′.
\[ (\mathrm{VM}') \quad \min \sum_{t \in T} \Big( \sum_{i \in I} \Big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \Big) x_i - \prod_{t'=1}^{t} (1 + r_{t'}^0) \Big)^2 \]
s.t. (2)–(5)
\[ \sum_{i \in I} d_i (1 - d_i) \le 0 \]
\[ 0 \le d_i \le 1 \quad \forall i \in I \]
Further, since $\sum_{i \in I} d_i (1 - d_i)$ is a nonnegative concave function for $0 \le d_i \le 1$, considering $m$ as a sufficiently large positive number, the model VM′ can be equivalently reformulated as VM″, which is a DC program. This idea is taken from Moeini [6].
\[ (\mathrm{VM}'') \quad \min \sum_{t \in T} \Big( \sum_{i \in I} \Big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \Big) x_i - \prod_{t'=1}^{t} (1 + r_{t'}^0) \Big)^2 + m \Big( \sum_{i \in I} d_i (1 - d_i) \Big) \]
s.t. (2)–(5), $0 \le d_i \le 1 \quad \forall i \in I$
Therefore, the DC program VM″ is equivalent to VM and can be solved via DCA. Since the DC components of the objective function of VM″ are differentiable, by Corollary 1, DCA converges to a solution satisfying the necessary local optimality condition. Algorithm 2 shows how DCA is adopted for VM″.
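One natural DC decomposition of VM″, and the convex subproblem it induces at each DCA iteration, can be written out as follows. This is a hedged sketch of a standard choice and may differ in detail from the authors' Algorithm 2. The shorthand $A_{i,t}$, $B_t$ and the set $\Omega$ (the points satisfying (2)–(5) and $0 \le d_i \le 1$) are introduced here for illustration.

```latex
% Shorthand (introduced for illustration):
%   A_{i,t} = \prod_{t'=1}^{t} (1 + r_{i,t'}), \qquad B_t = \prod_{t'=1}^{t} (1 + r^0_{t'})
%   \Omega  = \{ (x, d) : (2)\text{--}(5),\ 0 \le d_i \le 1 \ \forall i \in I \}
\begin{align*}
g(x, d) &= \sum_{t \in T} \Big( \sum_{i \in I} A_{i,t}\, x_i - B_t \Big)^2 + \chi_{\Omega}(x, d), \\
h(x, d) &= m \sum_{i \in I} d_i (d_i - 1), \\
\nabla_d h(x, d) &= m\, (2d - \mathbf{1}), \qquad \nabla_x h(x, d) = 0, \\
\big( x^{(k+1)}, d^{(k+1)} \big) &\in \arg\min_{(x, d) \in \Omega}\
  \sum_{t \in T} \Big( \sum_{i \in I} A_{i,t}\, x_i - B_t \Big)^2
  - m \sum_{i \in I} \big( 2 d_i^{(k)} - 1 \big)\, d_i .
\end{align*}
```

With this choice, $g - h$ reproduces the objective of VM″ on $\Omega$, both components are convex for $m > 0$, and each iteration reduces to a convex quadratic program over linear constraints.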
The main advantage of DCA is its short running time; however, it does not guarantee the global optimality of the solution. The quality of the solution obtained by DCA depends on the starting point. Therefore, Moeini [6] suggested that, instead of running DCA only once, it be run multiple times from different starting points, with the best solution found being returned. For this purpose, he examined five starting points: 1) the optimal solution to the linear programming relaxation of the original model, 2) a modified solution obtained by rounding the previous starting point, 3) the vector with all 0 entries, 4) the vector with all 0.5 entries, and 5) the vector with all 1 entries. In the following section, we present a novel iterative algorithm based on DCA which is superior to the method of Moeini [6] in terms of solution quality.
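5 Iterative DCA-Based Algorithm
In our new algorithm, DCA is implemented on the following model (instead of VM″), which we refer to as restricted VM″ (RVM″).
\[ (\mathrm{RVM}'') \quad \min \sum_{t \in T} \Big( \sum_{i \in I} \Big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \Big) x_i - \prod_{t'=1}^{t} (1 + r_{t'}^0) \Big)^2 + m \Big( \sum_{i \in I} d_i (1 - d_i) \Big) \]
s.t. (2)–(5), $0 \le d_i \le 1 \quad \forall i \in I$
\[ \sum_{i \in I : d_i^{\langle s \rangle} = 1} (1 - d_i) + \sum_{i \in I : d_i^{\langle s \rangle} = 0} d_i \ge 1 \quad \forall s \in S \tag{12} \]
\[ \sum_{t \in T} \Big( \sum_{i \in I} \Big( \prod_{t'=1}^{t} (1 + r_{i,t'}) \Big) x_i - \prod_{t'=1}^{t} (1 + r_{t'}^0) \Big)^2 \le RHS \tag{13} \]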
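The effect of the combinatorial cut (12) can be checked by enumeration on a small instance. For binary vectors, the left-hand side of (12) equals the number of positions in which $d$ differs from the previously found vector, so the cut excludes exactly that vector. The function name `cut_lhs` and the four-asset example are hypothetical.

```python
from itertools import product

def cut_lhs(d, d_s):
    """Left-hand side of cut (12), built from a previously found binary vector d_s."""
    return sum((1 - di) if dsi == 1 else di for di, dsi in zip(d, d_s))

d_s = (1, 0, 1, 0)   # selection found in an earlier iteration (toy example)

# Binary vectors that violate the cut, i.e. with left-hand side below 1.
violating = [d for d in product((0, 1), repeat=len(d_s)) if cut_lhs(d, d_s) < 1]
```

Only `d_s` itself ends up in `violating`. Every other binary vector differs from it in at least one position and therefore satisfies the cut.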
The combinatorial cut (12) removes the solution $d^{\langle s \rangle}$ from the feasible region, and the quality-regulator cut (13) ensures that the objective function value of the model VM is less than or equal to RHS. At the beginning of the algorithm, these cuts are not contained in RVM″; they are added as the algorithm proceeds. Moreover, the parameter UB is defined as the objective value of the best feasible solution identified so far for the model VM, and it is initialized at $+\infty$. At each iteration $s$ of the algorithm, DCA is implemented on RVM″ from a given starting point, the details of which are discussed in Remark 1 (note that at the beginning of the algorithm, due to the absence of the cuts (12) and (13), RVM″ coincides with VM″). The solution returned by DCA is denoted by $(\tilde{d}, \tilde{x})$. If at least one component of $\tilde{d}$ violates the binary restriction, those components which are sufficiently close to 0 (resp. 1) are replaced by 0 (resp. 1). See Eq. (14), where $n$ is a given accuracy.
$\tilde{d}_i :=$
8